Strongly-typed Data for JavaScript (and beyond)

I get it. You’re busy. You just want to get the new customer address form working before the weekend.

Here’s the JSON data that you are using for testing the new form:

Your frontend is written in React and JavaScript, though Susan has recently been migrating some of the code to TypeScript. Your backend is written in Java, and data is stored in MongoDB.

Data Models

Let’s take a step back for a moment and talk about the Data Models in your application.

When you write code you are either implicitly or explicitly referring to some set of concepts, and how they are organised. In this case the JSON snippet above tells us that we have what looks like a fairly simple address concept and it seems to be linked to a customer concept.

Note: this has nothing to do with whether you are using a functional or an object-oriented language, compiled statically or dynamically, or whether your language is strongly or weakly typed. As soon as we write code, we are building up a set of concepts in our programmer’s Mind Palace!

In this scenario we need consistent Data Models across at least three tiers of our application: the React frontend, the Java middle-tier, and the database. We are targeting two languages, JavaScript and Java, with TypeScript on the horizon.

To compound things further you know that Ragesh and his team are working on v2 of the application which will support localisation and international users, and they require a much richer concept of an Address. Somehow the data in the database is going to need to be migrated, or we have to maintain backwards compatibility when v2 is released.

Now, let’s walk through some options and discuss the technical and human implications for each…

Option 1: Tower of Babel

In this (de facto) scenario each tier in our application defines its own data model. Some do it implicitly: the React form validation logic, for example, requires that the state field is one of the US states or the special value Other. Others do it explicitly: the middle-tier defines the Java classes Address and Customer as part of a wider domain model used for business logic.

The Java code cannot trust that the front-end is validating data, so the logic to check the state field is re-implemented in Java, and JAXB is used to serialise instances to/from JSON.
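To make the duplication concrete, here is a minimal sketch of what the implicit front-end rule might look like. The function name `isValidState` and the shortened list of states are assumptions made for illustration; the Java tier would carry its own copy of the same rule.

```typescript
// Hypothetical front-end rule: the state must be a US state code or "Other".
// Only a handful of states are listed to keep the sketch short.
const US_STATES: readonly string[] = ["CA", "NY", "TX", "WA"];
const OTHER = "Other";

function isValidState(state: unknown): boolean {
  return typeof state === "string" && (state === OTHER || US_STATES.includes(state));
}

console.log(isValidState("NY"));      // true
console.log(isValidState("Other"));   // true
console.log(isValidState("Ontario")); // false
```

Nothing in this code tells the other tiers that the rule exists, which is exactly how the copies drift apart.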

Oh, and Susan just defined a new TypeScript Address class:
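Susan's listing is not reproduced here. As a rough illustration only, such a class might look something like the sketch below. Only the `state` field is named in this article; the other field names and the sample values are assumptions.

```typescript
// Hypothetical sketch of a TypeScript Address class.
// Only `state` comes from the article; the remaining fields are assumed.
class Address {
  constructor(
    public street: string,
    public city: string,
    public state: string,
    public zip: string,
  ) {}
}

const sampleAddress = new Address("1 Main Street", "Springfield", "Other", "00000");
console.log(JSON.stringify(sampleAddress));
```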

What started as a simple web-form has now become a bit of a mess… Any change to the application’s concept of an Address requires coordinating with 3 or 4 teams and making changes in 3 different programming languages.

There has to be a better way!?

Option 2: JSON Schema

To help coordinate the teams working on the application, and ensure consistency in our definition of Address, we decide that we need to externalise the definition of Address from the application code itself.

This is a HUGE step forward. We now have a cross-platform data model definition, aka a schema! The people working on the frontend, middle-tier and backend now have a consistent definition of the data that needs to be edited, validated and persisted.

The JSON Schema document below is added to git and becomes the application’s definition of an address.
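The schema document itself is not reproduced here. To give a rough idea of what it contains, the sketch below writes a small, hypothetical address schema as a TypeScript constant, next to a deliberately tiny checker that understands only `required`, `type: "string"` and `enum`. Property names other than `state` are assumptions, and a real project would use a full JSON Schema validator library rather than this toy.

```typescript
// Hypothetical JSON Schema for an address, restricted to the few keywords
// that the toy checker below understands.
type PropertySchema = { type: "string"; enum?: readonly string[] };
type ObjectSchema = {
  type: "object";
  properties: Record<string, PropertySchema>;
  required: readonly string[];
};

const addressSchema: ObjectSchema = {
  type: "object",
  properties: {
    street: { type: "string" },
    city: { type: "string" },
    state: { type: "string", enum: ["CA", "NY", "TX", "Other"] },
  },
  required: ["street", "city", "state"],
};

// Toy checker: returns a list of human-readable problems (empty means valid).
function check(schema: ObjectSchema, data: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const name of schema.required) {
    if (!(name in data)) errors.push(`missing required property: ${name}`);
  }
  for (const [name, prop] of Object.entries(schema.properties)) {
    if (!(name in data)) continue;
    const value = data[name];
    if (typeof value !== "string") {
      errors.push(`${name} must be a string`);
    } else if (prop.enum && !prop.enum.includes(value)) {
      errors.push(`${name} must be one of: ${prop.enum.join(", ")}`);
    }
  }
  return errors;
}

console.log(check(addressSchema, { street: "1 Main Street", city: "Springfield", state: "NY" }));
console.log(check(addressSchema, { street: "1 Main Street", state: "Ontario" }));
```

The first call reports no problems; the second reports the missing city and the unknown state. The point is that the rule about the state field now lives in data that every tier can read, rather than in each tier's code.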

Quite quickly the sharp people on the various teams start to experiment with generating code from the JSON Schema document:

code generation – Generate java classes from a JSON schema – Stack Overflow

Statically Typed Data Validation with JSON Schema and TypeScript

As the application grows in complexity the JSON Schema is split across multiple files and types need to reference other types defined in different files, introducing people to JSON Pointers and $ref.

jsonschema – Importing all definitions from an external JSON Schema – Stack Overflow
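For anyone meeting JSON Pointers for the first time, the sketch below resolves a local `$ref` such as `#/definitions/Address` against a schema document. The function name `resolvePointer` and the sample document are assumptions made for illustration. The sketch handles only references inside the same document and does not percent-decode the URI fragment; references to other files would additionally need the file to be loaded first.

```typescript
// Resolve a local `$ref` (for example "#/definitions/Address") against a document,
// following the JSON Pointer escaping rules: "~1" means "/" and "~0" means "~".
function resolvePointer(doc: unknown, ref: string): unknown {
  if (!ref.startsWith("#")) {
    throw new Error(`only local references are handled in this sketch: ${ref}`);
  }
  const pointer = ref.slice(1);
  if (pointer === "") return doc; // "#" refers to the whole document
  if (!pointer.startsWith("/")) throw new Error(`invalid JSON Pointer: ${pointer}`);

  let current: unknown = doc;
  for (const rawToken of pointer.slice(1).split("/")) {
    // Order matters: unescape "~1" before "~0".
    const token = rawToken.replace(/~1/g, "/").replace(/~0/g, "~");
    if (typeof current !== "object" || current === null || !(token in current)) {
      throw new Error(`cannot resolve "${token}" in ${ref}`);
    }
    current = (current as Record<string, unknown>)[token];
  }
  return current;
}

// Hypothetical schema document with one shared definition.
const schemaDoc = {
  definitions: {
    Address: { type: "object", required: ["state"] },
  },
  properties: {
    billingAddress: { $ref: "#/definitions/Address" },
  },
};

const resolved = resolvePointer(schemaDoc, schemaDoc.properties.billingAddress.$ref);
console.log(resolved === schemaDoc.definitions.Address); // true
```

Even this small piece of plumbing has edge cases (escaping, missing targets, cross-file references), which hints at why hand-managing a growing set of schema files gets tiresome.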

Unfortunately, managing this growing list of schema files, validating the references between the files, and just dealing with the JSON Schema syntax becomes increasingly cumbersome. In addition, each team is building their own code generation framework…

Maybe we can do better with something else?

Option 3: XML Schema

These requirements can’t be new, right? Enterprises must have solved this problem and they are used to dealing with huge domain models. XML has been around for decades and uses XML Schema, so let’s give that a try…

First, let’s learn the XML Schema syntax and then write our first schema.

Not pretty, but thankfully there are some good graphical editors and tools to help us!

Uh-oh!

No one on the frontend or database team will engage with the schemas, let alone try to understand them! We’ve spent a lot of time modelling our concepts using a powerful schema language, but we’ve created a barrier to adoption due to the gap between the schema language and the runtime.

Even the mapping to our Java mid-tier is complex. No one is very happy (and we’ve now had to introduce a new “data models” team to manage the models) and reusing or referencing models is no easier. The data models team aren’t very happy either because XML Schema is big, complex and includes some document-oriented features like substitution groups and xs:any that don’t seem to fit well into the data validation use cases that we are targeting.

Option 4: There must be something better!

At this point we cast the net far and wide and start to look at:

There’s lots of history and lots to like here, but they all feel quite coupled to a single runtime or use case (like RPC, query or binary serialisation). Many of them seem to be optimised for machines rather than for the humans who define, edit and use models across a multi-language code base.

Option 5: Roll our own!

We’re smart. Let’s define our own “JSON Schema”. Something simpler which will meet our needs!

We roll up our sleeves and define our own JSON document to capture our models. Internal documentation is quite scarce and the tools are a little artisanal, but we end up with this:

Unfortunately, we now have a developer (or two) working on code generation and on the proprietary JSON model definition format. The teams seem to have a never-ending list of requirements (imports, sub-classing of types, enumerations, validation expressions) and it’s hard to keep up. Every time we onboard a new developer we have to teach them our schema language.

Managers and architects start to worry about the amount of time we are investing in this non-core activity, and what will happen if the technical lead for the schema language decides to move on…

Option 6: Concerto

And now we reach the end of our schema language odyssey: Accord Project Concerto!

Concerto addresses all of the concerns we’ve raised above, and all the examples in the article were generated automatically from this simple Concerto model:

Concerto is the schema language WE wanted when we walked down this path. It is small, just expressive enough to capture real business models, has a natural binding to most common runtimes, and makes creating reusable models quick and easy. It has code generators for Java, UML, TypeScript, Go, JSON Schema, XML Schema and LoopBack, and it’s easy to create new code generators if necessary.

Here’s a rundown of some of its core features:

Import, reference or extend models hosted on the local file system or on URLs, meaning you can create truly modular schemas
Use ranges to validate numeric values, or regex to validate strings
Enumerations, concepts and sub-classing make mapping from a business domain to a schema fast and intuitive
Add custom decorations to model elements and then access them using the API
A 100% JavaScript runtime so you can use it from Node.js or in a browser
A powerful API for introspecting models at runtime
Convert the models (statically, or on the fly) to a wide variety of output formats, including JSON Schema
Easy to read and develop, especially using the VSCode extension, which provides syntax highlighting and on-the-fly validation

You can even dynamically generate React web-forms from Concerto models!

Concerto generating a dynamic React form from a model

Concerto is Apache-2 licensed, stable, implemented in JavaScript, and can even be embedded in a web page. Concerto is managed by the friendly Accord Project community, under the Linux Foundation.

The Ergo domain specific language from Accord Project uses Concerto models as its type system.

There’s even an Open Source model repository to help get you started:

Accord Project Model Repository

Or you can, of course, host your own model repository, or just publish your models to any HTTP(S) URL.

We’d love you to take Concerto for a spin and let us know what you think. Once you are up and running we are always looking for new contributors to the Concerto Open Source project, to improve it, and to broaden its usage even further.