In yesterday’s post I talked in broad terms about the scope, purpose and intention of SchemaType without actually showing what the language looks like. In this post I’ll fix that!

Assume you have this person.yaml file:

name: Ingy döt Net
age: 17
dead: false

The simplest way to define a person.stp (.stp is the SchemaType file extension) file might look like this:

name: string
age: integer
dead: boolean

This is the starting point of SchemaType. Let’s talk about what we have so far. First off, the overall input format for SchemaType is YAML. Since JSON is a subset of YAML, you could also write SchemaType files in JSON.
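To make the idea concrete, here is a minimal Python sketch of what checking data against this simplest form could look like. It is not the SchemaType implementation: the `check_simple` function and the `PY_TYPES` table are hypothetical, and the mapping from the type names string, integer and boolean to Python types is an assumption made for illustration.

```python
# Hypothetical checker for the simplest schema form: key -> type name.
# The mapping from type names to Python types is an assumption.
PY_TYPES = {"string": str, "integer": int, "boolean": bool}

def check_simple(schema, data):
    """Return a list of error strings; an empty list means the data matches."""
    errors = []
    for key, type_name in schema.items():
        if key not in data:
            errors.append(f"missing key: {key}")
            continue
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        if type_name == "integer" and isinstance(value, bool):
            errors.append(f"{key}: expected integer")
        elif not isinstance(value, PY_TYPES[type_name]):
            errors.append(f"{key}: expected {type_name}")
    return errors

# The person.stp and person.yaml examples from above, written as Python dicts.
person_schema = {"name": "string", "age": "integer", "dead": "boolean"}
person = {"name": "Ingy döt Net", "age": 17, "dead": False}
```

Calling `check_simple(person_schema, person)` returns an empty list for the person.yaml data above.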

The definition of an object is simply a mapping of its keys to the types of their values. This is very simple (which is good), but it’s too simple to be a real language. Let’s move forward with a few additions:

-name: /person
-desc: Data about a person
-spec: schematype.org/v0.0.1
-from: github:schematype/type/#v0.1.6

name: +human/name
age: +int 1..100
dead?: +boolean

This is a valid and complete SchemaType definition. You could publish this file for reuse at http://yourdomain.net/person.stp.

The first thing you probably noticed is the unusual punctuation: characters like -, +, ? and %. SchemaType is a YAML-based DSL. The intention of the DSL is to keep the definitions concise. At the expense of learning just a few syntax concepts, you can keep simple things simple and massive things manageable.

Keys that begin with a - belong to the SchemaType language. The first 4 keys will be present in almost every SchemaType definition.

The -name field refers to the name of the type being defined in this file. It should match the base name of the file that it is stored in. A SchemaType file can define multiple types. In that case the name should be set to / and the file name should be index.stp. A condensed example of an index.stp file that defines the types type1 and type2 would look like:

-name: /
+type1: ...
+type2: ...

Interestingly, the ... is actually SchemaType syntax. It denotes a type reference of +any, i.e. any type. More about that in a future post.

The -desc field is a short description of the purpose of this schema. The -spec field indicates the version of the SchemaType Language Specification being used.

The most important keyword here is -from. Within a SchemaType definition, types are referenced by names preceded with a +. In reality, each +type1 reference expands to a full URL like https://github.com/schematype/type/blob/f54b0aa/type1. The -from field maps type shorthands to their immutable definition URLs. Again, more about that in future posts.

Back to the meat of our SchemaType example from above:

name: +human/name
age: +int 1..100
dead?: +boolean

If you see a SchemaType definition that uses +string, it should be a red flag that the schema author is DoingItWrong™. The +human/name type (defined by schematype/type) inherits from +string but places much stricter constraints on valid values. For instance, a URL, a GUID and the full text of the US Constitution are all valid strings, but none of them is a valid name of a person.

SchemaType is very big on type inheritance. SchemaType only defines 5 types in its spec: String, Number, Boolean, Null, and Any, but none of these are ever referenced directly in schemas.

Next consider the age field. This is actually creating a new temporary or anonymous type. It inherits from +int but then places a range constraint on it. Another way to do this would be:

name: +human/name
age: +age
dead?: +boolean

+age: +int 1..100

The difference here is that we named the subtype. It is therefore available publicly as person/age in our schema publication.
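As a rough illustration of the range constraint, the Python sketch below splits a reference such as +int 1..100 into its base type and bounds and checks a value against them. The `parse_ref` and `in_range` helpers are hypothetical, the regular expression only covers the forms shown in this post, and treating both bounds as inclusive is an assumption (it matches the minimum and maximum used in the JSON Schema comparison later on).

```python
import re

# Assumed shape of a reference: "+<type>" optionally followed by "<low>..<high>".
RANGE_REF = re.compile(
    r"^\+(?P<base>[\w/]+)(?:\s+(?P<low>-?\d+)\.\.(?P<high>-?\d+))?$"
)

def parse_ref(text):
    """Split a type reference into (base, low, high); bounds are None if absent."""
    match = RANGE_REF.match(text.strip())
    if match is None:
        raise ValueError(f"not a type reference: {text!r}")
    low, high = match.group("low"), match.group("high")
    return (
        match.group("base"),
        int(low) if low is not None else None,
        int(high) if high is not None else None,
    )

def in_range(text, value):
    """Check an integer against the reference's range, bounds inclusive.

    A reference without a range accepts any integer.
    """
    _base, low, high = parse_ref(text)
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return (low is None or value >= low) and (high is None or value <= high)
```

With this sketch, `in_range("+int 1..100", 17)` is true, while 0 and 101 are rejected.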

Last, but certainly not least, is the ? in dead?. It simply means that the dead key/value pair is optional. This implies that name and age are required, and that’s true. All keys in a definition are required by default. This stands in contrast to JSON Schema, where all pairs are optional by default and unspecified keys are allowed. To allow any extra key/value pair in SchemaType you need to declare it like this:

name: +human/name
age: +int 1..100
dead?: +boolean
...: ...

There’s that ... again. This means that any key is allowed to map to any value type.
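The key rules described here (required by default, a trailing ? for optional, and ...: ... to allow extras) can be sketched in a few lines of Python. This is only an illustration under those stated rules; `check_keys` is a hypothetical helper, and skipping keys that start with - or + is an assumption about how language keys and named subtypes would be told apart from data keys.

```python
def check_keys(definition, data):
    """Return (missing, extra) key lists for one mapping."""
    required, allowed = set(), set()
    open_ended = False
    for key in definition:
        if key == "...":
            open_ended = True        # "...: ..." permits any extra key
        elif key.startswith(("-", "+")):
            continue                 # language keys and named subtypes, not data keys
        elif key.endswith("?"):
            allowed.add(key[:-1])    # trailing ? marks the pair as optional
        else:
            required.add(key)
            allowed.add(key)
    missing = sorted(required - set(data))
    extra = [] if open_ended else sorted(set(data) - allowed)
    return missing, extra

# The closed definition, and the same definition with "...: ..." added.
closed = {"name": "+human/name", "age": "+int 1..100", "dead?": "+boolean"}
opened = dict(closed)
opened["..."] = "..."
```

Against `closed`, a document with an extra nick key is reported as having an unexpected key; against `opened`, the same document passes.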

That wraps up the explanation of a very basic SchemaType document. For comparison, let’s look at the JSON Schema equivalent of our example:

{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "Person",
    "type": "object",
    "properties": {
        "name": {
            "$ref": "https://example.com/json.schema#human/name"
        },
        "age": {
            "type": "integer",
            "minimum": 1,
            "maximum": 100
        },
        "dead": {
            "type": "boolean"
        }
    },
    "additionalProperties": false,
    "required": [
      "name",
      "age"
    ]
}

This is a rough equivalent. This example is not so bad because you can see the whole thing on one page. In reality, JSON Schema gets unwieldy fast. A primary reason is that things like required are not stated in the place where the key is defined. I’ve seen many files where that information is thousands of lines apart. The other problem is that it is hard to tell which parts are JSON Schema language words and which parts are the data you are defining. SchemaType tries to make all this easy, compact and less painful.
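To show how the compact form carries the same information, the hypothetical Python helper below derives the JSON Schema required and additionalProperties entries from a SchemaType-style mapping. It is a sketch under the rules described in this post, not a converter that SchemaType provides, and it ignores everything except key names.

```python
def to_json_schema_keys(definition):
    """Derive JSON Schema 'required' and 'additionalProperties' from key names."""
    required = [
        key for key in definition
        if key != "..."                      # the catch-all is not a data key
        and not key.startswith(("-", "+"))   # assumed: language keys and subtypes
        and not key.endswith("?")            # trailing ? means optional
    ]
    return {"required": required, "additionalProperties": "..." in definition}

person_def = {"name": "+human/name", "age": "+int 1..100", "dead?": "+boolean"}
```

For `person_def` this yields a required list of name and age with additionalProperties set to false, which agrees with the JSON Schema document above.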

This post is an introduction to the language basics. It was intended to raise more questions than it answers. There is much more to cover, but hopefully that’s enough to start you thinking about SchemaType!