Introduction

SimpleBuffers is a schema language and compiler for data serialization. Like protobuf, it generates code in various languages that encodes and decodes data structures in a common format. Unlike protobuf, SimpleBuffers is designed for consistent APIs in resource-constrained environments. It forgoes some backwards compatibility in order to increase efficiency in both storage density and encoding/decoding speed. In fact, SimpleBuffers data can be decoded lazily, often in constant time. For more information about how this is done, see Serialization Format.

SimpleBuffers has an extremely similar serialization scheme to Cap'n Proto. I made this project independently before investigating Cap'n Proto's inner workings, and while I prefer some aspects of my C++ API, I highly recommend using Cap'n Proto over SimpleBuffers for any serious project. It is more established, more complete, and will be far better supported.

Installation

The SimpleBuffers compiler is distributed as a single executable. Simply download and extract it from the repo's latest release.

Compiler Usage

The SimpleBuffers compiler is invoked from the command line to generate code from your schema files. The basic syntax is as follows:

simplebuffers [options] <generator> <schema_file> [generator-specific arguments]
  • <generator>: Specifies the target language for code generation (e.g., cpp for C++).
  • <schema_file>: Path to your SimpleBuffers schema file.

Options

  • -l, --lib <path>: Specify a custom library to load for third-party generators.
  • -s, --srcdir <path>: Set the directory where your SimpleBuffers schema lives.
  • -d, --dstdir <path>: Set the directory where generated files will be written.

Generator-Specific Arguments

Different code generators may require or accept additional arguments. These are passed after the main options and are specific to the chosen generator. The compiler passes these arguments directly to the selected generator.

Example: Using the C++ Generator

For the C++ generator, you might use a command like this:

simplebuffers -d ./output cpp myschema.sb --header-dir include

In this example:

  • -d ./output specifies the output directory for the generated files
  • cpp is the generator name
  • myschema.sb is the input schema file
  • --header-dir include is a C++ specific argument that determines the destination for generated header files

Note that the exact arguments accepted by the C++ generator may vary. Always refer to the specific generator's documentation for the most up-to-date information on available options.

Output

The compiler will generate language-specific files based on your schema. For C++, this typically includes:

  1. A header file (.hpp) in the specified header directory
  2. A source file (.cpp) in the main output directory
  3. A core library header file (simplebuffers.hpp) in the header directory

These files will contain the necessary classes and functions to serialize and deserialize your data structures according to the SimpleBuffers schema.

Remember to include these generated files in your project and link against them as needed.

Help

For up-to-date information about CLI usage and options, run:

simplebuffers --help

Version Information

You can check the version of the SimpleBuffers compiler by running:

simplebuffers --version

This will display the current version of the compiler.

Schema File Format

The core of SimpleBuffers is the schema file. This contains all of the data structures that can be serialized. SimpleBuffers schemas are stored in files with the .sb extension and are passed to the compiler for code generation.

Let's look at a simple example:

enum RobotJoint {
    j0 = 0;
    j1 = 1;
    j2 = 2;
    j3 = 3;
    j4 = 4;
    j5 = 5;
}

sequence Init {
    expected_firmware: u32;
}

sequence MoveToEntry {
    joint: RobotJoint;
    angle: f32;
    speed: f32;
}

sequence MoveTo {
    joints: [MoveToEntry];
    stop_smoothly: bool;
}

sequence Request {
    id: u32;
    payload: oneof {
        init: Init;
        moveTo: MoveTo;
    };
}

This schema describes some functionality for a robot arm. The main data structure is the Request sequence, which contains an ID and a payload. The payload takes the form of one of several other sequences. Before we can fully understand what this means, we have to explain some terminology.

Enums

Enums, like in most programming languages, describe a finite set of values. In SimpleBuffers, enums are backed by unsigned integers. Each enumeration must be explicitly assigned a unique value. Enumerations do not need to be assigned contiguously, as can be seen in the following example:

enum RobotJoint {
    j0 = 0;
    j1 = 1;
    j2 = 2;
    j3 = 3;
    j4 = 4;
    j5 = 5;
    unknown = 255;
}

The size of the backing integer is determined by the largest assigned value. In the above example, RobotJoint will be backed by an 8-bit integer, as all enumerations fit in it. However, if unknown's value were changed to 300 instead of 255, all RobotJoint instances would instead be backed by a 16-bit integer, because 300 does not fit in 8 bits.
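The sizing rule can be sketched as a small compile-time helper. This is an illustration rather than the compiler's actual logic: enum_backing_bits is a hypothetical name, and the 32- and 64-bit tiers are an assumption extrapolated from the 8- and 16-bit cases described above.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the generated API: width in bits of the
// smallest unsigned integer that can hold an enum's largest assigned value.
constexpr unsigned enum_backing_bits(std::uint64_t max_value) {
    return max_value <= 0xFFu       ? 8u
         : max_value <= 0xFFFFu     ? 16u
         : max_value <= 0xFFFFFFFFu ? 32u
                                    : 64u;
}

// RobotJoint with unknown = 255 fits in one byte...
static_assert(enum_backing_bits(255) == 8, "255 fits in 8 bits");
// ...but raising unknown to 300 widens every RobotJoint to two bytes.
static_assert(enum_backing_bits(300) == 16, "300 needs 16 bits");
```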

Sequences

Sequences are SimpleBuffers' equivalent to structs. Importantly, sequences are ordered; changing the order of a sequence's fields will cause the serialization format to change. Semicolons are required after every field.

sequence MoveToEntry {
    joint: RobotJoint;
    angle: f32;
    speed: f32;
}

Every field of a sequence (or oneof) must be annotated with a type. A type can be one of the following:

  • Primitive
  • List
  • Enum
  • Sequence
  • Oneof

Primitive Types

SimpleBuffers contains the following primitive types:

Type    Description
u8      An unsigned, 8-bit integer
u16     An unsigned, 16-bit integer
u32     An unsigned, 32-bit integer
u64     An unsigned, 64-bit integer
i8      A signed, 8-bit integer
i16     A signed, 16-bit integer
i32     A signed, 32-bit integer
i64     A signed, 64-bit integer
f32     A 32-bit floating point
f64     A 64-bit floating point
bool    A boolean value (8-bit)
str     A string

Note that, unlike the rest of the primitive types, strings are variable-sized fields. This entails a small amount of additional overhead which is explained further in Serialization Format.

Lists

Like strings, lists are variable-sized. See Serialization Format for more information about the implications of this.

Lists are denoted by surrounding a type in square brackets. For example:

sequence MoveTo {
    joints: [MoveToEntry];
    stop_smoothly: bool;
}

The joints field is an array of MoveToEntry sequences.

OneOf

Like a union in C, a oneof allows a single field to have multiple possible data types. In our example, Request uses a oneof for the payload field. While the syntax looks similar to a sequence, a oneof can only store a single value at a time.

sequence Request {
    id: u32;
    payload: oneof {
        init: Init;
        moveTo: MoveTo;
    };
}

Multiple oneof fields may be of the same type. This can be useful for readability and clarity, e.g.:

sequence LoginInfo {
    user: oneof {
        email: str;
        phone_num: str;
        username: str;
    };
}
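Conceptually, a oneof whose members share a type is still unambiguous because a tag records which member is active. The hand-written C++ analogue below is only a mental model, not what the SimpleBuffers compiler emits; LoginUser, its factory functions, and the tag values are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Conceptual analogue of LoginInfo.user, written by hand. It is not what the
// SimpleBuffers compiler emits; names and tag values are assumptions.
struct LoginUser {
    enum class Tag : std::uint8_t { EMAIL = 0, PHONE_NUM = 1, USERNAME = 2 };

    Tag tag;            // which member is active
    std::string value;  // all three members share the str type

    static LoginUser email(std::string v)     { return LoginUser{Tag::EMAIL, std::move(v)}; }
    static LoginUser phone_num(std::string v) { return LoginUser{Tag::PHONE_NUM, std::move(v)}; }
    static LoginUser username(std::string v)  { return LoginUser{Tag::USERNAME, std::move(v)}; }

    // Access is only valid for the member that is currently active.
    const std::string& get(Tag expected) const {
        assert(tag == expected);
        return value;
    }
};
```

A value built with LoginUser::phone_num("555-0100") carries Tag::PHONE_NUM, so a reader can tell it apart from an email or username even though all three are strings.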

Comments

SimpleBuffers uses C-style single-line comments denoted by //. Multiline comments are not supported.

// I am a comment
sequence MySequence {
    my_field: u8; // This is my field whom I love very much
}

Generated C++ API

The SimpleBuffers compiler generates C++ code that provides a convenient API for serializing and deserializing data structures defined in the schema. This section describes the main components of the generated API and how to use them.

Writers

For each sequence defined in the schema, the compiler generates a corresponding Writer class. These classes are used to construct and serialize data.

Sequence Writers

For example, given the Request sequence from our schema:

sequence Request {
    id: u32;
    payload: oneof {
        init: Init;
        moveTo: MoveTo;
    };
}

The compiler generates a RequestWriter class:

class RequestWriter : public simplebuffers::SimpleBufferWriter {
public:
    RequestWriter(uint32_t id, PayloadWriter payload);

    uint32_t id;
    PayloadWriter payload;

    uint16_t static_size() const override;
    uint8_t* write_component(uint8_t* dest, const uint8_t* dest_end, uint8_t* dyn_cursor) const override;
};

To create and serialize a Request:

InitWriter init_payload(firmware_version);
RequestWriter::PayloadWriter payload = RequestWriter::PayloadWriter::init(&init_payload);
RequestWriter request(request_id, payload);

uint8_t buffer[1024];
int32_t bytes_written = request.write(buffer, sizeof(buffer));

OneOf Writers

For OneOf fields, the compiler generates nested classes. In the Request example, there's a PayloadWriter nested class:

class RequestWriter::PayloadWriter : public simplebuffers::OneOfWriter {
public:
    enum class Tag : uint8_t {
        INIT = 0,
        MOVE_TO = 1
    };

    static PayloadWriter init(InitWriter* val);
    static PayloadWriter move_to(MoveToWriter* val);

    // ... other methods ...
};

List Writers

For list fields, the compiler generates a ListWriter specialization:

class MoveToWriter : public simplebuffers::SimpleBufferWriter {
public:
    MoveToWriter(simplebuffers::ListWriter<MoveToEntryWriter> joints);

    simplebuffers::ListWriter<MoveToEntryWriter> joints;

    // ... other methods ...
};

To create a list:

std::vector<MoveToEntryWriter> entries = { /* ... */ };
simplebuffers::ListWriter<MoveToEntryWriter> joints_list(entries.data(), entries.size());
MoveToWriter move_to(joints_list);

Readers

For each sequence, the compiler also generates a corresponding Reader class for deserialization.

Sequence Readers

Continuing with the Request example:

class RequestReader : public simplebuffers::SimpleBufferReader {
public:
    RequestReader(const uint8_t* data_ptr, size_t idx = 0);

    uint32_t id() const;
    PayloadReader payload() const;

    // ... other methods ...
};

To read a serialized Request:

RequestReader reader(buffer);
uint32_t id = reader.id();
RequestReader::PayloadReader payload = reader.payload();

OneOf Readers

For OneOf fields, the compiler generates nested reader classes:

class RequestReader::PayloadReader : public simplebuffers::OneOfReader {
public:
    enum class Tag : uint8_t {
        INIT = 0,
        MOVE_TO = 1
    };

    PayloadReader(const uint8_t* data_ptr, size_t idx = 0);
    Tag tag() const;
    InitReader init() const;
    MoveToReader move_to() const;

    // ... other methods ...
};

To read a OneOf field:

RequestReader::PayloadReader payload = reader.payload();
switch (payload.tag()) {
    // Each case gets its own braces: C++ does not allow jumping past an
    // initialized declaration into a later case label.
    case RequestReader::PayloadReader::Tag::INIT: {
        InitReader init = payload.init();
        // Process init...
        break;
    }
    case RequestReader::PayloadReader::Tag::MOVE_TO: {
        MoveToReader move_to = payload.move_to();
        // Process move_to...
        break;
    }
}

List Readers

For list fields, the compiler generates a ListReader specialization:

class MoveToReader : public simplebuffers::SimpleBufferReader {
public:
    MoveToReader(const uint8_t* data_ptr, size_t idx = 0);
    simplebuffers::ListReader<MoveToEntryReader> joints() const;

    // ... other methods ...
};

To read a list:

MoveToReader move_to_reader = payload.move_to();
auto joints = move_to_reader.joints();
for (uint16_t i = 0; i < joints.len(); ++i) {
    MoveToEntryReader entry = joints[i];
    // Process entry...
}

Enums

For each enum defined in the schema, the compiler generates a corresponding C++ enum class:

enum class RobotJoint : uint_fast8_t {
    J_0 = 0,
    J_1 = 1,
    J_2 = 2,
    J_3 = 3,
    J_4 = 4,
    J_5 = 5
};

These enum classes can be used directly in your C++ code and are automatically handled by the generated Writer and Reader classes.

This API design allows for efficient serialization and deserialization of data structures defined in the SimpleBuffers schema, with a focus on performance and ease of use in C++ applications.

Optimized Binary Data Serialization

SimpleBuffers provides special optimizations for handling lists of uint8_t, which is particularly useful for sending raw binary data. This optimization uses memcpy to efficiently copy the entire list, resulting in improved performance for large binary payloads.

Writing Raw Binary Data

Let's extend our example schema to include a sequence for sending raw binary data:

sequence BinaryPayload {
    data: [u8];
    description: str;
}

The generated C++ code for this sequence would include:

class BinaryPayloadWriter : public simplebuffers::SimpleBufferWriter {
public:
    BinaryPayloadWriter(simplebuffers::ListWriter<uint8_t> data, const char* description);

    simplebuffers::ListWriter<uint8_t> data;
    const char* description;

    uint16_t static_size() const override;
    uint8_t* write_component(uint8_t* dest, const uint8_t* dest_end, uint8_t* dyn_cursor) const override;
};

To write a BinaryPayload with raw binary data:

// Prepare your raw binary data
std::vector<uint8_t> raw_data = {0x01, 0x02, 0x03, 0x04, 0x05};  // Example data

// Create a ListWriter for the raw data
simplebuffers::ListWriter<uint8_t> data_list(raw_data.data(), raw_data.size());

// Create the BinaryPayloadWriter
const char* description = "Example binary payload";
BinaryPayloadWriter payload(data_list, description);

// Serialize the payload
uint8_t buffer[1024];
int32_t bytes_written = payload.write(buffer, sizeof(buffer));

if (bytes_written > 0) {
    std::cout << "Binary payload serialized successfully. Bytes written: " << bytes_written << std::endl;
} else {
    std::cerr << "Failed to serialize binary payload." << std::endl;
}

Reading Raw Binary Data

The corresponding reader for BinaryPayload would look like this:

class BinaryPayloadReader : public simplebuffers::SimpleBufferReader {
public:
    BinaryPayloadReader(const uint8_t* data_ptr, size_t idx = 0);

    simplebuffers::ListReader<uint8_t> data() const;
    const char* description() const;

    uint16_t static_size() const override;
};

To read the serialized BinaryPayload:

BinaryPayloadReader reader(buffer);

// Access the raw binary data
auto data = reader.data();
std::cout << "Raw data size: " << data.len() << " bytes" << std::endl;

// You can access individual bytes if needed
for (uint16_t i = 0; i < data.len(); ++i) {
    std::cout << "Byte " << i << ": 0x" << std::hex << static_cast<int>(data[i]) << std::dec << std::endl;
}

// Or you can work with the entire data buffer directly
const uint8_t* raw_data_ptr = data.data();
size_t raw_data_size = data.len();

// Access the description
std::cout << "Description: " << reader.description() << std::endl;

The ListReader<uint8_t> provides a data() method that returns a pointer to the raw data buffer, allowing for efficient access to the entire binary payload without copying.

This optimized handling of uint8_t lists allows SimpleBuffers to efficiently serialize and deserialize raw binary data, making it suitable for applications that need to transmit or store binary blobs alongside structured data.
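The idea behind the optimization can be sketched as follows. This is not the library's implementation: write_u8_list is a hypothetical function, and its dest/dest_end parameters merely imitate the style of the write_component signature shown earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch, not the library's code: copies a whole byte list with
// one memcpy. Returns the position just past the copied bytes, or nullptr if
// the list does not fit before dest_end.
inline std::uint8_t* write_u8_list(std::uint8_t* dest, const std::uint8_t* dest_end,
                                   const std::uint8_t* items, std::size_t count) {
    assert(dest != nullptr && dest <= dest_end);
    if (static_cast<std::size_t>(dest_end - dest) < count) {
        return nullptr;  // not enough room in the destination buffer
    }
    if (count != 0) {
        std::memcpy(dest, items, count);  // one bulk copy, no per-element loop
    }
    return dest + count;
}
```

Because a u8 element has the same representation in memory and on the wire, no per-element conversion is needed, which is what makes the bulk copy valid.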

Serialization Format

SimpleBuffers is designed to encode simple, stable data schemas as efficiently as possible. Fixed-size data is packed optimally with no padding, labels, or any other metadata. Variable-sized data structures such as lists and strings require a small amount of additional data (by default, two bytes). This is explained more below.

Note that data is serialized into little-endian format, as this is natively supported by practically all modern processors, allowing for efficient decoding in almost all scenarios.
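As a rough illustration of what little-endian encoding means in practice, the helpers below write and read a 32-bit value byte by byte. They are illustrative only; write_u32_le and read_u32_le are hypothetical names and not part of the SimpleBuffers library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative helpers, not the library's API. They produce little-endian
// bytes on any host, whatever its native byte order.
inline void write_u32_le(std::uint8_t* dest, std::uint32_t value) {
    assert(dest != nullptr);
    for (std::size_t i = 0; i < 4; ++i) {
        dest[i] = static_cast<std::uint8_t>(value >> (8 * i));  // low byte first
    }
}

inline std::uint32_t read_u32_le(const std::uint8_t* src) {
    assert(src != nullptr);
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        value |= static_cast<std::uint32_t>(src[i]) << (8 * i);
    }
    return value;
}
```

Writing 0x42340000 with write_u32_le produces the bytes 00 00 34 42. On a little-endian host, a plain memcpy of the value gives the same bytes, which is why decoding is cheap on most processors.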

Take the following example schema:

enum RobotJoint {
    j0 = 0;
    j1 = 1;
    j2 = 2;
    j3 = 3;
    j4 = 4;
    j5 = 5;
}

sequence Init {
    expected_firmware: u32;
}

sequence MoveToEntry {
    joint: RobotJoint;
    angle: f32;
    speed: f32;
}

sequence MoveTo {
    joints: [MoveToEntry];
    stop_smoothly: bool;
}

sequence Request {
    id: u32;
    payload: oneof {
        init: Init;
        moveTo: MoveTo;
    };
}

This schema represents a simple serial protocol that can be used to control a robot arm. Let's go through it step-by-step.

Enums

Every element of an enum must be explicitly assigned to a number. When enums are serialized, the appropriate number is written into the buffer, which can be decoded back into an enum later. Enums will always use the smallest possible data type that can fully represent them. Most enums, including RobotJoint, are encoded to a single octet.

enum BigEnum {
    element_a = 0;
    element_b = 1;
    element_c = 1000;
}

The above BigEnum will be serialized as a 16-bit value because element_c cannot fit within an octet. Note that this is true even if the value being serialized is element_a or element_b; the size of an enum is fixed.

Fixed-Sized Sequences

Next, let's take a look at our Init sequence:

sequence Init {
    expected_firmware: u32;
}

It only has a single value: expected_firmware, which is a 32-bit unsigned integer. Sequences add zero overhead. This means that the size of Init is exactly equal to the sum of the sizes of its elements. Init, therefore, will always use 32 bits.

MoveToEntry also only includes fixed-size elements:

sequence MoveToEntry {
    joint: RobotJoint;
    angle: f32;
    speed: f32;
}

angle and speed are both 32-bit floats, and joint is an enum. In this case, RobotJoint fits into a single octet, so MoveToEntry uses \(32 + 32 + 8 = 72\) bits. The actual serialization of a MoveToEntry would look like this:

block-beta
    columns 3

    block:raw:3
        rawjoint["joint = j1"]
        rawangle["angle = 45"]
        rawspeed["speed = 100"]
    end
    space:3
    block:ser:3
        serjoint["0x01"]
        serangle["0x42340000"]
        serspeed["0x42c80000"]
    end
    space:1 down<[" "]>(down):1 space:1
    block:final:3
        final["01 00 00 34 42 00 00 c8 42"]:3
    end

    rawjoint-->serjoint
    rawangle-->serangle
    rawspeed-->serspeed

Fixed-size sequences are great. They are not only 100% data-efficient, but they also provide constant-time access to any element, no matter how deeply nested. This is true because the positions of all elements are known at compile-time and can be baked into the generated code. However, some types of data do not have a set size. This data must be encoded differently.
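A hand-written sketch of what "baked-in positions" could look like for MoveToEntry follows. Every name in it is hypothetical and it is not the generated code; it assumes IEEE-754 floats and applies the little-endian byte order that this document specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hand-written stand-ins for generated code; all names are hypothetical.
enum class RobotJoint : std::uint8_t { J0 = 0, J1 = 1, J2 = 2, J3 = 3, J4 = 4, J5 = 5 };

constexpr std::size_t kJointOffset = 0;      // 1 byte
constexpr std::size_t kAngleOffset = 1;      // 4 bytes
constexpr std::size_t kSpeedOffset = 5;      // 4 bytes
constexpr std::size_t kMoveToEntrySize = 9;  // 72 bits, no padding

inline void write_f32_le(std::uint8_t* dest, float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));  // reinterpret the float's bits
    for (std::size_t i = 0; i < 4; ++i) {
        dest[i] = static_cast<std::uint8_t>(bits >> (8 * i));
    }
}

inline float read_f32_le(const std::uint8_t* src) {
    std::uint32_t bits = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        bits |= static_cast<std::uint32_t>(src[i]) << (8 * i);
    }
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

inline void write_move_to_entry(std::uint8_t* dest, RobotJoint joint, float angle, float speed) {
    assert(dest != nullptr);
    dest[kJointOffset] = static_cast<std::uint8_t>(joint);
    write_f32_le(dest + kAngleOffset, angle);
    write_f32_le(dest + kSpeedOffset, speed);
}

// Constant-time access: each getter reads at a compile-time offset and never
// has to scan the fields that come before it.
inline float read_angle(const std::uint8_t* entry) { return read_f32_le(entry + kAngleOffset); }
inline float read_speed(const std::uint8_t* entry) { return read_f32_le(entry + kSpeedOffset); }
```

Encoding joint = j1, angle = 45, speed = 100 with this sketch yields the nine bytes 01 00 00 34 42 00 00 c8 42.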

Lists

Lists consist of a variable number of repeated data. Because we do not know the length of the list at compile-time, we cannot allocate a fixed-size field for it in a sequence. Take a look at MoveTo:

sequence MoveTo {
    joints: [MoveToEntry];
    stop_smoothly: bool;
}

We know the size of stop_smoothly, but joints could have any number of elements. This is a problem because now we cannot know the position of stop_smoothly at compile-time; it will change depending on the length of joints.

block-beta
    columns 2

    block:raw:2
        rawjoints["joints = [...]"]
        rawstop["stop_smoothly = true"]
    end
    space:2
    block:ser:2
        serjoints["???"]
        serstop["0x01"]
    end
    down<[" "]>(down):2
    block:final:2
        final["??? 01"]:2
    end

    rawjoints-->serjoints
    rawstop-->serstop

To solve this, we must find a way to force joints to be a fixed size. Fortunately, a solution already exists: pointers. Instead of storing the full list in joints, we can instead store a fixed-size pointer and place the list at the end of the buffer where it can no longer hurt us. In practice, we prefer to store a relative offset rather than an absolute pointer, as this allows complex sequences to be decomposed more effectively.

We must also encode the size of the list. This is done in the fixed-sized segment, which allows access without indirection. Both the offset and the list size are stored as unsigned 16-bit integers.

block-beta
    columns 4

    block:raw:3
        rawjointslen["joints size = 3"]
        rawjoints["joints = [...]"]
        rawstop["stop_smoothly = true"]
    end
    space
    space:4
    block:serstatic:3
        serjointslen["0x0003"]
        serjointsptr["0x0003 (offset)"]
        serstop["0x01"]
    end
    block:serdyn:1
        serjointsdata["..."]
    end
    space:1 down<[" "]>(down):2 space:1
    block:final:4
        final["03 00 03 00 01 ..."]:4
    end

    rawjoints-->serjointsdata
    rawjointslen-->serjointslen
    rawstop-->serstop

Strings

Unlike lists, strings must be null-terminated. This means that we do not have to store the size of the string; only the offset is kept in the fixed-size segment. Otherwise, strings are encoded identically to lists.
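To make the difference concrete, here is a sketch for a hypothetical sequence Greeting { text: str; urgent: bool; }. The schema, the function names, and the assumption that the offset is relative to the offset field (as with lists) are all illustrative rather than taken from the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical schema:  sequence Greeting { text: str; urgent: bool; }
constexpr std::size_t kTextOffsetPos = 0;  // u16 offset, relative to this field
constexpr std::size_t kUrgentPos = 2;      // bool (one byte)
constexpr std::size_t kStaticSize = 3;

std::vector<std::uint8_t> write_greeting(const char* text, bool urgent) {
    assert(text != nullptr);
    const std::size_t text_size = std::strlen(text) + 1;  // include the terminator
    std::vector<std::uint8_t> buf(kStaticSize + text_size);

    const std::size_t offset = kStaticSize - kTextOffsetPos;
    buf[kTextOffsetPos] = static_cast<std::uint8_t>(offset & 0xFF);
    buf[kTextOffsetPos + 1] = static_cast<std::uint8_t>(offset >> 8);
    buf[kUrgentPos] = urgent ? 1 : 0;
    std::memcpy(&buf[kStaticSize], text, text_size);  // no length field is written
    return buf;
}

// Returns a pointer into the buffer; the terminator marks the end of the
// string, so no copy and no stored length are needed.
const char* read_text(const std::uint8_t* greeting) {
    const std::size_t offset =
        greeting[kTextOffsetPos] |
        (static_cast<std::size_t>(greeting[kTextOffsetPos + 1]) << 8);
    return reinterpret_cast<const char*>(greeting + kTextOffsetPos + offset);
}
```

With text = "hi" and urgent = true, this sketch produces the six bytes 03 00 01 68 69 00.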

Oneofs

There are two possible ways to implement the oneof: as a fixed-sized union or a dynamically-sized pointer. The main drawback of a union is the space requirements. Like an enum, a union must always be as large as the largest possible member. Unlike an enum, however, union members are expected to sometimes be vastly different sizes. This leads to increased storage inefficiency in all but the best case. For this reason, SimpleBuffers instead implements the oneof as a dynamically-sized structure.

Like a list, the oneof stores two values in the fixed-size segment of the buffer: the type of data being stored, and an offset to the data. The data type is stored as a single octet. Oneofs with more than 255 members are unsupported.

Let's take a look at how a Request with an Init payload would be serialized:

sequence Request {
    id: u32;            // <-- 0
    payload: oneof {
        init: Init;     // <-- .expected_firmware = 3
        moveTo: MoveTo;
    };
}

block-beta
    columns 4

    block:braw:4
        rawid["id = 0"]
        rawpayloadtype["payload type = init"]
        rawfw["payload.expected_firmware = 3"]
    end
    space:4
    block:bser0:3
        serid["0x00000000"]
        serpayloadtype["0x00"]
        serpayloadoffset["0x0002 (offset)"]
    end
    block:serdyn:1
        serfw["0x00000003"]
    end
    space:1 down<[" "]>(down):2 space:1
    block:final:4
        final["00 00 00 00 00 02 00 03 00 00 00"]:4
    end

    rawid-->serid
    rawpayloadtype-->serpayloadtype
    rawfw-->serfw
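The Init case can be sketched by hand as follows. The code is illustrative, with hypothetical names throughout. It assumes the tag octet precedes the two-byte offset (the field order shown in the diagram), that the offset is relative to the offset field, and that expected_firmware occupies its full four bytes as a u32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// All names below are hypothetical stand-ins for generated code.
constexpr std::size_t kIdPos = 0;      // u32
constexpr std::size_t kTagPos = 4;     // u8: 0 = init, 1 = moveTo
constexpr std::size_t kOffsetPos = 5;  // u16, relative to this field
constexpr std::size_t kStaticSize = 7;

// Writes the low `width` bytes of value at buf[pos], little-endian.
inline void put_le(std::vector<std::uint8_t>& buf, std::size_t pos,
                   std::uint32_t value, std::size_t width) {
    assert(width <= 4 && pos + width <= buf.size());
    for (std::size_t i = 0; i < width; ++i) {
        buf[pos + i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
}

inline std::uint32_t get_le(const std::uint8_t* src, std::size_t width) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < width; ++i) {
        value |= static_cast<std::uint32_t>(src[i]) << (8 * i);
    }
    return value;
}

std::vector<std::uint8_t> write_init_request(std::uint32_t id, std::uint32_t expected_firmware) {
    std::vector<std::uint8_t> buf(kStaticSize + 4);
    const std::size_t payload_pos = kStaticSize;  // dynamic data starts here
    put_le(buf, kIdPos, id, 4);
    put_le(buf, kTagPos, 0, 1);  // tag: init
    put_le(buf, kOffsetPos, static_cast<std::uint32_t>(payload_pos - kOffsetPos), 2);
    put_le(buf, payload_pos, expected_firmware, 4);
    return buf;
}

// Lazy decode: reads the tag, follows the offset, and touches nothing else.
std::uint32_t read_expected_firmware(const std::uint8_t* request) {
    assert(request[kTagPos] == 0);  // payload must be an Init
    const std::size_t offset = get_le(request + kOffsetPos, 2);
    return get_le(request + kOffsetPos + offset, 4);
}
```

Under these assumptions, id = 0 with expected_firmware = 3 encodes to the eleven bytes 00 00 00 00 00 02 00 03 00 00 00.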

Now, we look at the other oneof case: a MoveTo payload. At first, this appears slightly more complicated, as MoveTo requires its own dynamic sizing. In practice, however, it is fairly simple.

When serializing data, a cursor is placed in the destination buffer at the end of the fixed-sized data. Every piece of dynamic data is placed at the cursor position, and the cursor is incremented to the end of the new data.

sequence Request {
    id: u32;            // <-- 1
    payload: oneof {
        init: Init;
        moveTo: MoveTo; // <-- .joints size = 3, .stop_smoothly = true, .joints = ...
    };
}

block-beta
    columns 5

    block:braw:5
        rawid["id = 1"]
        rawpayloadtype["payload type = moveto"]
        rawjointslen["payload.joints size = 3"]
        rawjoints["payload.joints = [...]"]
        rawstop["payload.stop_smoothly = true"]
    end
    space:5
    block:bser:3
        serid["0x00000001"]
        serpayloadtype["0x01"]
        serpayloadoffset["0x0002 (offset)"]
    end
    space:2
    block:bdyn0:3
        serjointslen["0x0003"]
        serjointsptr["0x0003 (offset)"]
        serstop["0x01"]
    end
    block:serdyn1:1
        serjoints["..."]
    end
    space:1
    space:2 down<[" "]>(down) space:2
    block:final:5
        final["01 00 00 00 01 02 00 03 00 03 00 01 ..."]:5
    end

    rawid-->serid
    rawpayloadtype-->serpayloadtype
    rawjointslen-->serjointslen
    rawjoints-->serjoints
    rawstop-->serstop
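The cursor procedure described above can be sketched as a hand-rolled encoder for a Request carrying a MoveTo. Names are hypothetical and this is not the generated writer. It assumes offsets are relative to their own field and that the list length is a 16-bit value, as stated in the Lists section.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Writes the low `width` bytes of value at buf[pos], little-endian.
inline void put_le(std::vector<std::uint8_t>& buf, std::size_t pos,
                   std::uint32_t value, std::size_t width) {
    assert(width <= 4 && pos + width <= buf.size());
    for (std::size_t i = 0; i < width; ++i) {
        buf[pos + i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
}

// Hypothetical encoder. entry_bytes holds the already-encoded MoveToEntry records.
std::vector<std::uint8_t> write_move_to_request(std::uint32_t id,
                                                const std::vector<std::uint8_t>& entry_bytes,
                                                std::uint16_t entry_count, bool stop_smoothly) {
    constexpr std::size_t kRequestStatic = 7;  // id(4) + tag(1) + offset(2)
    constexpr std::size_t kMoveToStatic = 5;   // len(2) + offset(2) + bool(1)
    std::vector<std::uint8_t> buf(kRequestStatic + kMoveToStatic + entry_bytes.size());

    std::size_t cursor = kRequestStatic;  // starts at the end of the fixed-size data

    // Request's fixed-size fields.
    put_le(buf, 0, id, 4);
    put_le(buf, 4, 1, 1);                                      // tag: moveTo
    put_le(buf, 5, static_cast<std::uint32_t>(cursor - 5), 2); // offset to MoveTo

    // MoveTo's fixed-size part is dynamic data from Request's point of view:
    // it lands at the cursor, and the cursor advances past it.
    const std::size_t move_to = cursor;
    cursor += kMoveToStatic;
    put_le(buf, move_to + 0, entry_count, 2);
    put_le(buf, move_to + 2, static_cast<std::uint32_t>(cursor - (move_to + 2)), 2);
    put_le(buf, move_to + 4, stop_smoothly ? 1u : 0u, 1);

    // The list data is placed at the cursor in turn.
    std::copy(entry_bytes.begin(), entry_bytes.end(),
              buf.begin() + static_cast<std::ptrdiff_t>(cursor));
    cursor += entry_bytes.size();
    assert(cursor == buf.size());
    return buf;
}
```

For id = 1 with three joints and stop_smoothly = true, the first twelve bytes come out as 01 00 00 00 01 02 00 03 00 03 00 01, followed by the encoded entries.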