Introduction
SimpleBuffers is a schema language and compiler for data serialization. Like protobuf, it generates code in various languages that encodes and decodes data structures in a common format. Unlike protobuf, SimpleBuffers is designed for consistent APIs in resource-constrained environments. It forgoes some backwards compatibility in order to increase efficiency in both storage density and encoding/decoding speed. In fact, SimpleBuffers data can be decoded lazily, often in constant time. For more information about how this is done, see Serialization Format.
SimpleBuffers has an extremely similar serialization scheme to Cap'n Proto. I made this project independently before investigating Cap'n Proto's inner workings, and while I prefer some aspects of my C++ API, I highly recommend using Cap'n Proto over SimpleBuffers for any serious project. It is more established, more complete, and will be far better supported.
Installation
The SimpleBuffers compiler is distributed as a single executable. Simply download and extract it from the repo's latest release.
Compiler Usage
The SimpleBuffers compiler is invoked from the command line to generate code from your schema files. The basic syntax is as follows:
simplebuffers [options] <generator> <schema_file> [generator-specific arguments]
<generator>
: Specifies the target language for code generation (e.g., cpp for C++).
<schema_file>
: Path to your SimpleBuffers schema file.
Options
-l, --lib <path>
: Specify a custom library to load for third-party generators.
-s, --srcdir <path>
: Set the directory where your SimpleBuffers schema lives.
-d, --dstdir <path>
: Set the directory where generated files will be written.
Generator-Specific Arguments
Different code generators may require or accept additional arguments. These are passed after the main options and are specific to the chosen generator. The compiler passes these arguments directly to the selected generator.
Example: Using the C++ Generator
For the C++ generator, you might use a command like this:
simplebuffers -d ./output cpp myschema.sb --header-dir include
In this example:
- -d ./output specifies the output directory for the generated files
- cpp is the generator name
- myschema.sb is the input schema file
- --header-dir include is a C++ specific argument that determines the destination for generated header files
Note that the exact arguments accepted by the C++ generator may vary. Always refer to the specific generator's documentation for the most up-to-date information on available options.
Output
The compiler will generate language-specific files based on your schema. For C++, this typically includes:
- A header file (.hpp) in the specified header directory
- A source file (.cpp) in the main output directory
- A core library header file (simplebuffers.hpp) in the header directory
These files will contain the necessary classes and functions to serialize and deserialize your data structures according to the SimpleBuffers schema.
Remember to include these generated files in your project and link against them as needed.
Help
For up-to-date information about CLI usage and options, run:
simplebuffers --help
Version Information
You can check the version of the SimpleBuffers compiler by running:
simplebuffers --version
This will display the current version of the compiler.
Schema File Format
The core of SimpleBuffers is the schema file, which defines every data structure that can be serialized. SimpleBuffers schemas are stored in files with the .sb extension and are passed to the compiler for code generation.
Let's look at a simple example:
enum RobotJoint {
j0 = 0;
j1 = 1;
j2 = 2;
j3 = 3;
j4 = 4;
j5 = 5;
}
sequence Init {
expected_firmware: u32;
}
sequence MoveToEntry {
joint: RobotJoint;
angle: f32;
speed: f32;
}
sequence MoveTo {
joints: [MoveToEntry];
stop_smoothly: bool;
}
sequence Request {
id: u32;
payload: oneof {
init: Init;
moveTo: MoveTo;
};
}
This schema describes some functionality for a robot arm. The main data structure is the Request sequence, which contains an ID and a payload. The payload takes the form of one of the other sequences.
Before we can fully understand what this means, we have to explain some terminology.
Enums
Enums, like in most programming languages, describe a finite set of values. In SimpleBuffers, enums are backed by unsigned integers. Each enumeration must be explicitly assigned a unique value. Values do not need to be contiguous, as can be seen in the following example:
enum RobotJoint {
j0 = 0;
j1 = 1;
j2 = 2;
j3 = 3;
j4 = 4;
j5 = 5;
unknown = 255;
}
The size of the backing integer is determined by the possible enumerations. In the above example, RobotJoint will be backed by an 8-bit integer, as all enumerations can fit in it. However, if unknown's value were changed to be 300 instead of 255, all RobotJoint instances would instead be backed by a 16-bit integer as they no longer fit in 8.
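The width rule above can be sketched as a small compile-time helper. This is only an illustration: backing_bits is a hypothetical name and not part of the SimpleBuffers API, and the 32-bit and 64-bit branches are an assumption, since the text only shows the 8-bit and 16-bit cases.

```cpp
#include <cstdint>

// Hypothetical helper (not part of SimpleBuffers): returns the width in bits
// of the smallest unsigned integer that can hold an enum's largest value.
// The 32- and 64-bit branches are an assumption beyond what the text states.
constexpr unsigned backing_bits(std::uint64_t max_value) {
    return max_value <= 0xFFu       ? 8u
         : max_value <= 0xFFFFu     ? 16u
         : max_value <= 0xFFFFFFFFu ? 32u
                                    : 64u;
}

static_assert(backing_bits(5) == 8, "j0..j5 fit in 8 bits");
static_assert(backing_bits(255) == 8, "unknown = 255 still fits in 8 bits");
static_assert(backing_bits(300) == 16, "unknown = 300 needs 16 bits");
```

Because the width depends only on the largest declared value, every instance of the enum has the same size regardless of which enumeration it holds.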
Sequences
Sequences are SimpleBuffers' equivalent to structs. Importantly, sequences are ordered; changing the order of a sequence's fields will cause the serialization format to change. Semicolons are required after every field.
sequence MoveToEntry {
joint: RobotJoint;
angle: f32;
speed: f32;
}
Every field of a sequence (or oneof) must be annotated with a type. A type can be one of the following:
- Primitive
- List
- Enum
- Sequence
- Oneof
Primitive Types
SimpleBuffers contains the following primitive types:
Type | Description |
---|---|
u8 | An unsigned, 8-bit integer |
u16 | An unsigned, 16-bit integer |
u32 | An unsigned, 32-bit integer |
u64 | An unsigned, 64-bit integer |
i8 | A signed, 8-bit integer |
i16 | A signed, 16-bit integer |
i32 | A signed, 32-bit integer |
i64 | A signed, 64-bit integer |
f32 | A 32-bit floating point |
f64 | A 64-bit floating point |
bool | A boolean value (8-bit) |
str | A string |
Note that, unlike the rest of the primitive types, strings are variable-sized fields. This entails a small amount of additional overhead which is explained further in Serialization Format.
Lists
Like strings, lists are variable-sized. See Serialization Format for more information about the implications of this.
Lists are denoted by surrounding a type in square brackets. For example:
sequence MoveTo {
joints: [MoveToEntry];
stop_smoothly: bool;
}
The joints field is an array of MoveToEntry sequences.
OneOf
Like a union in C, a oneof allows a single field to have multiple possible data types. In our example, Request uses a oneof for the payload field. While the syntax looks similar to a sequence, a oneof can only store a single value at a time.
sequence Request {
id: u32;
payload: oneof {
init: Init;
moveTo: MoveTo;
};
}
Multiple oneof fields may be of the same type. This can be useful for readability and clarity, e.g.:
sequence LoginInfo {
user: oneof {
email: str;
phone_num: str;
username: str;
};
}
Comments
SimpleBuffers uses C-style single-line comments denoted by //. Multiline comments are not supported.
// I am a comment
sequence MySequence {
my_field: u8; // This is my field whom I love very much
}
Generated C++ API
The SimpleBuffers compiler generates C++ code that provides a convenient API for serializing and deserializing data structures defined in the schema. This section describes the main components of the generated API and how to use them.
Writers
For each sequence defined in the schema, the compiler generates a corresponding Writer class. These classes are used to construct and serialize data.
Sequence Writers
For example, given the Request sequence from our schema:
sequence Request {
id: u32;
payload: oneof {
init: Init;
moveTo: MoveTo;
};
}
The compiler generates a RequestWriter class:
class RequestWriter : public simplebuffers::SimpleBufferWriter {
public:
RequestWriter(uint32_t id, PayloadWriter payload);
uint32_t id;
PayloadWriter payload;
uint16_t static_size() const override;
uint8_t* write_component(uint8_t* dest, const uint8_t* dest_end, uint8_t* dyn_cursor) const override;
};
To create and serialize a Request:
InitWriter init_payload(firmware_version);
RequestWriter::PayloadWriter payload = RequestWriter::PayloadWriter::init(&init_payload);
RequestWriter request(request_id, payload);
uint8_t buffer[1024];
int32_t bytes_written = request.write(buffer, sizeof(buffer));
OneOf Writers
For OneOf fields, the compiler generates nested classes. In the Request example, there's a PayloadWriter nested class:
class RequestWriter::PayloadWriter : public simplebuffers::OneOfWriter {
public:
enum class Tag : uint8_t {
INIT = 0,
MOVE_TO = 1
};
static PayloadWriter init(InitWriter* val);
static PayloadWriter move_to(MoveToWriter* val);
// ... other methods ...
};
List Writers
For list fields, the compiler generates a ListWriter specialization:
class MoveToWriter : public simplebuffers::SimpleBufferWriter {
public:
MoveToWriter(simplebuffers::ListWriter<MoveToEntryWriter> joints);
simplebuffers::ListWriter<MoveToEntryWriter> joints;
// ... other methods ...
};
To create a list:
std::vector<MoveToEntryWriter> entries = { /* ... */ };
simplebuffers::ListWriter<MoveToEntryWriter> joints_list(entries.data(), entries.size());
MoveToWriter move_to(joints_list);
Readers
For each sequence, the compiler also generates a corresponding Reader class for deserialization.
Sequence Readers
Continuing with the Request example:
class RequestReader : public simplebuffers::SimpleBufferReader {
public:
RequestReader(const uint8_t* data_ptr, size_t idx = 0);
uint32_t id() const;
PayloadReader payload() const;
// ... other methods ...
};
To read a serialized Request:
RequestReader reader(buffer);
uint32_t id = reader.id();
RequestReader::PayloadReader payload = reader.payload();
OneOf Readers
For OneOf fields, the compiler generates nested reader classes:
class RequestReader::PayloadReader : public simplebuffers::OneOfReader {
public:
enum class Tag : uint8_t {
INIT = 0,
MOVE_TO = 1
};
PayloadReader(const uint8_t* data_ptr, size_t idx = 0);
Tag tag() const;
InitReader init() const;
MoveToReader move_to() const;
// ... other methods ...
};
To read a OneOf field:
RequestReader::PayloadReader payload = reader.payload();
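switch (payload.tag()) {
    case RequestReader::PayloadReader::Tag::INIT: {
        // Braces give each case its own scope; without them, jumping to a
        // later case label past an initialized variable does not compile.
        InitReader init = payload.init();
        // Process init...
        break;
    }
    case RequestReader::PayloadReader::Tag::MOVE_TO: {
        MoveToReader move_to = payload.move_to();
        // Process move_to...
        break;
    }
}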
List Readers
For list fields, the compiler generates a ListReader specialization:
class MoveToReader : public simplebuffers::SimpleBufferReader {
public:
MoveToReader(const uint8_t* data_ptr, size_t idx = 0);
simplebuffers::ListReader<MoveToEntryReader> joints() const;
// ... other methods ...
};
To read a list:
MoveToReader move_to_reader = payload.move_to();
auto joints = move_to_reader.joints();
for (uint16_t i = 0; i < joints.len(); ++i) {
MoveToEntryReader entry = joints[i];
// Process entry...
}
Enums
For each enum defined in the schema, the compiler generates a corresponding C++ enum class:
enum class RobotJoint : uint_fast8_t {
J_0 = 0,
J_1 = 1,
J_2 = 2,
J_3 = 3,
J_4 = 4,
J_5 = 5
};
These enum classes can be used directly in your C++ code and are automatically handled by the generated Writer and Reader classes.
This API design allows for efficient serialization and deserialization of data structures defined in the SimpleBuffers schema, with a focus on performance and ease of use in C++ applications.
Optimized Binary Data Serialization
SimpleBuffers provides special optimizations for handling lists of uint8_t, which is particularly useful for sending raw binary data. This optimization uses memcpy to efficiently copy the entire list, resulting in improved performance for large binary payloads.
Writing Raw Binary Data
Let's extend our example schema to include a sequence for sending raw binary data:
sequence BinaryPayload {
data: [u8];
description: str;
}
The generated C++ code for this sequence would include:
class BinaryPayloadWriter : public simplebuffers::SimpleBufferWriter {
public:
BinaryPayloadWriter(simplebuffers::ListWriter<uint8_t> data, const char* description);
simplebuffers::ListWriter<uint8_t> data;
const char* description;
uint16_t static_size() const override;
uint8_t* write_component(uint8_t* dest, const uint8_t* dest_end, uint8_t* dyn_cursor) const override;
};
To write a BinaryPayload with raw binary data:
// Prepare your raw binary data
std::vector<uint8_t> raw_data = {0x01, 0x02, 0x03, 0x04, 0x05}; // Example data
// Create a ListWriter for the raw data
simplebuffers::ListWriter<uint8_t> data_list(raw_data.data(), raw_data.size());
// Create the BinaryPayloadWriter
const char* description = "Example binary payload";
BinaryPayloadWriter payload(data_list, description);
// Serialize the payload
uint8_t buffer[1024];
int32_t bytes_written = payload.write(buffer, sizeof(buffer));
if (bytes_written > 0) {
std::cout << "Binary payload serialized successfully. Bytes written: " << bytes_written << std::endl;
} else {
std::cerr << "Failed to serialize binary payload." << std::endl;
}
Reading Raw Binary Data
The corresponding reader for BinaryPayload would look like this:
class BinaryPayloadReader : public simplebuffers::SimpleBufferReader {
public:
BinaryPayloadReader(const uint8_t* data_ptr, size_t idx = 0);
simplebuffers::ListReader<uint8_t> data() const;
const char* description() const;
uint16_t static_size() const override;
};
To read the serialized BinaryPayload:
BinaryPayloadReader reader(buffer);
// Access the raw binary data
auto data = reader.data();
std::cout << "Raw data size: " << data.len() << " bytes" << std::endl;
// You can access individual bytes if needed
for (uint16_t i = 0; i < data.len(); ++i) {
std::cout << "Byte " << i << ": 0x" << std::hex << static_cast<int>(data[i]) << std::dec << std::endl;
}
// Or you can work with the entire data buffer directly
const uint8_t* raw_data_ptr = data.data();
size_t raw_data_size = data.len();
// Access the description
std::cout << "Description: " << reader.description() << std::endl;
The ListReader<uint8_t> provides a data() method that returns a pointer to the raw data buffer, allowing for efficient access to the entire binary payload without copying.
This optimized handling of uint8_t lists allows SimpleBuffers to efficiently serialize and deserialize raw binary data, making it suitable for applications that need to transmit or store binary blobs alongside structured data.
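To make the zero-copy idea concrete, here is a minimal sketch of how a byte-list view could be resolved straight from the serialized buffer. It is not the generated ListReader: ByteListView, read_u16_le, and view_byte_list are hypothetical names, and the layout (16-bit length, then a 16-bit offset measured from the offset field itself) is an assumption based on the Serialization Format section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical view type: a pointer into the original buffer plus a length.
struct ByteListView {
    const std::uint8_t* data;
    std::uint16_t len;
};

// Read a little-endian 16-bit value without relying on host byte order.
inline std::uint16_t read_u16_le(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// `field` points at the list's fixed-size slot: [len: u16][offset: u16].
// Assumption: the offset is relative to the position of the offset field.
inline ByteListView view_byte_list(const std::uint8_t* field) {
    assert(field != nullptr);
    const std::uint16_t len = read_u16_le(field);
    const std::uint8_t* offset_field = field + 2;
    // No bytes are copied; the view aliases the serialized buffer.
    return ByteListView{offset_field + read_u16_le(offset_field), len};
}
```

A view like this stays valid only as long as the underlying buffer does, which is the usual trade-off of reading without copying.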
Serialization Format
SimpleBuffers is designed to encode simple, stable data schemas as efficiently as possible. Fixed-size data is packed optimally with no padding, labels, or any other metadata. Variable-sized data structures such as lists and strings require a small amount of additional data (by default, two bytes). This is explained more below.
Note that data is serialized into little-endian format, as this is natively supported by practically all modern processors, allowing for efficient decoding in almost all scenarios.
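As a rough illustration of what little-endian encoding means for a u32 field, the following sketch writes the least significant byte first. The helper names write_u32_le and read_u32_le are hypothetical and are not taken from the SimpleBuffers core library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: store a 32-bit value least-significant byte first.
// Shifting (rather than casting pointers) keeps it correct on any host.
inline void write_u32_le(std::uint8_t* dest, std::uint32_t value) {
    assert(dest != nullptr);
    for (std::size_t i = 0; i < 4; ++i) {
        dest[i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
}

// Hypothetical helper: reassemble the value from its four bytes.
inline std::uint32_t read_u32_le(const std::uint8_t* src) {
    assert(src != nullptr);
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        value |= static_cast<std::uint32_t>(src[i]) << (8 * i);
    }
    return value;
}
```

On a little-endian host a compiler will typically reduce both loops to a single load or store, which is why this byte order is cheap to decode on most processors.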
Take the following example schema:
enum RobotJoint {
j0 = 0;
j1 = 1;
j2 = 2;
j3 = 3;
j4 = 4;
j5 = 5;
}
sequence Init {
expected_firmware: u32;
}
sequence MoveToEntry {
joint: RobotJoint;
angle: f32;
speed: f32;
}
sequence MoveTo {
joints: [MoveToEntry];
stop_smoothly: bool;
}
sequence Request {
id: u32;
payload: oneof {
init: Init;
moveTo: MoveTo;
};
}
This schema represents a simple serial protocol that can be used to control a robot arm. Let's go through it step-by-step.
Enums
Every element of an enum must be explicitly assigned to a number. When enums are serialized, the appropriate number is written into the buffer, which can be decoded back into an enum later. Enums will always use the smallest possible data type that can fully represent them. Most enums, including RobotJoint, are encoded to a single octet.
enum BigEnum {
element_a = 0;
element_b = 1;
element_c = 1000;
}
The above BigEnum will be serialized as a 16-bit value because element_c cannot fit within an octet. Note that this is true even if the value being serialized is element_a or element_b; the size of an enum is fixed.
Fixed-Sized Sequences
Next, let's take a look at our Init sequence:
sequence Init {
expected_firmware: u32;
}
It only has a single value: expected_firmware, which is a 32-bit unsigned integer. Sequences incur zero overhead. This means that the size of Init is exactly equal to the sum of the sizes of its elements. Init, therefore, will always use 32 bits.
MoveToEntry also only includes fixed-size elements:
sequence MoveToEntry {
joint: RobotJoint;
angle: f32;
speed: f32;
}
angle and speed are both 32-bit floats, and joint is an enum. In this case, RobotJoint fits into a single octet, so MoveToEntry uses \(32 + 32 + 8 = 72\) bits. The actual serialization of a MoveToEntry would look like this:
block-beta columns 3 block:raw:3 rawjoint["joint = j1"] rawangle["angle = 45"] rawspeed["speed = 100"] end space:3 block:ser:3 serjoint["0x01"] serangle["0x42340000"] serspeed["0x42c80000"] end space:1 down<[" "]>(down):1 space:1 block:final:3 final["01 00 00 34 42 00 00 c8 42"]:3 end rawjoint-->serjoint rawangle-->serangle rawspeed-->serspeed
Fixed-size sequences are great. They are not only 100% data-efficient, but they also provide constant-time access to any element, no matter how deeply nested. This is true because the positions of all elements are known at compile-time and can be baked into the generated code. However, some types of data do not have a set size. This data must be encoded differently.
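The fixed layout of MoveToEntry can be sketched by hand as follows. This is an illustration rather than the generated code: the constants and function names are hypothetical, the byte order follows the little-endian rule stated earlier, and the sketch assumes 32-bit IEEE 754 floats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical compile-time layout of MoveToEntry (72 bits in total).
constexpr std::size_t kJointOffset = 0;      // RobotJoint, 1 byte
constexpr std::size_t kAngleOffset = 1;      // f32, 4 bytes
constexpr std::size_t kSpeedOffset = 5;      // f32, 4 bytes
constexpr std::size_t kMoveToEntrySize = 9;

inline void write_f32_le(std::uint8_t* dest, float value) {
    static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));  // well-defined type punning
    for (std::size_t i = 0; i < 4; ++i) {
        dest[i] = static_cast<std::uint8_t>(bits >> (8 * i));
    }
}

inline float read_f32_le(const std::uint8_t* src) {
    std::uint32_t bits = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        bits |= static_cast<std::uint32_t>(src[i]) << (8 * i);
    }
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

inline void write_move_to_entry(std::uint8_t* dest, std::uint8_t joint,
                                float angle, float speed) {
    assert(dest != nullptr);
    dest[kJointOffset] = joint;
    write_f32_le(dest + kAngleOffset, angle);
    write_f32_le(dest + kSpeedOffset, speed);
}

// Lazy, constant-time access: only the four bytes of `speed` are read.
inline float read_speed(const std::uint8_t* entry) {
    return read_f32_le(entry + kSpeedOffset);
}
```

Since every offset is a constant, reading one field never requires scanning or decoding the fields before it.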
Lists
Lists consist of a variable number of repeated elements. Because we do not know the length of the list at compile-time, we cannot allocate a fixed-size field in a sequence. Take a look at MoveTo:
sequence MoveTo {
joints: [MoveToEntry];
stop_smoothly: bool;
}
We know the size of stop_smoothly, but joints could have any number of elements. This is a problem because now we cannot know the position of stop_smoothly at compile-time; it will change depending on the length of joints.
block-beta columns 2 block:raw:2 rawjoints["joints = [...]"] rawstop["stop_smoothly = true"] end space:2 block:ser:2 serjoints["???"] serstop["0x01"] end down<[" "]>(down):2 block:final:2 final["??? 01"]:2 end rawjoints-->serjoints rawstop-->serstop
To solve this, we must find a way to force joints to be a fixed size. Fortunately, a solution already exists: pointers. Instead of storing the full list in joints, we can instead store a fixed-size pointer and place the list at the end of the buffer where it can no longer hurt us. In practice, we prefer to store a relative offset rather than an absolute pointer, as this allows complex sequences to be decomposed more effectively.
We must also encode the size of the list. This is done in the fixed-sized segment, which allows access without indirection. Both the offset and the list size are stored as unsigned 16-bit integers.
block-beta columns 4 block:raw:3 rawjointslen["joints size = 3"] rawjoints["joints = [...]"] rawstop["stop_smoothly = true"] end space space:4 block:serstatic:3 serjointslen["0x0003"] serjointsptr["0x0003 (offset)"] serstop["0x01"] end block:serdyn:1 serjointsdata["..."] end space:1 down<[" "]>(down):2 space:1 block:final:4 final["03 00 03 00 01 ..."]:4 end rawjoints-->serjointsdata rawjointslen-->serjointslen rawstop-->serstop
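The layout in the diagram can be reproduced with a short hand-written encoder. This is a sketch under stated assumptions, not the generated writer: encode_move_to and the constants are hypothetical names, and the offset is assumed to be measured from the position of the offset field itself, which is what the value 3 in the diagram suggests (2 bytes of offset plus 1 byte of stop_smoothly).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kEntrySize = 9;         // one serialized MoveToEntry
constexpr std::size_t kMoveToStaticSize = 5;  // len (2) + offset (2) + bool (1)

inline void write_u16_le(std::uint8_t* dest, std::uint16_t value) {
    dest[0] = static_cast<std::uint8_t>(value & 0xFF);
    dest[1] = static_cast<std::uint8_t>(value >> 8);
}

// `entries` holds already-serialized MoveToEntry records, kEntrySize bytes each.
inline std::vector<std::uint8_t> encode_move_to(
        const std::vector<std::uint8_t>& entries, bool stop_smoothly) {
    assert(entries.size() % kEntrySize == 0);
    const std::size_t count = entries.size() / kEntrySize;
    assert(count <= 0xFFFF);  // the list size is a 16-bit field

    std::vector<std::uint8_t> out(kMoveToStaticSize + entries.size(), 0);
    const std::size_t offset_field_pos = 2;
    write_u16_le(&out[0], static_cast<std::uint16_t>(count));
    // Assumption: offset = distance from the offset field to the list data.
    write_u16_le(&out[offset_field_pos],
                 static_cast<std::uint16_t>(kMoveToStaticSize - offset_field_pos));
    out[4] = stop_smoothly ? 1 : 0;
    if (!entries.empty()) {
        // The list body goes after all fixed-size fields.
        std::memcpy(&out[kMoveToStaticSize], entries.data(), entries.size());
    }
    return out;
}
```

With three entries and stop_smoothly set, the first five bytes are 03 00 03 00 01, and stop_smoothly sits at a fixed position no matter how long the list is.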
Strings
Unlike lists, strings must be null-terminated. This means that we do not have to store the size of the string. Otherwise, they are identical.
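A string field might therefore be laid out as in the sketch below. The schema it encodes (a sequence with a label: str followed by a flag: bool) is hypothetical and chosen only for illustration, as are the function names. The sketch assumes the fixed-size segment holds just a 16-bit offset measured from the offset field, in line with the list layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical sequence { label: str; flag: bool; }.
// Fixed-size segment: offset (2 bytes) + flag (1 byte).
constexpr std::size_t kLabelledStaticSize = 3;

inline std::vector<std::uint8_t> encode_labelled(const std::string& label, bool flag) {
    // An embedded NUL would cut the string short when decoding.
    assert(label.find('\0') == std::string::npos);
    std::vector<std::uint8_t> out(kLabelledStaticSize + label.size() + 1, 0);
    // Offset from the offset field (position 0) to the string data.
    out[0] = static_cast<std::uint8_t>(kLabelledStaticSize);
    out[1] = 0;
    out[2] = flag ? 1 : 0;
    // The terminator comes from the zero-initialized final byte.
    std::memcpy(out.data() + kLabelledStaticSize, label.data(), label.size());
    return out;
}

// No length is stored, so the reader relies on the terminating NUL.
inline const char* read_label(const std::uint8_t* buf) {
    const std::size_t offset = buf[0] | (static_cast<std::size_t>(buf[1]) << 8);
    return reinterpret_cast<const char*>(buf + offset);
}
```

The trade-off is that finding a string's length takes a scan for the terminator, whereas a list's length is available directly from the fixed-size segment.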
Oneofs
There are two possible ways to implement the oneof: as a fixed-sized union or a dynamically-sized pointer. The main drawback of a union is the space requirements. Like an enum, a union must always be as large as the largest possible member. Unlike an enum, however, union members are expected to sometimes be vastly different sizes. This leads to increased storage inefficiency in all but the best case. For this reason, SimpleBuffers instead implements the oneof as a dynamically-sized structure.
Like a list, the oneof stores two values in the fixed-size segment of the buffer: the type of data being stored, and an offset to the data. The data type is stored as a single octet. Oneofs with more than 255 members are unsupported.
Let's take a look at how a Request with an Init payload would be serialized:
sequence Request {
id: u32; // <-- 0
payload: oneof {
init: Init; // <-- .expected_firmware = 3
moveTo: MoveTo;
};
}
block-beta columns 4 block:braw:4 rawid["id = 0"] rawpayloadtype["payload type = init"] rawfw["payload.expected_firmware = 3"] end space:4 block:bser0:3 serid["0x00000000"] serpayloadtype["0x00"] serpayloadoffset["0x0002 (offset)"] end block:serdyn:1 serfw["0x00000003"] end space:1 down<[" "]>(down):2 space:1 block:final:4 final["00 00 00 00 00 02 00 03 00 00 00"]:4 end rawid-->serid rawpayloadtype-->serpayloadtype rawfw-->serfw
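Following the field sizes given in the text (a 32-bit id, a one-octet type tag, a 16-bit offset, and a 32-bit Init), a Request carrying an Init payload could be encoded by hand as sketched here. The names are hypothetical, and the offset is again assumed to be relative to the offset field's own position.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint8_t kTagInit = 0;       // first oneof member in the example
constexpr std::size_t kRequestStatic = 7;  // id (4) + tag (1) + offset (2)

inline std::array<std::uint8_t, 11> encode_init_request(
        std::uint32_t id, std::uint32_t expected_firmware) {
    std::array<std::uint8_t, 11> out{};
    auto put_u32 = [&out](std::size_t pos, std::uint32_t v) {
        for (std::size_t i = 0; i < 4; ++i) {
            out[pos + i] = static_cast<std::uint8_t>(v >> (8 * i));
        }
    };
    put_u32(0, id);
    out[4] = kTagInit;
    // Offset field sits at position 5; the payload starts at position 7.
    out[5] = 0x02;
    out[6] = 0x00;
    put_u32(kRequestStatic, expected_firmware);
    return out;
}

// Decode side: check the tag, follow the offset, read the u32.
inline std::uint32_t read_init_firmware(const std::array<std::uint8_t, 11>& buf) {
    assert(buf[4] == kTagInit);
    const std::size_t payload = 5 + (buf[5] | (static_cast<std::size_t>(buf[6]) << 8));
    assert(payload + 4 <= buf.size());
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(buf[payload + i]) << (8 * i);
    }
    return v;
}
```

Only the selected member occupies space in the dynamic segment, which is the storage advantage over a fixed-size union.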
Now, we look at the other oneof case: a MoveTo payload. At first, this appears slightly more complicated, as MoveTo requires its own dynamic sizing. In practice, however, it is fairly simple.
When serializing data, a cursor is placed in the destination buffer at the end of the fixed-sized data. Every piece of dynamic data is placed at the cursor position, and the cursor is incremented to the end of the new data.
sequence Request {
id: u32; // <-- 1
payload: oneof {
init: Init;
moveTo: MoveTo; // <-- .joints size = 3, .stop_smoothly = true, .joints = ...
};
}
block-beta columns 5 block:braw:5 rawid["id = 1"] rawpayloadtype["payload type = moveto"] rawjointslen["payload.joints size = 3"] rawjoints["payload.joints = [...]"] rawstop["payload.stop_smoothly = true"] end space:5 block:bser:3 serid["0x00000001"] serpayloadtype["0x01"] serpayloadoffset["0x0002 (offset)"] end space:2 block:bdyn0:3 serjointslen["0x0003"] serjointsptr["0x0003 (offset)"] serstop["0x01"] end block:serdyn1:1 serjoints["..."] end space:1 space:2 down<[" "]>(down) space:2 block:final:5 final["01 00 00 00 01 02 00 03 00 03 00 01 ..."]:5 end rawid-->serid rawpayloadtype-->serpayloadtype rawjointslen-->serjointslen rawjoints-->serjoints rawstop-->serstop
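The cursor mechanism described above can be sketched with a small buffer class. This is not the generated write_component logic: BufferSketch, place_dynamic, and encode_move_to_request are hypothetical names, the cursor is modeled as the current end of a growing vector, and offsets are assumed to be relative to the offset field's position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class BufferSketch {
public:
    explicit BufferSketch(std::size_t static_size) : bytes_(static_size, 0) {}

    void put_u8(std::size_t pos, std::uint8_t v) { bytes_.at(pos) = v; }
    void put_u16(std::size_t pos, std::uint16_t v) {
        put_u8(pos, static_cast<std::uint8_t>(v & 0xFF));
        put_u8(pos + 1, static_cast<std::uint8_t>(v >> 8));
    }
    void put_u32(std::size_t pos, std::uint32_t v) {
        for (std::size_t i = 0; i < 4; ++i) {
            put_u8(pos + i, static_cast<std::uint8_t>(v >> (8 * i)));
        }
    }

    // Reserve `size` bytes at the cursor, record the relative offset in the
    // given offset field, and advance the cursor past the new data.
    std::size_t place_dynamic(std::size_t offset_field_pos, std::size_t size) {
        const std::size_t start = bytes_.size();  // current cursor position
        assert(offset_field_pos < start && start - offset_field_pos <= 0xFFFF);
        put_u16(offset_field_pos, static_cast<std::uint16_t>(start - offset_field_pos));
        bytes_.resize(start + size, 0);
        return start;
    }

    const std::vector<std::uint8_t>& bytes() const { return bytes_; }

private:
    std::vector<std::uint8_t> bytes_;
};

// `entries` holds serialized MoveToEntry records of 9 bytes each.
inline std::vector<std::uint8_t> encode_move_to_request(
        std::uint32_t id, const std::vector<std::uint8_t>& entries, bool stop_smoothly) {
    assert(entries.size() % 9 == 0 && entries.size() / 9 <= 0xFFFF);
    BufferSketch buf(7);                              // id (4) + tag (1) + offset (2)
    buf.put_u32(0, id);
    buf.put_u8(4, 1);                                 // tag 1 selects moveTo
    const std::size_t move_to = buf.place_dynamic(5, 5);  // MoveTo fixed part
    buf.put_u16(move_to, static_cast<std::uint16_t>(entries.size() / 9));
    buf.put_u8(move_to + 4, stop_smoothly ? 1 : 0);
    const std::size_t list = buf.place_dynamic(move_to + 2, entries.size());
    for (std::size_t i = 0; i < entries.size(); ++i) {
        buf.put_u8(list + i, entries[i]);
    }
    return buf.bytes();
}
```

Each nested structure first claims its own fixed-size region at the cursor and only then places its dynamic data, so nesting depth does not change how any single level is written.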