
Avro Schemas

Apache Avro is a compact, row-oriented serialization format designed for high-throughput data systems. MAPS treats Avro as a first-class schema type, with tight integration into the Typed Event pipeline.


1. Format Overview

Avro defines data using a JSON schema and encodes records in a compact binary format.

Key characteristics

  • Schema stored as JSON, data encoded as binary
  • Strong typing with support for:
    • records, arrays, maps
    • enums, unions, fixed, logical types
  • Well-suited for:
    • telemetry streams
    • log/event pipelines
    • long-lived topic-based data with evolution over time

Why use Avro in MAPS?

  • Efficient binary encoding
  • Built-in schema evolution features (defaults, aliases, unions)
  • Good fit for high-volume IoT and analytics streams
  • Plays well with downstream big-data / lake / warehouse tooling
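
To make the "compact binary encoding" point concrete: the Avro specification writes int and long values as variable-length zig-zag integers, so small magnitudes take fewer bytes than a fixed 8-byte long. The sketch below reimplements only that one encoding with the Java standard library. The class and method names are illustrative and are not part of MAPS or of the Avro library.

```java
import java.io.ByteArrayOutputStream;

final class AvroLongEncodingSketch {

    /** Encodes a long the way Avro's binary format does: zig-zag, then 7 bits per byte. */
    static byte[] encodeLong(long value) {
        // Zig-zag maps small negative and positive numbers to small unsigned numbers.
        long zigzag = (value << 1) ^ (value >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((zigzag & ~0x7FL) != 0) {
            // Emit the low 7 bits and set the continuation bit.
            out.write((int) ((zigzag & 0x7F) | 0x80));
            zigzag >>>= 7;
        }
        out.write((int) zigzag);
        return out.toByteArray();
    }

    /** Reverses encodeLong for a single encoded value. */
    static long decodeLong(byte[] bytes) {
        long zigzag = 0;
        int shift = 0;
        for (byte b : bytes) {
            zigzag |= (long) (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) {
                break; // last byte of this value
            }
        }
        return (zigzag >>> 1) ^ -(zigzag & 1);
    }

    public static void main(String[] args) {
        long timestamp = 1_700_000_000_000L; // an epoch-millis value
        System.out.println(encodeLong(timestamp).length + " bytes instead of a fixed 8");
    }
}
```

In practice the Avro library performs this encoding for you; the sketch only shows why integer-heavy telemetry tends to stay small on the wire.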

2. SchemaConfig for Avro

All Avro schemas in MAPS are stored as a SchemaConfig:

  • format must be "avro".
  • schema holds the Avro JSON schema.
  • schemaBase64 is typically null for Avro.
  • labels carry routing and discovery metadata (including CoAP interface/resource when exposed over CoAP).

2.1 Required fields for Avro

At the SchemaConfig level:

  • format → "avro"
  • name → logical schema name
  • versionId → logical schema version
  • schema → valid Avro JSON schema
  • labels.matchExpression → regex mapping topics to this schema
  • labels.uniqueId → stable schema identifier
  • labels.interface → optional: the CoAP "if" (interface description) value, when exposed via CoAP
  • labels.resource → optional: the CoAP "rt" (resource type) value, when exposed via CoAP
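
The field list above can be turned into a simple pre-flight check before a config is submitted. The following is a minimal sketch that models a SchemaConfig as a plain Map; the class and method names are hypothetical, and MAPS' own validation may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class AvroSchemaConfigCheck {

    /** Returns the required entries that are missing; an empty list means the config looks complete. */
    static List<String> missingFields(Map<String, Object> config) {
        List<String> missing = new ArrayList<>();
        if (!"avro".equals(config.get("format"))) {
            missing.add("format");
        }
        for (String key : List.of("name", "versionId", "schema")) {
            if (config.get(key) == null) {
                missing.add(key);
            }
        }
        // labels is a nested object; treat an absent or malformed one as empty.
        Map<?, ?> labels = Map.of();
        if (config.get("labels") instanceof Map) {
            labels = (Map<?, ?>) config.get("labels");
        }
        for (String key : List.of("matchExpression", "uniqueId")) {
            if (labels.get(key) == null) {
                missing.add("labels." + key);
            }
        }
        // labels.interface and labels.resource are optional, so they are not checked.
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Object> config = Map.of("format", "avro", "name", "Example");
        System.out.println("Missing: " + missingFields(config));
    }
}
```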

3. Example Avro SchemaConfig (BME688)

Below is an example Avro-based SchemaConfig for the BME688 sensor payload.

{
  "versionId": "1",
  "name": "BME688-Avro",
  "description": "BME688 VOC, pressure, temperature and humidity telemetry (Avro-encoded)",
  "labels": {
    "comments": "I2C device BME688 VOC, Pressure, Temperature and Humidity Sensor",
    "uniqueId": "b1dc43de-4c9b-5d86-9425-cf958eeb598d",
    "resource": "sensor",
    "interface": "sensor.bme688"
  },
  "format": "avro",
  "schema": {
    "type": "record",
    "name": "BME688Reading",
    "namespace": "io.mapsmessaging.sensors",
    "fields": [
      {
        "name": "temperature",
        "type": "double",
        "doc": "Unit: °C, range -40.0 to 85.0"
      },
      {
        "name": "humidity",
        "type": "double",
        "doc": "Unit: %RH, range 10.0 to 90.0"
      },
      {
        "name": "pressure",
        "type": "double",
        "doc": "Unit: hPa, range 300.0 to 1100.0"
      },
      {
        "name": "gas",
        "type": "double",
        "doc": "Unit: Ω, range 0.0 to 65535.0"
      },
      {
        "name": "heaterStatus",
        "type": "string"
      },
      {
        "name": "gasMode",
        "type": "string"
      },
      {
        "name": "dewPoint",
        "type": "double",
        "doc": "Unit: °C, range -50.0 to 100.0"
      },
      {
        "name": "condensationRisk",
        "type": "double",
        "doc": "Risk score in [0.0, 1.0]"
      },
      {
        "name": "timestamp",
        "type": {
          "type": "long",
          "logicalType": "timestamp-millis"
        },
        "doc": "Event time, epoch millis"
      }
    ]
  }
}

Notes:

  • The Avro schema sits directly in schema as standard Avro JSON.
  • timestamp uses Avro's logicalType: "timestamp-millis" to align with MAPS' normalised time handling.
  • Ranges and units are carried in the Avro doc field.
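
As a reminder of what the timestamp logical types mean on the wire: timestamp-millis is a long counting milliseconds since the Unix epoch, and timestamp-micros counts microseconds. The sketch below converts both to java.time.Instant. How MAPS normalises time internally is not described here, so treat the class and method names as illustrative only.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

final class AvroTimestampSketch {

    /** Interprets a long carrying Avro's timestamp-millis logical type. */
    static Instant fromTimestampMillis(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    /** Interprets a long carrying Avro's timestamp-micros logical type. */
    static Instant fromTimestampMicros(long epochMicros) {
        return Instant.EPOCH.plus(epochMicros, ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        // Both calls describe the same instant at different resolutions.
        System.out.println(fromTimestampMillis(1_700_000_000_000L));
        System.out.println(fromTimestampMicros(1_700_000_000_000_000L));
    }
}
```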

4. How MAPS Uses Avro Schemas

At runtime, MAPS:

  1. Resolves the SchemaConfig by topic via matchExpression / bindings.
  2. Loads the Avro JSON schema from schema.
  3. Uses the Avro schema to decode binary Avro payloads into a Typed Event:
    • field names and types come from the Avro schema
    • logical types (like timestamps) are normalised internally
  4. The Typed Event flows through:
    • filtering
    • transformations
    • statistics
    • format conversion (e.g. Avro → JSON / Protobuf / CBOR)

Schema evolution rules defined at the Avro level (e.g. added fields with defaults) are respected when decoding.
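
Step 1 of the flow above, resolving a schema from a topic, can be pictured as a regex lookup. This is a minimal sketch under stated assumptions: the registry class, the topic names, and the choice of a full-string match are all hypothetical, and MAPS' actual resolution (including bindings) may work differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

final class SchemaResolverSketch {

    // Schema name -> compiled matchExpression, kept in registration order.
    private final Map<String, Pattern> patternsBySchema = new LinkedHashMap<>();

    void register(String schemaName, String matchExpression) {
        patternsBySchema.put(schemaName, Pattern.compile(matchExpression));
    }

    /** Returns the first registered schema whose expression matches the whole topic. */
    Optional<String> resolve(String topic) {
        for (Map.Entry<String, Pattern> entry : patternsBySchema.entrySet()) {
            if (entry.getValue().matcher(topic).matches()) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SchemaResolverSketch resolver = new SchemaResolverSketch();
        resolver.register("BME688-Avro", "^/sensors/bme688/.*$"); // hypothetical topic layout
        System.out.println(resolver.resolve("/sensors/bme688/kitchen"));
    }
}
```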


5. Warnings & Best Practices

  • Keep namespace stable; it forms part of the Avro type identity.
  • Prefer double for sensor telemetry to avoid unnecessary rounding artefacts.
  • Use Avro logical types where appropriate:
    • timestamp-millis / timestamp-micros for event time
    • date for date-only values
  • When changing schemas:
    • add fields with sensible defaults
    • avoid incompatible type changes
    • use aliases when renaming fields
  • Only use schemaBase64 for Avro if you truly need to store a compiled/binary representation; otherwise keep the canonical form as Avro JSON in schema.
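
The evolution advice above (defaults for added fields, aliases for renames) follows from how Avro resolves a writer's record against a reader's schema. The sketch below imitates that rule on plain Maps so the behaviour is easy to see. It is not the Avro library's implementation, the types are hypothetical, and it simplifies by treating a null default as "no default".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AvroEvolutionSketch {

    /** A reader-side field: its current name, any previous names, and an optional default. */
    static final class ReaderField {
        final String name;
        final List<String> aliases;
        final Object defaultValue; // null means "no default" in this sketch

        ReaderField(String name, List<String> aliases, Object defaultValue) {
            this.name = name;
            this.aliases = aliases;
            this.defaultValue = defaultValue;
        }
    }

    /** Maps a record written with an older schema onto the reader's fields. */
    static Map<String, Object> resolve(Map<String, Object> written, List<ReaderField> readerFields) {
        Map<String, Object> result = new HashMap<>();
        for (ReaderField field : readerFields) {
            if (written.containsKey(field.name)) {
                result.put(field.name, written.get(field.name));
                continue;
            }
            // A renamed field is found through one of the reader's aliases.
            String alias = field.aliases.stream().filter(written::containsKey).findFirst().orElse(null);
            if (alias != null) {
                result.put(field.name, written.get(alias));
            } else if (field.defaultValue != null) {
                result.put(field.name, field.defaultValue); // field added after the data was written
            } else {
                throw new IllegalStateException("No value or default for field " + field.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> oldRecord = Map.of("temp", 21.5);
        List<ReaderField> reader = List.of(
            new ReaderField("temperature", List.of("temp"), null),
            new ReaderField("humidity", List.of(), 0.0));
        System.out.println(resolve(oldRecord, reader));
    }
}
```

A field added without a default fails in this sketch for the same reason it is an incompatible change in Avro: data written before the change carries no value for it.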

6. Example

This Java example loads an Avro schema from a file and builds an AvroSchemaConfig from it:

// Note: `uuid` below is not defined in this snippet; it is assumed to be a namespace UUID
// declared elsewhere in the enclosing class.
public static AvroSchemaConfig getAvroSchema(String name, String title, String description, String matcher, String type) throws IOException {
  // Read the .avsc file as UTF-8 text.
  File file = new File("./src/main/avro/" + name + ".avsc");
  String schemaFile;
  try (InputStream is = new FileInputStream(file)) {
    schemaFile = new String(is.readAllBytes(), StandardCharsets.UTF_8);
  }

  // Derive a name-based (SHA-1) UUID from the file's absolute path, so the ID stays the same
  // across runs as long as the path does. Fall back to a generated UUID if SHA-1 is unavailable.
  UUID schemaId;
  try {
    schemaId = UuidGenerator.getInstance().generate(NamedVersions.SHA1, uuid, file.getAbsolutePath());
  } catch (NoSuchAlgorithmException e) {
    e.printStackTrace();
    schemaId = UuidGenerator.getInstance().generate();
  }

  // Parse the Avro JSON and populate the config.
  JsonElement element = JsonParser.parseString(schemaFile);
  AvroSchemaConfig config = new AvroSchemaConfig();
  config.setSchema(element.getAsJsonObject());
  config.setComments(description);
  config.setTitle(title);
  config.setVersion(1);
  config.setMatchExpression(matcher);
  config.setUniqueId(schemaId);
  config.setResourceType(type);
  return config;
}
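
The method above derives uniqueId from a name so that the same schema gets the same ID on every run. The same idea can be shown with only the Java standard library, with one important difference: java.util.UUID.nameUUIDFromBytes produces version 3 (MD5) UUIDs, not the SHA-1 based ones generated above, so the resulting IDs will not match. The sketch also keys on the schema name rather than a file path, which is an assumption made here for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class StableSchemaIdSketch {

    /** Derives a deterministic, name-based (version 3, MD5) UUID from a schema name. */
    static UUID idFor(String schemaName) {
        return UUID.nameUUIDFromBytes(schemaName.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // The same name always yields the same ID; different names yield different IDs.
        System.out.println(idFor("ballast"));
        System.out.println(idFor("cargo"));
    }
}
```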

Example of a file called ballast.avsc

{
  "type": "record",
  "name": "BallastTelemetry",
  "namespace": "io.mapsmessaging.ship",
  "fields": [
    { "name": "fore_tank_level", "type": "float" },
    { "name": "aft_tank_level", "type": "float" },
    { "name": "stbd_tank_level", "type": "float" },
    { "name": "port_tank_level", "type": "float" }
  ]
}

Example of a file called cargo.avsc

{
  "type": "record",
  "name": "CargoMonitorTelemetry",
  "namespace": "io.mapsmessaging.ship",
  "fields": [
    { "name": "container_temp", "type": "float" },
    { "name": "humidity", "type": "float" },
    { "name": "shock_detected", "type": "boolean" }
  ]
}

Example of a file called engine-room.avsc

{
  "type": "record",
  "name": "EngineRoomTelemetry",
  "namespace": "io.mapsmessaging.ship",
  "fields": [
    { "name": "rpm", "type": "int" },
    { "name": "oil_pressure", "type": "float" },
    { "name": "temperature", "type": "float" }
  ]
}