Synapse Ingestion Engine
Synapse is a brokerless, edge-first IoT ingestion engine utilizing peer-to-peer validation via Uber's
H3 spatial index and local statistical corroboration.
By replacing centralized message brokers like MQTT or Kafka with direct
ZeroMQ PUB/SUB topologies, it removes central single points of failure,
localizes anomaly detection, and ensures extreme performance under constrained edge computational footprints.
Quickstart
1. Local Deployment (Docker Compose)
The easiest way to execute a full multi-sensor simulation is to spin up the containerized network:
2. Manual Development Setup
If running directly in Python, set up the virtual environment and install the development dependencies:
3. Navigating the Dashboard
Open your browser and navigate to http://localhost:8080. You will be greeted with an interactive geographic visualization rendering hexagonal H3 cells.
- Green Hexagons: All active sensor nodes inside this spatial cell are healthy and report congruent values.
- Yellow/Orange Hexagons: The cell contains anomalous (
FAULTY) or silent (DEAD) sensors. - Red Hexagons: The majority of the sensors in this coordinate cell have failed local verification.
System Architecture
Unlike traditional centralized IoT systems that rely on a large queueing middleware cluster, Synapse decouples ingestion completely into point-to-point zero-overhead networks.
| Feature | Traditional Architectures | Synapse Architecture |
|---|---|---|
| Ingestion Topology | Hub-and-Spoke (Centralized Cloud Broker) | Direct ZeroMQ PUB/SUB links |
| Anomaly Detection | Cloud Data lake / Offline stream analytics | Local / Edge spatial corroboration |
| Resilience & SPOF | Single point of failure at the broker clustering layer | Dynamic autonomous sockets with multi-path recovery |
| Computing Footprint | High (JVM heavy brokers, databases, microservices) | Ultra-lightweight Python/ZeroMQ runtime |
Core Architectural Components
A daemon executing directly on the physical IoT hardware or microcontrollers. It generates unique identifiers, extracts local data readings, maps coordinates to H3 spatial indices (typically resolution 7), and publishes heartbeats over its ZeroMQ PUB socket at periodic intervals.
An async central daemon acting as the spatial ingestion target. It houses the following:
- ZmqListener: A high-throughput async SUB socket that acts as the ingestion boundary.
- NodeRegistry: An in-memory concurrent state tracker that groups nodes by H3 cell and holds the authoritative state.
- GarbageCollector: A background loop cleaning up silent (
DEAD) sensors after timeout limits are crossed. - DashboardServer: A Flask REST API and web application displaying the geographical state map.
Mathematical Formalism
1. Geospatial Mesh via H3 Grid
Geographic coordinates are transformed into a single 64-bit integer H3 cell representing a hexagonal region. The default resolution of 7 covers an average area of 4.3 square kilometers with an edge length of approximately 1.2 kilometers. Peers are defined strictly as nodes occupying the same cell.
2. Pluggable Outlier Detection (Spatial Corroboration)
When a sensor publishes a telemetry ping, the Monitor triggers a Leave-One-Out (LOO) test. For node n reporting value x_n inside cell C, the peer comparison set is defined as:
Classic Z-Score (Method: zscore)
Uses sample mean (μ) and sample standard deviation (σ) computed over the peer set P_n:
Limitation: Non-robust estimators. A single high-variance faulty peer in P_n heavily distorts μ and σ, triggering false negatives or masking other outliers.
Modified Z-Score via Median Absolute Deviation (Method: mad - Default)
Uses the median (m) and the Median Absolute Deviation (MAD) of the peer set, providing a robust estimator unaffected by up to 50% compromised sensors:
Modified Z = 0.6745 × |x_n - m| / MAD
The constant 0.6745 scales the denominator such that the modified Z-score is mathematically equivalent to standard Z-scores when data is normally distributed. If the modified Z exceeds the configured threshold (default 3.5), the node status changes to FAULTY.
API Reference
The monitor exposes a versioned JSON REST API at http://localhost:8080/api/v1/ to query network topology and metrics.
| Endpoint | Method | Response | Description |
|---|---|---|---|
| /api/v1/nodes | GET | Array of Node Objects | List all registered nodes, their coordinates, values, and status. |
| /api/v1/cells | GET | Dictionary of H3 Cells | Cell summary mapping with alive/faulty/dead counts and centroids. |
| /metrics | GET | Prometheus Text Data | Scrapable metrics on rate limits, invalid payloads, and node counts. |
| /live | GET | {"status":"ok"} | Liveness probe. |
| /ready | GET | JSON Status Object | Readiness probe indicating connection counts. |
Example API Node Response
{
"node_id": "sensor_42",
"type": "mock",
"status": "ALIVE",
"last_seen": 1709894982.5,
"h3_cell": "871fb4670ffffff",
"lat": 45.4642,
"lon": 9.1895,
"last_value": 24.5
}
Messaging Protocol
ZeroMQ transmits structured non-blocking telemetry messages using lightweight framing. Sockets connect via the TCP boundary, delivering JSON telemetry payloads.
Payload Structure Schema (PING)
{
"schema_version": 1,
"node_id": "sensor_0001",
"type": "mock",
"timestamp": 1709894981.12,
"status": "PING",
"h3_cell": "871fb4670ffffff",
"payload": {
"value": 22.45,
"lat": 45.4642,
"lon": 9.1895
}
}
node_id, h3_cell, payload.value) are silently dropped at the ZmqListener loop, triggering the Synapse_invalid_payload_total Prometheus metric to alert operations.
Security & Encryption
In adversarial environments, edge telemetry must be protected against eavesdropping and injection attacks. Synapse utilizes CurveZMQ (Elliptic Curve Cryptography based on curve25519) to encrypt the transport layer.
Key generation & Environment Setup
Generate Curve keys and container-specific environment configurations automatically:
allowed_client_publickeys.txt file are rejected at the TCP handshake phase, protecting the ingestion pool.
Rate Limiting via Token-Bucket
To mitigate denial-of-service attempts by faulty or flooded nodes, the registry enforces an optional token-bucket rate limiter on a per-node basis:
- Bucket Capacity: Maximum burst allowance (default 10 tokens).
- Replenish Rate: Rate of token restoration per second (default 1.0).
- Action: Telemetry packets that deplete the node's bucket are dropped.
Development & Testing
Synapse was built following strict test-driven development (TDD) protocols. The entire codebase is heavily verified with deterministic, offline unit tests.
Running the Test Suite
To verify the registry state engine, rate limiters, and corroboration statistics:
tests/test_corroboration.py PASSED [ 52%]
tests/test_rate_limiter.py PASSED [ 78%]
tests/test_http_api.py PASSED [100%]
Linter & Quality Standards
We leverage ruff and mypy for absolute code quality and strict typing checks:
Chaos & Resilience
To validate system behavior under adversarial or chaotic network conditions, Synapse ships with an automated Chaos Monkey agent (tools/chaos_monkey.py).
Failure Simulation Modes
KILL MODE
Randomly terminates running Docker sensor containers (SIGKILL). Validates that the Monitor shifts their status to DEAD upon crossing the timeout threshold.
ANOMALY MODE
Overrides sensor data paths to broadcast high-amplitude outlier telemetry. Tests whether the H3 spatial corroboration moves them to FAULTY.
FLOOD MODE
Floods the ingestion port with malformed payloads or high-velocity packet bursts, testing parser isolation and rate limiters.
Executing Chaos Simulations
Roadmap & Future
Synapse is an evolving research project exploring edge ingestion paradigms.
V1 - PROOF OF CONCEPT
Completed basic Python ingestion, ZeroMQ binding infrastructure, in-memory NodeRegistry, and primitive status updates.
V2 - SPATIAL CORROBORATION & SECURE RUNTIME
Completed H3 geometric indexing support, Median Absolute Deviation (MAD) robust outlier metrics, CurveZMQ encryption, token-bucket rates, and web dashboards.
V3 - WASM PREDICTIVE RUNTIMES (IN DEVELOPMENT)
Migrating outlier and anomaly models directly into WebAssembly (WASM) modules to enable sandboxed cryptographic computation directly on microcontroller layers, signing payloads at the sensor core.