Tafy Studio Architecture

Overview

Tafy Studio is a Robot Distributed Operation System (RDOS) built on modern cloud-native technologies adapted for edge robotics. It provides a complete stack from hardware abstraction to visual programming, enabling rapid robot development and deployment.

Core Architecture Principles

Distributed First: Robots are multi-node systems, not monoliths
Message-Driven: All communication via pub/sub and request/reply patterns
Container-Native: Every component runs in containers orchestrated by k3s
Schema-Driven: Strongly typed messages with evolution support
Local-First: Full functionality without internet connectivity

System Layers

1. Infrastructure Layer

Orchestration

k3s: Lightweight Kubernetes for edge devices
Helm: Package management for deploying components
Container Runtime: Multi-architecture support (arm64/amd64)

Networking

mDNS/Avahi: Local device discovery
Traefik: Ingress controller (included with k3s)
CoreDNS: Service discovery within cluster

2. Messaging Layer

NATS Core

Central nervous system for all robot communication
Subjects (NATS's term for topics) organized by capability: hal.v1.motor.cmd, hal.v1.sensor.range
Request/reply for command/response patterns
Pub/sub for streaming telemetry
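
Because subjects are dot-separated hierarchies, a subscriber can use NATS wildcards to follow a whole capability family: `*` matches exactly one token and `>` matches one or more trailing tokens. The following is a minimal pure-Python sketch of those matching rules, for illustration only; the NATS server performs this matching itself, and the `hal.v1.sensor.*` pattern is an assumed example rather than a documented Tafy subscription.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style subscription pattern matches a subject."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the last token and needs at least
            # one remaining subject token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # pattern is longer than the subject
        if p != "*" and p != s_tokens[i]:
            return False  # literal token mismatch
    # Without '>', pattern and subject must have the same number of tokens.
    return len(p_tokens) == len(s_tokens)
```

With these rules, a logger subscribed to `hal.v1.>` would see both motor commands and sensor readings, while a range-sensor dashboard could subscribe to `hal.v1.sensor.*` only.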

NATS JetStream (Phase 2+)

Persistent message streams
KV store for configuration and device registry
Object store for firmware, models, logs

Peer-to-Peer (Future)

libp2p for robot-to-robot communication
NAT traversal for cloud connectivity
Circuit relay for difficult network environments

3. Hardware Abstraction Layer (HAL)

Message Schema

JSON Schema (initially), Protobuf (future)
Versioned schemas with migration support
Self-describing messages with capability declaration
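
One way to picture a self-describing, versioned message is an envelope that carries its capability and schema version next to the payload, plus a chain of migration steps that upgrade old payloads. This is a sketch under stated assumptions: the envelope fields (`capability`, `version`, `payload`) and the v1 to v2 rename are hypothetical and are not taken from the actual `hal-schemas` package.

```python
import json
from typing import Callable, Dict


def _v1_to_v2(payload: dict) -> dict:
    # Hypothetical schema change: v2 renames "speed" to "duty_cycle".
    upgraded = dict(payload)
    upgraded["duty_cycle"] = upgraded.pop("speed")
    return upgraded


# Migration steps keyed by the version they upgrade FROM.
MIGRATIONS: Dict[int, Callable[[dict], dict]] = {1: _v1_to_v2}


def make_envelope(capability: str, version: int, payload: dict) -> str:
    """Serialize a message together with its capability and schema version."""
    return json.dumps(
        {"capability": capability, "version": version, "payload": payload}
    )


def upgrade(raw: str, target_version: int) -> dict:
    """Parse an envelope and migrate its payload up to target_version."""
    msg = json.loads(raw)
    for name in ("capability", "version", "payload"):
        if name not in msg:
            raise ValueError(f"missing field: {name}")
    while msg["version"] < target_version:
        step = MIGRATIONS.get(msg["version"])
        if step is None:
            raise ValueError(f"no migration from version {msg['version']}")
        msg["payload"] = step(msg["payload"])
        msg["version"] += 1
    return msg
```

A consumer written against version 2 could call `upgrade(raw, 2)` on every incoming message and keep working with version 1 producers.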

Driver Model

Containerized drivers for Linux devices
Firmware templates for microcontrollers
Automatic capability detection and binding

Standard Capabilities

Motor control (differential, servo, stepper)
Sensors (range, IMU, temperature, GPS)
Cameras (USB, CSI, IP)
Actuators (gripper, LED, relay)

4. Control Plane

Hub Services

Hub UI: Next.js web interface
Hub API: FastAPI backend (BFF pattern)
Device Registry: Tracks all discovered devices
Flow Engine: Node-RED runtime with custom nodes

Node Agent (`tafyd`)

Go service running on each compute node
Handles device discovery and heartbeat
Manages driver containers
Reports telemetry and health
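
Heartbeats are only useful if something notices when they stop. The sketch below shows one possible liveness check on the receiving side: record the last heartbeat per node and report nodes that have been silent too long. It is written in Python for brevity even though `tafyd` itself is a Go service, and the 15-second timeout and node names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatTracker:
    # Hypothetical timeout: a node is stale after 15 seconds of silence.
    timeout_s: float = 15.0
    last_seen: Dict[str, float] = field(default_factory=dict)

    def beat(self, node_id: str, now: float) -> None:
        """Record a heartbeat from node_id at time `now` (seconds)."""
        self.last_seen[node_id] = now

    def stale(self, now: float) -> List[str]:
        """Return node IDs whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

Passing `now` explicitly instead of reading the clock inside the class keeps the logic easy to test.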

5. User Experience Layer

Visual Programming

Node-RED for flow-based programming
Curated palette of robot-specific nodes
Pre-built flows for common behaviors

Web Technologies

WebSerial for flashing firmware
WebRTC for video streaming
WebSocket for real-time updates

SDKs

TypeScript SDK for web development
Python SDK for research/education
Go SDK for system components

Component Architecture

Monorepo Structure

tafystudio/
├── apps/
│   ├── hub-ui/          # Next.js frontend
│   ├── hub-api/         # FastAPI backend
│   ├── tafyd/           # Go node agent
│   └── bootstrapd/      # Pre-k3s bootstrap
├── packages/
│   ├── sdk-ts/          # TypeScript SDK
│   ├── sdk-python/      # Python SDK
│   ├── hal-schemas/     # Message schemas
│   └── node-red-contrib-tafy/  # Custom nodes
├── firmware/
│   ├── esp32/           # ESP32 templates
│   └── templates/       # Other MCU templates
├── drivers/
│   ├── motor-pwm/       # PWM motor driver
│   ├── camera-usb/      # USB camera driver
│   └── ...              # Additional drivers
└── charts/
    ├── hub/             # Hub Helm chart
    ├── nats/            # NATS configuration
    └── node-red/        # Node-RED deployment

Deployment Architecture

Single Node

All services on one device (e.g., a Raspberry Pi)
k3s in single-node mode
Suitable for simple robots

Multi-Node Cluster

Hub on primary node
Compute nodes join via k3s token
Automatic workload distribution
GPU nodes for vision/AI workloads

Hybrid Cloud (Future)

NATS leaf nodes for cloud bridge
Remote monitoring and control
Selective data synchronization

Key Design Decisions

Why k3s?

Lightweight (~50MB binary)
Built for edge/ARM devices
Batteries included (ingress, DNS, load balancer)
Standard Kubernetes API

Why NATS?

Minimal footprint (~15MB)
Built-in clustering
Multiple messaging patterns
Integrated persistence (JetStream)

Why Go?

Single binary deployment
Excellent cross-compilation
Low memory footprint
Strong standard library

Why Node-RED?

Proven visual programming
Extensive node ecosystem
Embeddable runtime
Soft real-time performance is sufficient for the intended workloads

Communication Patterns

Device Discovery

MCU powers on → mDNS announcement → Agent discovers →
Registers in NATS KV → Hub UI updates → Available in flows
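
The registration step of this flow can be sketched with an in-memory dictionary standing in for the NATS KV bucket. The key layout (`devices.<id>`), the record fields, and the watcher callback are assumptions made for illustration; a real deployment would use the KV API of a NATS client instead.

```python
import json
from typing import Callable, Dict, List


class DeviceRegistry:
    """In-memory stand-in for a KV bucket with change notifications."""

    def __init__(self) -> None:
        self._kv: Dict[str, str] = {}
        self._watchers: List[Callable[[str, dict], None]] = []

    def watch(self, callback: Callable[[str, dict], None]) -> None:
        """Register a callback invoked on every new or changed record."""
        self._watchers.append(callback)

    def register(self, device_id: str, capabilities: List[str]) -> bool:
        """Store the device record; return True if it was new or changed."""
        record = json.dumps(
            {"id": device_id, "capabilities": sorted(capabilities)}
        )
        key = f"devices.{device_id}"  # hypothetical key layout
        if self._kv.get(key) == record:
            return False  # repeated announcement, nothing to notify
        self._kv[key] = record
        for callback in self._watchers:
            callback(key, json.loads(record))
        return True
```

Ignoring unchanged records matters here because mDNS announcements repeat, and the UI should only update when something actually changed.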

Command/Response

Flow sends command → NATS request → Driver receives →
Executes → Replies with result → Flow continues
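
The request/reply round trip above can be sketched with `asyncio`, using an in-process bus in place of NATS. The `Bus` class, the `motor_driver` handler, and the payload fields are hypothetical; the 20 ms default timeout simply mirrors the command/response latency target stated later in this document.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Handler = Callable[[dict], Awaitable[dict]]


class Bus:
    """Tiny in-process stand-in for request/reply messaging."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def serve(self, subject: str, handler: Handler) -> None:
        """Register the single responder for a subject."""
        self._handlers[subject] = handler

    async def request(self, subject: str, payload: dict,
                      timeout: float = 0.02) -> dict:
        """Send a command and wait for the reply, or time out."""
        handler = self._handlers.get(subject)
        if handler is None:
            raise LookupError(f"no responder for {subject}")
        return await asyncio.wait_for(handler(payload), timeout)


async def motor_driver(cmd: dict) -> dict:
    # Hypothetical driver: acknowledge the requested speed.
    return {"ok": True, "speed": cmd["speed"]}
```

A flow would call `await bus.request("hal.v1.motor.cmd", {"speed": 0.5})` and continue only after the reply arrives; a missing or slow driver surfaces as an exception instead of a silent hang.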

Telemetry Streaming

Sensor reads → Driver publishes → NATS topic →
Multiple subscribers → Dashboard, logging, algorithms
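
Unlike request/reply, telemetry is fire-and-forget: the driver publishes once and every subscriber receives its own copy. A minimal in-process sketch of that fan-out, with a hypothetical `PubSub` class standing in for NATS and an assumed payload shape:

```python
from typing import Callable, Dict, List


class PubSub:
    """Tiny in-process stand-in for publish/subscribe fan-out."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callable[[dict], None]]] = {}

    def subscribe(self, subject: str, callback: Callable[[dict], None]) -> None:
        """Add a subscriber callback for an exact subject."""
        self._subs.setdefault(subject, []).append(callback)

    def publish(self, subject: str, message: dict) -> int:
        """Deliver message to every subscriber; return the delivery count."""
        callbacks = self._subs.get(subject, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)
```

The publisher does not know or care how many consumers exist, so a dashboard, a logger, and an algorithm can be attached or removed without touching the driver.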

Firmware Updates

New firmware available → Hub notifies → User approves →
Download to agent → Flash device → Verify → Report success
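
The update sequence reads naturally as a small state machine with one user gate and one verification gate. The sketch below models that ordering only; the step names follow the flow above, while the `FAILED` state and the `verify_ok` flag are assumptions, since the source does not describe failure handling.

```python
from enum import Enum
from typing import List


class Step(Enum):
    AVAILABLE = "available"
    NOTIFIED = "notified"
    APPROVED = "approved"
    DOWNLOADED = "downloaded"
    FLASHED = "flashed"
    VERIFIED = "verified"
    REPORTED = "reported"
    FAILED = "failed"  # hypothetical: not part of the documented flow


def run_update(user_approves: bool, verify_ok: bool) -> List[Step]:
    """Return the steps visited for one update attempt, in order."""
    visited = [Step.AVAILABLE, Step.NOTIFIED]
    if not user_approves:
        return visited  # nothing is downloaded or flashed without approval
    visited += [Step.APPROVED, Step.DOWNLOADED, Step.FLASHED]
    if not verify_ok:
        visited.append(Step.FAILED)  # success is never reported unverified
        return visited
    visited += [Step.VERIFIED, Step.REPORTED]
    return visited
```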

Security Model

Network Security

TLS everywhere (NATS, HTTPS)
mTLS for service-to-service
Network isolation via k3s

Access Control

OIDC integration for users
RBAC for permissions
Service accounts for automation

Supply Chain

Signed container images (cosign)
SBOM generation
Dependency scanning

Performance Targets

Latency

Local message passing: <5ms
Command/response: <20ms
Video streaming: <100ms
Suitable for soft real-time control

Scalability

Single node: 10+ drivers
Multi-node: 50+ devices
Message throughput: 10k msgs/sec

Resource Usage

Hub services: <500MB RAM
Node agent: <50MB RAM
NATS: <100MB RAM
Leaves headroom for user workloads
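
To make the headroom claim concrete, the three upper bounds above sum to 650MB. The sketch below subtracts that budget from a device's total RAM; the 2048MB figure in the usage note is an assumed example, and the result does not account for the operating system or k3s itself, which the targets above do not quantify.

```python
# Upper bounds from the resource targets above, in MB.
BUDGET_MB = {"hub_services": 500, "node_agent": 50, "nats": 100}


def headroom_mb(total_ram_mb: int) -> int:
    """RAM left after the budgeted services, ignoring OS and k3s overhead."""
    return total_ram_mb - sum(BUDGET_MB.values())
```

For a hypothetical device with 2048MB of RAM, `headroom_mb(2048)` gives 1398MB before OS and k3s overhead are subtracted.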

Future Architecture Evolution

Phase 1: MVP

Core messaging and discovery
Basic HAL implementation
Three working demos

Phase 2: Production

Persistent messaging (JetStream)
Multi-node clustering
GPU workload scheduling

Phase 3: Scale

P2P robot swarms
Cloud federation
Advanced AI/ML pipelines

Phase 4: Enterprise

Multi-tenancy
Audit logging
Compliance features
