Operational Modes & Deployment Patterns
This page is a technical reference for Amp’s operational modes, components, and deployment patterns. It’s intended for engineers designing, deploying, or operating Amp in development and production environments.
1. Overview
Amp provides several core commands that can be composed into different deployment strategies:
- server — Query server exposing Arrow Flight and JSON Lines interfaces
- worker — Executes scheduled extraction jobs
- controller — Hosts the Admin API for job and dataset management
- migrate — Applies metadata database migrations
Amp supports two primary operational modes:
- Single-Node Mode — All components run together (local dev/testing)
- Distributed Mode — Components deployed independently (production)
2. Operational Modes
2.1 Single-Node Mode
Single-node mode runs all components in one process.
- Server, controller, and worker run together
- Activated via ampd dev
- Optimized for:
- Local development
- CI pipelines
- Quick testing and prototyping
- Not designed for production reliability or fault isolation
2.2 Distributed Mode
Distributed mode separates components into independent processes.
- Server, controller, and workers run separately
- Suitable for production deployments
- Enables:
- Horizontal scaling
- Resource isolation (queries vs extraction)
- High availability and failover
3. Core Components
3.1 Server Component
Purpose
- Exposes query interfaces (Arrow Flight and JSON Lines)
- Serves data only
- Does not execute extraction jobs
- Does not provide management APIs
Query Interfaces
- Arrow Flight (port 1602)
  - High-performance binary interface
  - gRPC-based
  - Uses Apache Arrow / Flight SQL
- JSON Lines (port 1603)
  - HTTP POST interface
  - Returns newline-delimited JSON (NDJSON)
  - Supports streaming and compression (gzip, brotli, deflate); see the streaming sketch after this list
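For programmatic access, the JSON Lines interface can be consumed line by line. The sketch below is a minimal example that assumes the server honors Accept-Encoding: gzip and accepts the SQL text as the raw POST body, as the curl example under Basic Usage does; host, port, and table names follow the examples on this page.

# Minimal sketch: stream NDJSON rows from the JSON Lines endpoint with gzip
# compression. The SQL text is sent as the raw request body.
import gzip
import json
import urllib.request

sql = "SELECT * FROM 'ns/eth_mainnet'.blocks LIMIT 10"
req = urllib.request.Request(
    "http://localhost:1603",
    data=sql.encode("utf-8"),
    headers={"Accept-Encoding": "gzip"},
    method="POST",
)

with urllib.request.urlopen(req) as resp:
    # Only wrap in a gzip reader if the server actually compressed the body.
    compressed = resp.headers.get("Content-Encoding") == "gzip"
    stream = gzip.GzipFile(fileobj=resp) if compressed else resp
    for raw_line in stream:
        line = raw_line.strip()
        if line:
            print(json.loads(line))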
Basic Usage
# Start both query servers (default)
ampd server

# Start only the Arrow Flight server
ampd server --flight-server

# Start only the JSON Lines server
ampd server --jsonl-server

# Start both explicitly
ampd server --flight-server --jsonl-server

JSON Lines Query Example
curl -X POST http://localhost:1603 \
  --data "SELECT * FROM 'ns/eth_mainnet'.blocks LIMIT 10"

Arrow Flight Example (Python)
from pyarrow import flight

client = flight.connect("grpc://localhost:1602")
reader = client.do_get(
    flight.Ticket("SELECT * FROM 'ns/eth_mainnet'.blocks LIMIT 10")
)
table = reader.read_all()
print(table.to_pandas())

Without flags, both query servers are enabled by default. When you specify any server flags, only those interfaces are enabled.
3.2 Worker Component
Purpose
The worker executes scheduled extraction jobs in distributed deployments.
- Runs as a standalone process
- Coordinates with other components via the PostgreSQL metadata database
Responsibilities
Workers:
- Register with the metadata DB using a node ID
- Send heartbeat signals (health)
- Listen for job notifications (LISTEN/NOTIFY; see the sketch after this list)
- Execute dump jobs (pull data, write Parquet files)
- Update job status and file metadata
- Resume jobs after restarts
- Periodically reconcile job state with the metadata DB
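Job notification relies on standard PostgreSQL LISTEN/NOTIFY. The following is a conceptual sketch of that pattern in Python with psycopg2, not Amp's internal worker code; the connection string, channel name (amp_jobs), and payload shape are illustrative assumptions.

# Conceptual sketch of the LISTEN/NOTIFY loop; not Amp's internal code.
# Channel name, DSN, and payload shape are illustrative assumptions.
import json
import select

import psycopg2

conn = psycopg2.connect("postgresql://amp:amp@localhost:5432/amp_metadata")  # placeholder DSN
conn.autocommit = True

with conn.cursor() as cur:
    cur.execute("LISTEN amp_jobs;")  # hypothetical channel name

print("worker waiting for job notifications...")
while True:
    # Wait until the connection is readable, or time out so the worker can
    # periodically reconcile job state with the metadata DB.
    if select.select([conn], [], [], 60) == ([], [], []):
        continue  # timeout: reconciliation would happen here
    conn.poll()
    while conn.notifies:
        note = conn.notifies.pop(0)
        payload = json.loads(note.payload) if note.payload else {}
        print(f"notification on {note.channel}: {payload}")
        # A real worker claims the job, extracts data, writes Parquet files,
        # and updates job status in the metadata DB.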
Basic Usage
# Single worker
ampd worker --node-id worker-01
# Multiple workers for parallel extraction
ampd worker --node-id worker-01 &
ampd worker --node-id worker-02 &
ampd worker --node-id worker-03 &

Workers can use descriptive IDs, for example:

ampd worker --node-id eu-west-1a-worker
ampd worker --node-id us-east-1b-worker

3.3 Controller Component
Purpose
The controller provides the Admin API for managing Amp:
- Dataset management
- Job creation and control
- Worker and file metadata visibility
It runs as a standalone service and is typically deployed separately from the query server.
Admin API
- Default port: 1610
- REST-style management interface
- Operations include:
- Dataset registration and versioning
- Job deployment and status inspection
- Job stop/delete operations
- Worker location and file listing
Basic Usage
ampd controller

Example: Dataset Management
# List all datasets
curl http://localhost:1610/datasets
# Get dataset details (specific version)
curl http://localhost:1610/datasets/my_namespace/eth_mainnet/versions/1.0.0
# List all versions of a dataset
curl http://localhost:1610/datasets/my_namespace/eth_mainnet/versions
# Register a new dataset
curl -X POST http://localhost:1610/datasets \
  -H "Content-Type: application/json" \
  -d @dataset_definition.json

Example: Job Control
# List all jobs
curl http://localhost:1610/jobs
# Deploy a dataset (start a dump job)
curl -X POST http://localhost:1610/datasets/my_namespace/eth_mainnet/versions/1.0.0/deploy \
  -H "Content-Type: application/json" \
  -d '{"end_block": 20000000}'
# Get job status (replace 42 with actual job_id)
curl http://localhost:1610/jobs/42
# Stop a running job
curl -X PUT http://localhost:1610/jobs/42/stop
# Delete a job
curl -X DELETE http://localhost:1610/jobs/42

In development mode (ampd dev), the controller (Admin API) is embedded in the same process as the server.
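For scripted monitoring, the job endpoint shown above can be polled from Python as well. A minimal sketch follows; the job response schema is not documented on this page, so the loop simply prints what the controller returns, and the job ID and poll interval are placeholders.

# Sketch: poll a job via the documented Admin API endpoint. The response
# schema is not documented here, so the loop just prints what it gets back.
import json
import time
import urllib.request

ADMIN_API = "http://localhost:1610"
job_id = 42  # replace with the job_id returned when the job was deployed

for _ in range(10):  # poll ten times, 30 seconds apart
    with urllib.request.urlopen(f"{ADMIN_API}/jobs/{job_id}", timeout=10) as resp:
        job = json.load(resp)
    print(json.dumps(job, indent=2))
    time.sleep(30)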
4. Development Mode (Single-Node)
Purpose
Development mode runs all components in a single process and is optimized for:
- Local development
- Quick prototyping
- CI and automated testing
- Learning how Amp behaves end-to-end
Not recommended for production deployments.
How It Works
When you run ampd dev:
- Server starts (Arrow Flight + JSON Lines)
- Controller (Admin API) starts in the same process
- Embedded worker starts with node ID worker
- Worker registers with the metadata DB and listens for jobs
- Jobs scheduled via Admin API run inside this single process
- Logging and errors are centralized for easier debugging
Basic Usage
# Start development mode
ampd dev

Schedule a job:

curl -X POST http://localhost:1610/datasets/my_namespace/eth_mainnet/versions/dev/deploy \
  -H "Content-Type: application/json" \
  -d '{"end_block": 1000000}'

Query the data:

curl -X POST http://localhost:1603 \
  --data "SELECT COUNT(*) FROM 'my_namespace/eth_mainnet'.blocks"

Benefits
- Single process, minimal setup
- No separate worker or controller configuration
- Fast iteration loops
Limitations
- No fault isolation (worker crash kills the process)
- Resource contention between queries and extraction
- No horizontal scaling or high availability
- Not production-grade
5. Deployment Patterns
This section outlines common deployment topologies and when to use them.
5.1 Pattern: Development Mode (Single-Node)
Use when:
- Local development and testing
- CI environments
- Prototyping
Command:
ampd dev

5.2 Pattern: Query-Only Server (Distributed, Read-Only)
Use when:
- You only need read-only query serving
- Datasets are populated by external extraction processes
- You want multiple query replicas for load balancing
Commands:
ampd server

5.3 Pattern: Server + Controller + Workers (Full Distributed)
Use when:
- Production deployments
- You need resource isolation between queries and extraction
- You need horizontal scaling and high availability
Commands:
# Server node(s)
ampd server
# Controller node
ampd controller
# Worker node(s)
ampd worker --node-id worker-01
ampd worker --node-id worker-02
ampd worker --node-id worker-03

5.4 Pattern: Multi-Region Distributed
Use when:
- You need global low-latency query access
- You want geographic redundancy
- You’re operating large-scale production systems
Example Commands:
# Region A
ampd server
ampd controller
ampd worker --node-id us-east-1-worker
# Region B
ampd server
ampd worker --node-id eu-west-1-worker

6. Choosing Between Modes
Use Single-Node Mode (ampd dev) When:
- You’re developing locally
- You’re running tests or quick prototypes
- You want a simple, all-in-one workflow
Not appropriate for production workloads.
Use Distributed Mode When:
Query-only server:
- You need read-only query serving
- Datasets are populated externally
- You need multiple query replicas
Controller-only:
- You need a management interface in a private network
- Query serving is handled elsewhere
- Your focus is job scheduling and monitoring
Controller + Server + Workers:
- You’re running in production
- You need resource isolation and scaling
- You require high availability and continuous ingestion
- You’re operating in single or multiple regions
7. Scaling Path
Recommended progression as your deployment grows:
Stage 1: Development & Testing
- Mode: Single-node
- Command: ampd dev
- Single machine, minimal setup
- Not for production use
Stage 2: Production Single-Region
- Mode: Distributed
- Deploy ampd controller on a management node
- Deploy ampd server on one or more query nodes
- Deploy ampd worker --node-id <id> on extraction nodes (see the connectivity smoke test after this list)
- Enable observability (e.g., OpenTelemetry)
- Configure compaction and retention
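A quick way to validate a Stage 2 rollout is to confirm that each documented port answers from an allowed host. The sketch below assumes the default ports on this page (1602, 1603, 1610); the hostnames are placeholders for your own query and controller nodes.

# Connectivity smoke test for the documented default ports. Hostnames are
# placeholders for your own query and controller nodes.
import socket
import urllib.request

CHECKS = {
    "arrow-flight": ("query.internal.example", 1602),
    "jsonl": ("query.internal.example", 1603),
    "admin-api": ("controller.internal.example", 1610),
}

def tcp_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, (host, port) in CHECKS.items():
    state = "ok" if tcp_open(host, port) else "UNREACHABLE"
    print(f"{name:12s} {host}:{port} -> {state}")

# The Admin API is plain HTTP, so also confirm it answers a documented endpoint.
admin_host, admin_port = CHECKS["admin-api"]
with urllib.request.urlopen(f"http://{admin_host}:{admin_port}/jobs", timeout=5) as resp:
    body = resp.read()
    print(f"admin-api GET /jobs -> HTTP {resp.status}, {len(body)} bytes")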
Stage 3: Scaled Distributed Extraction
- Mode: Distributed (scaled)
- Multiple servers for query load balancing
- Multiple workers for parallel extraction
- Shared PostgreSQL and object store
Stage 4: Multi-Region Production
- Mode: Distributed (global)
- Controller in a primary region
- Servers in multiple regions for low-latency queries
- Workers close to data sources
- Shared global metadata DB and object store
8. Security Considerations
Separation of controller, server, and worker components enables fine-grained security via network isolation and access controls.
Component Security Profiles
Controller (Admin API — Port 1610)
Security level: Most sensitive
Capabilities:
- Schedule, start, stop, delete jobs
- Register and modify datasets
- Monitor workers
- Access file metadata
Requirements:
- Must run in a private network
- Must not be exposed to the public internet
- Restrict access to authorized operators only
- Recommended behind VPN or bastion host
- Strict firewall rules and IP allowlists
Server (Query Interfaces — Ports 1602, 1603)
Security level: Medium, potentially public-facing
Capabilities:
- Query-only access (read)
- No dataset or job management
Requirements:
- Can be exposed publicly if needed
- Implement rate limiting and timeouts
- Monitor for abusive or expensive queries
- Optionally put behind API gateway for auth
- Prefer read-only DB access where possible
Worker (No Exposed Ports)
Security level: Internal component
Capabilities:
- Execute extraction jobs
- Write to object store
- Update metadata DB
Requirements:
- Runs in trusted/private network
- Needs DB write access and object store write access
- No inbound network access required
Authentication & Authorization
Current state: Amp components do not ship with built-in auth.
Security relies on:
- Network isolation (VPCs, subnets, firewalls)
- Database authentication (PostgreSQL credentials)
- Object store authentication (e.g., IAM roles, service accounts)
Recommended external layers:
- API Gateway / Reverse Proxy
  - e.g., Nginx, Traefik, cloud API gateways
  - API keys, JWT, OAuth2/OIDC
- Mutual TLS (mTLS)
  - Client certificate validation
  - Particularly relevant for Arrow Flight (gRPC); see the client sketch after this list
- VPN / Zero Trust
  - WireGuard, Tailscale, or cloud VPNs
  - Mandatory for controller access
- Network Policies
  - Kubernetes NetworkPolicies
  - Cloud security groups / firewall rules
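For the Arrow Flight interface specifically, pyarrow's Flight client can present a client certificate. The sketch below assumes the Flight endpoint is reachable over TLS (for example behind a TLS-terminating proxy that enforces mTLS); certificate paths, the hostname, and the query are placeholders.

# Sketch: Arrow Flight client presenting a client certificate (mTLS).
# Certificate paths, hostname, and query are placeholders; this assumes the
# Flight endpoint is served over TLS, e.g. behind a TLS-terminating proxy.
from pyarrow import flight

with open("ca.pem", "rb") as f:
    ca_cert = f.read()
with open("client.pem", "rb") as f:
    client_cert = f.read()
with open("client.key", "rb") as f:
    client_key = f.read()

client = flight.FlightClient(
    "grpc+tls://amp.example.com:1602",
    tls_root_certs=ca_cert,   # CA that signed the server certificate
    cert_chain=client_cert,   # client certificate presented to the server
    private_key=client_key,   # client private key
)

reader = client.do_get(
    flight.Ticket("SELECT * FROM 'ns/eth_mainnet'.blocks LIMIT 10")
)
print(reader.read_all().num_rows, "rows")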
Secrets Management
Required secrets typically include:
- PostgreSQL connection strings
- Object store credentials
- Blockchain RPC keys
- Firehose tokens (if applicable)
Recommendations:
- Never commit secrets to version control
- Use secret managers:
- Kubernetes Secrets
- AWS Secrets Manager
- GCP Secret Manager
- HashiCorp Vault
- Azure Key Vault
- Inject secrets via environment variables or secret volumes (see the sketch after this list)
- Rotate credentials regularly
- Prefer IAM roles / service accounts over static keys
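A common pattern is to read injected secrets from the environment at startup and fail fast when one is missing. The variable names in this sketch are illustrative only; they are not ampd configuration keys.

# Sketch: read injected secrets from the environment and fail fast if any
# are missing. Variable names are illustrative, not ampd configuration keys.
import os
import sys

REQUIRED = [
    "AMP_POSTGRES_DSN",      # PostgreSQL metadata database connection string
    "AMP_OBJECT_STORE_KEY",  # object store credential (prefer IAM roles where possible)
    "AMP_RPC_API_KEY",       # blockchain RPC key
]

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    sys.exit(f"missing required secrets: {', '.join(missing)}")

print("all required secrets present; never log their values")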
9. Threat Model Summary
| Component | Threat Level | Attack Surface | Mitigation |
|---|---|---|---|
| Controller | High | Admin API (1610) | Private network, VPN, audit logging |
| Server | Medium | Query APIs (1602/1603) | Rate limiting, read-only DB, DDoS protections |
| Worker | Low | None (no inbound ports) | Private network, minimal outbound access |
| Dev Mode | Critical | All services combined | Do not use in production |