
Operational Modes & Deployment Patterns

This page is a technical reference for Amp’s operational modes, components, and deployment patterns. It’s intended for engineers designing, deploying, or operating Amp in development and production environments.

Amp provides several core commands that can be composed into different deployment strategies:

  • server — Query server exposing Arrow Flight and JSON Lines interfaces
  • worker — Executes scheduled extraction jobs
  • controller — Hosts the Admin API for job and dataset management
  • migrate — Applies metadata database migrations

Amp supports two primary operational modes:

  • Single-Node Mode — All components run together (local dev/testing)
  • Distributed Mode — Components deployed independently (production)

Single-node mode runs all components in one process.

  • Server, controller, and worker run together
  • Activated via ampd dev
  • Optimized for:
    • Local development
    • CI pipelines
    • Quick testing and prototyping
  • Not designed for production reliability or fault isolation

Distributed mode separates components into independent processes.

  • Server, controller, and workers run separately
  • Suitable for production deployments
  • Enables:
    • Horizontal scaling
    • Resource isolation (queries vs extraction)
    • High availability and failover

The server is the query-serving component:

  • Exposes query interfaces (Arrow Flight and JSON Lines)
  • Serves data only
  • Does not execute extraction jobs
  • Does not provide management APIs

  • Arrow Flight (port 1602)
    • High-performance binary interface
    • gRPC-based
    • Uses Apache Arrow / Flight SQL
  • JSON Lines (port 1603)
    • HTTP POST interface
    • Returns newline-delimited JSON (NDJSON)
    • Supports streaming and compression (gzip, brotli, deflate)

Start the server:

# Start both query servers (default)
ampd server
# Start only the Arrow Flight server
ampd server --flight-server
# Start only the JSON Lines server
ampd server --jsonl-server
# Start both explicitly
ampd server --flight-server --jsonl-server

Query over JSON Lines with curl:

curl -X POST http://localhost:1603 \
  --data "SELECT * FROM 'ns/eth_mainnet'.blocks LIMIT 10"
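
A client of the JSON Lines interface has to split the response body on newlines and parse each line as its own JSON document. The following is a minimal sketch, assuming the whole body has already been read into memory and that the caller knows whether it is gzip-compressed (for example from a Content-Encoding header). The function name decode_ndjson and the sample row fields are illustrative and not part of Amp.

```python
import gzip
import json


def decode_ndjson(body: bytes, content_encoding: str = "") -> list[dict]:
    """Parse a newline-delimited JSON body into a list of row dicts."""
    if content_encoding.lower() == "gzip":
        body = gzip.decompress(body)
    rows = []
    for line in body.decode("utf-8").splitlines():
        if line.strip():  # skip blank lines, e.g. after a trailing newline
            rows.append(json.loads(line))
    return rows


# Hypothetical sample rows; real column names depend on the dataset.
sample = b'{"block_num": 1}\n{"block_num": 2}\n'
print(decode_ndjson(sample))
print(decode_ndjson(gzip.compress(sample), "gzip"))
```

For large results, iterating over the response line by line instead of buffering it would be the more natural fit for a streaming interface.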
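
Query over Arrow Flight with Python:

from pyarrow import flight

# Connect to the Arrow Flight endpoint (default port 1602)
client = flight.connect("grpc://localhost:1602")

# Send the SQL text as the ticket and stream the result back
reader = client.do_get(
    flight.Ticket("SELECT * FROM 'ns/eth_mainnet'.blocks LIMIT 10")
)

# Materialize the stream into a table; to_pandas() requires pandas
table = reader.read_all()
print(table.to_pandas())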

Without flags, both query servers are enabled by default. When you specify any server flags, only those interfaces are enabled.
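
The flag rule above can be modeled as a small function. This is only a sketch of the behavior as described on this page, not Amp's actual argument handling; the function name and the "flight" / "jsonl" labels are made up for illustration.

```python
def enabled_interfaces(flight_server: bool = False, jsonl_server: bool = False) -> set[str]:
    """Return the query interfaces enabled for a given flag combination."""
    if not flight_server and not jsonl_server:
        # No flags given: both interfaces are on by default.
        return {"flight", "jsonl"}
    selected = set()
    if flight_server:
        selected.add("flight")
    if jsonl_server:
        selected.add("jsonl")
    return selected


print(sorted(enabled_interfaces()))                    # ampd server
print(sorted(enabled_interfaces(flight_server=True)))  # ampd server --flight-server
```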

The worker executes scheduled extraction jobs in distributed deployments.

  • Runs as a standalone process
  • Coordinates with other components via the PostgreSQL metadata database

Workers:

  • Register with the metadata DB using a node ID
  • Send heartbeat signals (health)
  • Listen for job notifications (LISTEN/NOTIFY)
  • Execute dump jobs (pull data, write Parquet files)
  • Update job status and file metadata
  • Resume jobs after restarts
  • Periodically reconcile job state with the metadata DB
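
To illustrate how heartbeats can be used for health checks, here is a minimal sketch that flags workers whose last heartbeat is too old. The 30-second timeout, the function name, and the in-memory dictionary are assumptions for illustration; this page does not describe Amp's actual heartbeat schema or thresholds.

```python
from datetime import datetime, timedelta, timezone


def stale_workers(last_heartbeat: dict[str, datetime], now: datetime,
                  timeout: timedelta = timedelta(seconds=30)) -> list[str]:
    """Return node IDs whose most recent heartbeat is older than `timeout`."""
    return sorted(
        node_id
        for node_id, seen_at in last_heartbeat.items()
        if now - seen_at > timeout
    )


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
heartbeats = {
    "worker-01": now - timedelta(seconds=5),   # recent heartbeat
    "worker-02": now - timedelta(minutes=2),   # missed several heartbeats
}
print(stale_workers(heartbeats, now))  # ['worker-02']
```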

Start workers:

# Single worker
ampd worker --node-id worker-01
# Multiple workers for parallel extraction
ampd worker --node-id worker-01 &
ampd worker --node-id worker-02 &
ampd worker --node-id worker-03 &

Workers can use descriptive IDs, for example:

ampd worker --node-id eu-west-1a-worker
ampd worker --node-id us-east-1b-worker
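
When many workers are launched from a script, the node IDs can be derived from the availability zone. The sketch below builds the argument list for each worker in the "<zone>-worker" style shown above. It assumes node IDs should be unique, which this page implies but does not state, and the helper name is hypothetical.

```python
def worker_commands(zones: list[str]) -> list[list[str]]:
    """Build one `ampd worker` argument list per zone, using '<zone>-worker' as node ID."""
    if len(set(zones)) != len(zones):
        # Assumption: two workers should not register under the same node ID.
        raise ValueError("duplicate zone would produce duplicate node IDs")
    return [["ampd", "worker", "--node-id", f"{zone}-worker"] for zone in zones]


for argv in worker_commands(["eu-west-1a", "us-east-1b"]):
    print(" ".join(argv))
```

Each list can be passed to subprocess.Popen to start the corresponding worker process.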

The controller provides the Admin API for managing Amp:

  • Dataset management
  • Job creation and control
  • Worker and file metadata visibility

It runs as a standalone service and is typically deployed separately from the query server.

  • Default port: 1610
  • REST-style management interface
  • Operations include:
    • Dataset registration and versioning
    • Job deployment and status inspection
    • Job stop/delete operations
    • Worker location and file listing

Start the controller:

ampd controller

Dataset operations:

# List all datasets
curl http://localhost:1610/datasets
# Get dataset details (specific version)
curl http://localhost:1610/datasets/my_namespace/eth_mainnet/versions/1.0.0
# List all versions of a dataset
curl http://localhost:1610/datasets/my_namespace/eth_mainnet/versions
# Register a new dataset
curl -X POST http://localhost:1610/datasets \
  -H "Content-Type: application/json" \
  -d @dataset_definition.json

Job operations:

# List all jobs
curl http://localhost:1610/jobs
# Deploy a dataset (start a dump job)
curl -X POST http://localhost:1610/datasets/my_namespace/eth_mainnet/versions/1.0.0/deploy \
  -H "Content-Type: application/json" \
  -d '{
    "end_block": 20000000
  }'
# Get job status (replace 42 with actual job_id)
curl http://localhost:1610/jobs/42
# Stop a running job
curl -X PUT http://localhost:1610/jobs/42/stop
# Delete a job
curl -X DELETE http://localhost:1610/jobs/42
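
The same deploy call can be issued from Python with only the standard library. This sketch builds the request without sending it, so it can be inspected or tested offline. The URL layout and the end_block field come from the curl examples above; the helper name deploy_request is an assumption, and the response format is not covered here.

```python
import json
import urllib.request

ADMIN_URL = "http://localhost:1610"  # default controller port from this page


def deploy_request(namespace: str, name: str, version: str, end_block: int,
                   base_url: str = ADMIN_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request that deploys a dataset version."""
    url = f"{base_url}/datasets/{namespace}/{name}/versions/{version}/deploy"
    body = json.dumps({"end_block": end_block}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = deploy_request("my_namespace", "eth_mainnet", "1.0.0", 20_000_000)
print(req.get_method(), req.full_url)
# To send it against a running controller: urllib.request.urlopen(req)
```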

In development mode (ampd dev), the controller (Admin API) is embedded in the same process as the server.

Development mode runs all components in a single process and is optimized for:

  • Local development
  • Quick prototyping
  • CI and automated testing
  • Learning how Amp behaves end-to-end

Not recommended for production deployments.

When you run ampd dev:

  1. Server starts (Arrow Flight + JSON Lines)
  2. Controller (Admin API) starts in the same process
  3. Embedded worker starts with node ID worker
  4. Worker registers with the metadata DB and listens for jobs
  5. Jobs scheduled via Admin API run inside this single process
  6. Logging and errors are centralized for easier debugging

Start development mode:

ampd dev

Schedule a job:

curl -X POST http://localhost:1610/datasets/my_namespace/eth_mainnet/versions/dev/deploy \
  -H "Content-Type: application/json" \
  -d '{
    "end_block": 1000000
  }'

Query the data:

curl -X POST http://localhost:1603 \
  --data "SELECT COUNT(*) FROM 'my_namespace/eth_mainnet'.blocks"
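
For scripted checks against a development instance, the query can also be sent from Python. The sketch below mirrors the curl call: the SQL text is the raw POST body. It only constructs the request, and the helper name jsonl_query_request is illustrative.

```python
import urllib.request


def jsonl_query_request(sql: str,
                        base_url: str = "http://localhost:1603") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the SQL text as the body."""
    return urllib.request.Request(base_url, data=sql.encode("utf-8"), method="POST")


req = jsonl_query_request("SELECT COUNT(*) FROM 'my_namespace/eth_mainnet'.blocks")
print(req.get_method(), req.full_url)
# With `ampd dev` running, urllib.request.urlopen(req) sends the query.
```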

Advantages:

  • Single process, minimal setup
  • No separate worker or controller configuration
  • Fast iteration loops

Limitations:

  • No fault isolation (worker crash kills the process)
  • Resource contention between queries and extraction
  • No horizontal scaling or high availability
  • Not production-grade

This section outlines common deployment topologies and when to use them.

5.1 Pattern: Development Mode (Single-Node)

Use when:

  • Local development and testing
  • CI environments
  • Prototyping

Command:

ampd dev

5.2 Pattern: Query-Only Server (Distributed, Read-Only)

Use when:

  • You only need read-only query serving
  • Datasets are populated by external extraction processes
  • You want multiple query replicas for load balancing

Commands:

ampd server

5.3 Pattern: Server + Controller + Workers (Full Distributed)

Use when:

  • Production deployments
  • You need resource isolation between queries and extraction
  • You need horizontal scaling and high availability

Commands:

# Server node(s)
ampd server
# Controller node
ampd controller
# Worker node(s)
ampd worker --node-id worker-01
ampd worker --node-id worker-02
ampd worker --node-id worker-03

5.4 Pattern: Multi-Region Deployment

Use when:

  • You need global low-latency query access
  • You want geographic redundancy
  • You’re operating large-scale production systems

Example Commands:

# Region A
ampd server
ampd controller
ampd worker --node-id us-east-1-worker
# Region B
ampd server
ampd worker --node-id eu-west-1-worker

Choose development mode when:

  • You’re developing locally
  • You’re running tests or quick prototypes
  • You want a simple, all-in-one workflow

Not appropriate for production workloads.

Choose a query-only server when:

  • You need read-only query serving
  • Datasets are populated externally
  • You need multiple query replicas

Choose a standalone controller when:

  • You need a management interface in a private network
  • Query serving is handled elsewhere
  • You focus on job scheduling and monitoring

Choose the full distributed deployment when:

  • You’re running in production
  • You need resource isolation and scaling
  • You require high availability and continuous ingestion
  • You’re operating in single or multiple regions

Recommended progression as your deployment grows:

  • Mode: Single-node
    • Command: ampd dev
    • Single machine, minimal setup
    • Not for production use
  • Mode: Distributed
    • Deploy ampd controller on a management node
    • Deploy ampd server on one or more query nodes
    • Deploy ampd worker --node-id <id> on extraction nodes
    • Enable observability (e.g., OpenTelemetry)
    • Configure compaction and retention
  • Mode: Distributed (scaled)
    • Multiple servers for query load balancing
    • Multiple workers for parallel extraction
    • Shared PostgreSQL and object store
  • Mode: Distributed (global)
    • Controller in a primary region
    • Servers in multiple regions for low-latency queries
    • Workers close to data sources
    • Shared global metadata DB and object store
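
A distributed layout can be captured as data and turned into the commands to run in each region. The sketch below is one possible way to model this. The Region type and plan_commands helper are hypothetical, and the check for exactly one controller reflects the "controller in a primary region" layout described above rather than a documented Amp constraint.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    servers: int = 1
    controller: bool = False
    worker_ids: list[str] = field(default_factory=list)


def plan_commands(regions: list[Region]) -> dict[str, list[str]]:
    """Map each region name to the ampd commands to start there."""
    # Assumption of this sketch: a single controller lives in the primary region.
    if sum(r.controller for r in regions) != 1:
        raise ValueError("this sketch expects exactly one controller region")
    plan = {}
    for region in regions:
        cmds = ["ampd server"] * region.servers
        if region.controller:
            cmds.append("ampd controller")
        cmds += [f"ampd worker --node-id {wid}" for wid in region.worker_ids]
        plan[region.name] = cmds
    return plan


layout = [
    Region("region-a", controller=True, worker_ids=["us-east-1-worker"]),
    Region("region-b", worker_ids=["eu-west-1-worker"]),
]
for name, cmds in plan_commands(layout).items():
    print(name, cmds)
```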

Separation of controller, server, and worker components enables fine-grained security via network isolation and access controls.

Controller (Admin API, port 1610)

Security level: Most sensitive

Capabilities:

  • Schedule, start, stop, delete jobs
  • Register and modify datasets
  • Monitor workers
  • Access file metadata

Requirements:

  • Must run in a private network
  • Must not be exposed to the public internet
  • Restrict access to authorized operators only
  • Recommended behind VPN or bastion host
  • Strict firewall rules and IP allowlists

Server (Query Interfaces, ports 1602 and 1603)

Security level: Medium, potentially public-facing

Capabilities:

  • Query-only access (read)
  • No dataset or job management

Requirements:

  • Can be exposed publicly if needed
  • Implement rate limiting and timeouts
  • Monitor for abusive or expensive queries
  • Optionally put behind API gateway for auth
  • Prefer read-only DB access where possible

Worker (No Exposed Ports)

Security level: Internal component

Capabilities:

  • Execute extraction jobs
  • Write to object store
  • Update metadata DB

Requirements:

  • Runs in trusted/private network
  • Needs DB write access and object store write access
  • No inbound network access required
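
The per-component exposure rules above lend themselves to an automated check, for example in a deployment pipeline. The following sketch encodes the ports and public-exposure guidance from this section and reports violations. The POLICY structure and function name are illustrative, and the input (which ports are publicly reachable) would have to come from your own firewall or security-group configuration.

```python
# Ports and exposure guidance taken from this page.
POLICY = {
    "controller": {"ports": {1610}, "public_ok": False},
    "server": {"ports": {1602, 1603}, "public_ok": True},
    "worker": {"ports": set(), "public_ok": False},
}


def exposure_violations(public_ports: dict[str, set[int]]) -> list[str]:
    """List components whose publicly reachable ports conflict with the guidance."""
    problems = []
    for component, ports in sorted(public_ports.items()):
        policy = POLICY[component]
        if ports and not policy["public_ok"]:
            problems.append(f"{component}: ports {sorted(ports)} must not be public")
        unexpected = ports - policy["ports"]
        if unexpected and policy["public_ok"]:
            problems.append(f"{component}: unexpected public ports {sorted(unexpected)}")
    return problems


print(exposure_violations({"server": {1602, 1603}, "controller": {1610}, "worker": set()}))
```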

Current state: Amp components do not ship with built-in auth.

Security relies on:

  1. Network isolation (VPCs, subnets, firewalls)
  2. Database authentication (PostgreSQL credentials)
  3. Object store authentication (e.g., IAM roles, service accounts)

Recommended external layers:

  • API Gateway / Reverse Proxy

    • e.g., Nginx, Traefik, cloud API gateways
    • API keys, JWT, OAuth2/OIDC
  • Mutual TLS (mTLS)

    • Client certificate validation
    • Particularly relevant for Arrow Flight (gRPC)
  • VPN / Zero Trust

    • WireGuard, Tailscale, or cloud VPNs
    • Mandatory for controller access
  • Network Policies

    • Kubernetes NetworkPolicies
    • Cloud security groups / firewall rules

Required secrets typically include:

  • PostgreSQL connection strings
  • Object store credentials
  • Blockchain RPC keys
  • Firehose tokens (if applicable)

Recommendations:

  • Never commit secrets to version control
  • Use secret managers:
    • Kubernetes Secrets
    • AWS Secrets Manager
    • GCP Secret Manager
    • HashiCorp Vault
    • Azure Key Vault
  • Inject secrets via environment variables or secret volumes
  • Rotate credentials regularly
  • Prefer IAM roles / service accounts over static keys
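
When secrets are injected through environment variables, it helps to validate them once at startup and fail fast if any are missing. The sketch below shows one way to do that in a wrapper or launcher script. The variable names are hypothetical placeholders and are not settings defined by Amp.

```python
import os

# Hypothetical variable names; use whatever your Amp configuration expects.
REQUIRED_SECRETS = ("METADATA_DB_URL", "OBJECT_STORE_SECRET_KEY")


def load_secrets(env=None, required=REQUIRED_SECRETS) -> dict[str, str]:
    """Read required secrets from the environment, failing fast if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Report only the names, never the values, to keep secrets out of logs.
        raise RuntimeError(f"missing required secrets: {', '.join(missing)}")
    return {name: env[name] for name in required}


example_env = {
    "METADATA_DB_URL": "postgresql://user:password@db.internal/amp",
    "OBJECT_STORE_SECRET_KEY": "example-key",
}
print(sorted(load_secrets(example_env)))
```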
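
| Component  | Threat Level | Attack Surface           | Mitigation                                     |
|------------|--------------|--------------------------|------------------------------------------------|
| Controller | High         | Admin API (1610)         | Private network, VPN, audit logging            |
| Server     | Medium       | Query APIs (1602/1603)   | Rate limiting, read-only DB, DDoS protections  |
| Worker     | Low          | None (no inbound ports)  | Private network, minimal outbound access       |
| Dev Mode   | Critical     | All services combined    | Do not use in production                       |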