Performance, Scaling, and Monitoring
Performance and Scaling Overview
Generator consists of two main services that work together:
- Backend service: Handles API requests and user interactions
- Worker service: Processes background tasks and content generation
Both services are designed to be stateless and horizontally scalable. This guide covers two main aspects of scaling:
- Container Provisioning: How to size individual containers (vertical scaling)
- Scaling Strategies: Different approaches to handling load, including both vertical and horizontal scaling
Both approaches can achieve similar performance, but horizontal scaling offers more flexibility for dynamic workloads.
Container Provisioning
This section covers how to size individual containers (vertical scaling) by allocating appropriate CPU and memory resources.
Backend Service Provisioning
The backend service automatically scales to use available CPU resources:
- Automatic scaling: One worker process per CPU core (no configuration required)
- Stateless design: Multiple backend containers can run simultaneously
- Load balancing: Use a load balancer to distribute requests across backend instances
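As a sketch, a minimal reverse-proxy configuration for two backend containers might look like the following. The upstream name, container hostnames, and port are assumptions for illustration, not defaults shipped with Generator:

```nginx
# Hypothetical nginx config: round-robin requests across two backend containers.
upstream generator_backend {
    server backend-1:8000;   # container hostnames and port are illustrative
    server backend-2:8000;
}

server {
    listen 80;
    location / {
        proxy_pass http://generator_backend;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Because the backend is stateless, no session affinity is needed; any instance can serve any request.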
Worker Service Provisioning
Worker containers process background tasks and can be scaled independently:
Environment Variables:
- THREADS__DRAMATIQ_PROCESSES: Number of Dramatiq processes per container
- THREADS__DRAMATIQ_THREADS: Number of threads per process
Scaling Guidelines:
- Memory-based: Start with ~1 thread per GB of memory on worker nodes
- Process scaling: Each process contains THREADS__DRAMATIQ_THREADS threads
- Resource monitoring: Increase threads if host metrics aren’t stressed
- PDF processing: Uses significant memory; if you aren’t processing many PDFs, you can raise thread counts accordingly
Example Configuration:
# For a 4GB worker node
THREADS__DRAMATIQ_THREADS=4
THREADS__DRAMATIQ_PROCESSES=1
# For a 16GB worker node with 2 processes
THREADS__DRAMATIQ_THREADS=8
THREADS__DRAMATIQ_PROCESSES=2
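The rule of thumb above (roughly one thread per GB of memory, split across processes) can be sketched as a quick shell calculation; the 16 GB / 2-process figures mirror the second example configuration:

```shell
#!/bin/sh
# Derive worker thread counts from the ~1-thread-per-GB rule of thumb.
mem_gb=16          # total memory on the worker node
processes=2        # value for THREADS__DRAMATIQ_PROCESSES
threads=$(( mem_gb / processes ))

echo "THREADS__DRAMATIQ_PROCESSES=$processes"
echo "THREADS__DRAMATIQ_THREADS=$threads"
```

Treat the result as a starting point, then tune upward or downward based on observed host metrics.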
Scaling Strategies
This section covers different approaches to handling increased load, including both vertical scaling (bigger containers) and horizontal scaling (more containers).
Vertical Scaling (Recommended for Simplicity)
Advantages:
- Simpler deployment and management
- Fewer moving parts
- Easier to troubleshoot
Configuration:
- Increase container CPU/memory allocation
- Backend automatically uses additional CPUs
- Adjust THREADS__DRAMATIQ_THREADS for worker containers
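With plain Docker, vertical scaling is just larger resource limits plus a higher thread count. The flags below are standard Docker flags; the image name is a placeholder:

```shell
# Give a single worker container more resources, then raise the thread count to match.
docker run \
  --cpus=4 --memory=8g \
  -e THREADS__DRAMATIQ_PROCESSES=1 \
  -e THREADS__DRAMATIQ_THREADS=8 \
  generator-worker:latest   # placeholder image name
```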
Horizontal Scaling (Recommended for Flexibility)
Advantages:
- Better resource utilization
- Easier to scale in/out based on demand
- Better fault tolerance
Configuration:
- Deploy multiple backend containers behind load balancer
- Deploy multiple worker containers
- Use container orchestration (ECS, Kubernetes, Docker Swarm, etc.)
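As one hedged example, a Compose file for Swarm mode can declare replica counts per service; the service and image names below are placeholders that depend on your deployment:

```yaml
# Hypothetical docker-compose.yml fragment (Swarm mode / `docker stack deploy`).
services:
  backend:
    image: generator-backend:latest   # placeholder image name
    deploy:
      replicas: 3                     # three stateless backend containers
  worker:
    image: generator-worker:latest    # placeholder image name
    deploy:
      replicas: 4
    environment:
      THREADS__DRAMATIQ_PROCESSES: "1"
      THREADS__DRAMATIQ_THREADS: "4"
```

Because both services are stateless, replicas can be added or removed without coordination between containers.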
Autoscaling
Generator’s stateless design makes it well-suited for autoscaling. Since both backend and worker containers can be scaled independently, you can adjust capacity based on demand. Your load balancer will automatically distribute work across the backend containers, and the queue system will distribute work across the workers. This set-up makes it easy to scale up during peak usage and scale down during quieter periods.
Monitoring Resource Usage: While Generator provides a metrics endpoint (see Monitoring section below), container orchestration platforms like ECS or Kubernetes typically provide more comprehensive resource monitoring and autoscaling capabilities. These platforms can monitor CPU, memory, and queue depth to make more informed scaling decisions than the basic metrics endpoint alone.
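For instance, on Kubernetes a HorizontalPodAutoscaler can scale worker pods on CPU utilization. The Deployment name below is an assumption about your manifests, and CPU is only one possible signal (queue depth is often a better one for workers):

```yaml
# Hypothetical HPA targeting a worker Deployment named "generator-worker".
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: generator-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: generator-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```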
Monitoring
Metrics Endpoint
Generator exposes Prometheus-style metrics at the /metrics endpoint, e.g.:
$ curl http://[generator server]/metrics
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 1305.0
python_gc_objects_collected_total{generation="1"} 351.0
python_gc_objects_collected_total{generation="2"} 89.0
[...]
Available Metrics
- HTTP request volume and response times
- Python runtime metrics (memory usage, garbage collection)
- More metrics will be added in future releases
To avoid leaking this information publicly, the metrics endpoint will reject any requests that were forwarded through a load balancer (determined by the presence of any X-Forwarded-* headers).
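To illustrate the behavior described above: a direct request succeeds, while a request carrying a forwarding header (as a load balancer would add) is rejected. The exact rejection response is not specified here:

```shell
# Direct request: served.
curl http://[generator server]/metrics

# Request with an X-Forwarded-* header, as a load balancer would add: rejected.
curl -H "X-Forwarded-For: 203.0.113.7" http://[generator server]/metrics
```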
If you want to disable this endpoint, set the HOSTING__ENABLE_METRICS environment variable to false.