How do I size my Prometheus deployment?
Your monitoring just went down because Prometheus got OOM-killed again. Or maybe you’re paying for 32GB of RAM when 8GB would suffice. Sizing Prometheus shouldn’t be guesswork - it’s actually predictable math based on three inputs: active time series, scrape interval, and retention period. Our Resource Calculator does the math for you, showing memory, CPU, and disk requirements with visual guidance on safe ranges and real-world scaling examples.
The Three Key Inputs
- Active Time Series: The number of unique time series your Prometheus instance tracks
- Scrape Interval: How often Prometheus collects metrics (default: 60 seconds)
- Retention Period: How long to store historical data (default: 30 days)
Finding Your Active Time Series Count
If you already have Prometheus running, query:
```promql
prometheus_tsdb_head_series
```
This metric shows the current number of time series in the Prometheus TSDB (Time Series Database) head block. If you’re planning a new deployment, estimate based on:
- Number of targets (servers, containers, etc.)
- Metrics per target (typically 500-2000 per host, 50-200 per container)
- Expected growth over the retention period
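For a new deployment, the bullet points above can be turned into a quick back-of-the-envelope estimate. A minimal sketch, where the fleet sizes are hypothetical and the per-target figures are the rules of thumb quoted above:

```python
# Rough active-series estimate for a mixed fleet.
# Fleet sizes below are hypothetical examples; per-target series counts
# use the article's rules of thumb (500-2000 per host, 50-200 per container).
hosts = 50                   # bare-metal / VM targets
containers = 400             # container targets

series_per_host = 1000       # mid-range of 500-2000
series_per_container = 100   # mid-range of 50-200
growth_factor = 1.3          # assumed 30% headroom for growth over retention

active_series = (hosts * series_per_host
                 + containers * series_per_container) * growth_factor
print(f"Estimated active series: {active_series:,.0f}")
```

With these assumptions the estimate lands at roughly 117,000 series, which then feeds directly into the memory, CPU, and disk formulas below.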
Resource Calculations
Memory Requirements
Prometheus memory usage scales linearly with active time series:
$$\text{Memory (GB)} = \frac{\text{time\_series} \times 7.5\ \text{KiB}}{1024^2}$$

- Recommended: 7.5 KiB per time series
- Safe Range: 7-9 KiB per time series
The actual memory usage depends on:
- Series churn rate: How frequently time series appear and disappear
- Label cardinality: Number of unique label combinations
- Sample rate: Higher scrape frequencies increase memory pressure
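The memory formula is straightforward to express in code. A minimal sketch (the function name is ours, not a Prometheus API):

```python
def prometheus_memory_gb(active_series: int, kib_per_series: float = 7.5) -> float:
    """Estimated memory need: active series x KiB per series, converted to GB.

    7.5 KiB/series is the recommended figure; use 7-9 for a safe range.
    """
    return active_series * kib_per_series / 1024**2

# 1 million active series at the recommended 7.5 KiB/series:
print(f"{prometheus_memory_gb(1_000_000):.2f} GB")  # ~7.15 GB
```

At the 9 KiB upper bound the same million series needs about 8.6 GB, which is why the chart shows a shaded range rather than a single line.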
CPU Requirements
CPU usage is harder to predict as it depends on query complexity, but a general rule:
$$\text{CPU Cores} = \max\left(2, \left\lfloor\frac{\text{Memory (GB)}}{4}\right\rfloor\right)$$

Allocate 1 core per 4GB of memory, with a minimum of 2 cores. CPU load increases with:
- Recording rules: Pre-computing aggregations
- Alert rule evaluations: Complex PromQL queries
- Query load: Dashboard refreshes, API queries, ad-hoc exploration
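The CPU rule of thumb, sketched as code (function name is ours):

```python
import math

def prometheus_cpu_cores(memory_gb: float) -> int:
    """1 core per 4 GB of allocated memory, with a floor of 2 cores."""
    return max(2, math.floor(memory_gb / 4))

print(prometheus_cpu_cores(7.15))  # 2 (floor(7.15/4) = 1, raised to the minimum)
print(prometheus_cpu_cores(32))    # 8
```

Treat the result as a baseline: heavy recording-rule or dashboard load can push real usage well above it.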
Disk Space Requirements
Disk usage depends on retention period and sample density:
$$\text{Disk (GB)} = \frac{\text{time\_series} \times \text{samples\_per\_series} \times 1.5\ \text{bytes}}{1024^3} \times 1.2$$

where

$$\text{samples\_per\_series} = \frac{\text{retention\_days} \times 86400}{\text{scrape\_interval}}$$

- 1.5 bytes per sample: Prometheus's efficient compression, based on Facebook's Gorilla compression algorithm
- 1.2x multiplier: 20% overhead for WAL (Write-Ahead Log) and temporary data
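Both disk formulas combined in one sketch (function name and defaults are ours, matching the calculator's defaults of 30-day retention and a 60-second scrape interval):

```python
def prometheus_disk_gb(active_series: int,
                       retention_days: int = 30,
                       scrape_interval_s: int = 60,
                       bytes_per_sample: float = 1.5,
                       overhead: float = 1.2) -> float:
    """Estimated disk need: samples over the retention window x bytes
    per sample, plus 20% for the WAL and temporary data."""
    samples_per_series = retention_days * 86400 / scrape_interval_s
    raw_bytes = active_series * samples_per_series * bytes_per_sample
    return raw_bytes / 1024**3 * overhead

# 1 million series, 30-day retention, 60s scrape interval:
print(f"{prometheus_disk_gb(1_000_000):.1f} GB")  # ~72.4 GB
```

Note how halving the scrape interval doubles samples_per_series, and therefore disk usage, while leaving the memory estimate nearly unchanged.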
Interpreting the Chart
The Resource Calculator shows:
- Green shaded area: Safe memory range (7-9 KiB per series)
- Solid green line: Recommended allocation (7.5 KiB)
- Blue dots: Example configurations at common scales
- Red dot: Your specific configuration
- Logarithmic scale: Better visualization across orders of magnitude (1K to 10M+ series)
Important Caveats
These estimates are starting points, not guarantees. Actual resource usage varies based on:
- Recording rules: Each recording rule creates new time series, increasing memory
- Alert rules: Complex alert evaluations increase CPU usage
- Query patterns: Heavy dashboard loads or complex queries require more CPU
- Remote write: Sending data to remote storage adds CPU and network overhead
- Cardinality explosions: Poorly designed metrics can create millions of series unexpectedly
Monitoring Your Actual Usage
After deployment, monitor these Prometheus metrics:
```promql
# Memory usage
process_resident_memory_bytes

# Active time series
prometheus_tsdb_head_series

# Disk usage
prometheus_tsdb_storage_blocks_bytes

# Ingestion rate (samples per second)
rate(prometheus_tsdb_head_samples_appended_total[5m])
```
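To sanity-check the memory rule of thumb against a running instance, you can divide resident memory by the active series count. This is a rough check only: resident memory includes query buffers, the WAL, and Go runtime overhead, so expect a figure somewhat above 7.5 KiB per series.

```promql
# Approximate resident bytes per active series
process_resident_memory_bytes / prometheus_tsdb_head_series
```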