Profiling

10 instruments · Real-time model health · Latency · Memory management · Drift tracking

Live Memory Monitor

All Instruments

Memory Status

Real-time system and GPU memory with tiered alerts

PROFILING

No configuration required — ready to run.
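
One way to read "tiered alerts" is as a mapping from memory utilization to a severity level. The sketch below assumes two cut-offs, 75% and 90%; those thresholds and the name `memory_tier` are illustrative and are not taken from the tool itself.

```python
def memory_tier(used_bytes: int, total_bytes: int,
                warn: float = 0.75, critical: float = 0.90) -> str:
    """Classify memory utilization into an alert tier.

    The warn/critical thresholds are assumed values for illustration.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    fraction = used_bytes / total_bytes
    if fraction >= critical:
        return "critical"
    if fraction >= warn:
        return "warning"
    return "ok"
```

The same function could be applied separately to system RAM and to each GPU, with the highest tier reported.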

Latency Benchmark

Time-to-first-token and tokens-per-second measurement

PROFILING

Benchmark Prompt

Runs: 5 (range 1 to 20)
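
A minimal sketch of how time-to-first-token and tokens-per-second could be measured for one run, assuming the model exposes its output as an iterable of tokens. The function name `measure_latency` and the injectable `clock` parameter are hypothetical; decode speed here is computed over the interval between the first and last token, which is one of several reasonable definitions.

```python
import time
from typing import Callable, Dict, Iterable


def measure_latency(stream: Iterable[str],
                    clock: Callable[[], float] = time.perf_counter) -> Dict[str, float]:
    """Consume a token stream and report TTFT and decode tokens-per-second."""
    start = clock()
    first = None
    last = None
    count = 0
    for _token in stream:
        now = clock()
        if first is None:
            first = now          # arrival time of the first token
        last = now               # arrival time of the most recent token
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_time = last - first
    # Tokens after the first one, divided by the time it took to produce them.
    tps = (count - 1) / decode_time if decode_time > 0 else 0.0
    return {"ttft_s": first - start, "tokens": float(count), "tokens_per_s": tps}
```

Running this once per configured run and averaging the results would give a benchmark figure that is less sensitive to warm-up effects.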

Memory Pre-Flight

Checks if a model will fit in available memory before loading

PROFILING

Model Path
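
A pre-flight check of this kind can be approximated by comparing the on-disk size of the model against the memory currently available. The sketch below is an assumption about how such a check might work: `model_size_bytes`, `will_fit`, and the 1.2x overhead factor (headroom for activations and caches) are all hypothetical, and the caller is expected to supply the available-memory figure from whatever system API is in use.

```python
import os


def model_size_bytes(model_path: str) -> int:
    """Return the size of a model file, or the summed size of a model directory."""
    if os.path.isfile(model_path):
        return os.path.getsize(model_path)
    total = 0
    for root, _dirs, files in os.walk(model_path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def will_fit(model_bytes: int, available_bytes: int,
             overhead_factor: float = 1.2) -> bool:
    """True if the model plus an assumed overhead margin fits in memory."""
    return model_bytes * overhead_factor <= available_bytes
```

On-disk size is only a proxy: quantized or compressed formats may expand when loaded, so the overhead factor would need tuning per format.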

Output Drift Detector

Detects semantic drift in model outputs over time

PROFILING

Probe Text

Window Size: 10 (range 2 to 50)
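
Semantic drift is commonly estimated by embedding the model's response to a fixed probe text and tracking how far recent embeddings move from a baseline. The sketch below assumes that approach; the `DriftDetector` class, the 0.2 threshold, and the use of cosine distance are illustrative choices, and the embedding step itself is left to the caller.

```python
import math
from collections import deque
from typing import Deque, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return 1.0 - dot / (norm_a * norm_b)


class DriftDetector:
    """Rolling mean of distances between new outputs and a baseline embedding."""

    def __init__(self, baseline: Sequence[float],
                 window_size: int = 10, threshold: float = 0.2) -> None:
        self.baseline = list(baseline)
        self.threshold = threshold          # assumed cut-off, not from the tool
        self.window: Deque[float] = deque(maxlen=window_size)

    def observe(self, embedding: Sequence[float]) -> float:
        """Record one output embedding and return the current rolling mean."""
        self.window.append(cosine_distance(self.baseline, embedding))
        return sum(self.window) / len(self.window)

    def drifted(self) -> bool:
        """Flag drift only once the window is full, to avoid noisy early alerts."""
        full = len(self.window) == self.window.maxlen
        return full and sum(self.window) / len(self.window) > self.threshold
```

A larger window smooths out one-off outliers at the cost of reacting more slowly to a real shift.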

Throughput Monitor

Sustained tokens-per-second under continuous load

PROFILING

Test Duration (s): 30 (range 5 to 120)

Cache Pressure Analysis

KV cache eviction rates and memory pressure under load

PROFILING

Context Length: 4096 (range 256 to 32768)
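
For a standard transformer decoder, KV cache size grows linearly with context length: each layer stores one key and one value vector per KV head per token. The sketch below applies that formula; the function names and the idea of expressing pressure as a ratio against a memory budget are assumptions for illustration, and the model dimensions in the usage note are example values only.

```python
def kv_cache_bytes(context_length: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_element: int = 2,
                   batch_size: int = 1) -> int:
    """Bytes needed to hold keys and values for every layer and token.

    The leading factor of 2 accounts for storing both K and V.
    bytes_per_element=2 corresponds to 16-bit storage.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * context_length * bytes_per_element * batch_size)


def cache_pressure(cache_bytes: int, budget_bytes: int) -> float:
    """Fraction of the memory budget the cache would consume (hypothetical metric)."""
    if budget_bytes <= 0:
        raise ValueError("budget_bytes must be positive")
    return cache_bytes / budget_bytes
```

With 32 layers, 32 KV heads, a head dimension of 128, and 16-bit storage, a 4096-token context comes to 2 GiB per sequence; a pressure value above 1.0 would imply that entries must be evicted or offloaded.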

Thermal Profile

CPU/GPU temperature and throttling detection during inference

PROFILING

Sample Rate (Hz): 1 (range 0.1 to 10)

Model Health Score

Composite health score from latency, drift, and memory metrics

PROFILING

No configuration required — ready to run.
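
The page does not say how the composite score is computed. One plausible scheme, sketched below, normalizes each metric to a 0 to 1 sub-score and takes a weighted average scaled to 100. The weights, the latency budget, and the drift limit are all assumed values, and `health_score` is a hypothetical name.

```python
from typing import Tuple


def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def health_score(latency_ms: float, drift: float, memory_used_fraction: float,
                 *, latency_budget_ms: float = 500.0, drift_limit: float = 0.3,
                 weights: Tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted 0-100 score; 100 means every metric is at its best value."""
    sub_scores = (
        _clamp01(1.0 - latency_ms / latency_budget_ms),   # lower latency is better
        _clamp01(1.0 - drift / drift_limit),              # lower drift is better
        _clamp01(1.0 - memory_used_fraction),             # more free memory is better
    )
    weighted = sum(w * s for w, s in zip(weights, sub_scores))
    return 100.0 * weighted / sum(weights)
```

Clamping each sub-score keeps one badly out-of-range metric from pushing the total below zero.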

Batch Size Optimizer

Finds max batch size that fits in memory without OOM

PROFILING

Sequence Length: 512 (range 64 to 4096)
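
Finding the largest batch that fits is naturally a search problem. The sketch below assumes memory use grows monotonically with batch size, so it doubles the batch until a trial fails and then binary-searches the gap. The `fits` callback is hypothetical: in practice it would attempt a forward pass at the configured sequence length and return False on an out-of-memory error.

```python
from typing import Callable


def max_batch_size(fits: Callable[[int], bool], upper: int = 1024) -> int:
    """Largest batch size in [1, upper] for which fits() is True, or 0 if none.

    Assumes fits() is monotonic: if a batch fails, every larger batch fails too.
    """
    if not fits(1):
        return 0
    lo, hi = 1, 2                 # lo is always a size known to fit
    while hi <= upper and fits(hi):
        lo, hi = hi, hi * 2       # exponential growth phase
    hi = min(hi, upper + 1)       # hi is a known failure or just past the cap
    while hi - lo > 1:            # binary search between last success and first failure
        mid = (lo + hi) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

The doubling phase keeps the number of trial runs logarithmic in the answer, which matters when each trial is a real forward pass.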

Inference Profiler

Layer-by-layer time breakdown for a single forward pass

PROFILING

Prompt