VRAM and GPU memory — not token counts
This calculator estimates how much video memory (VRAM) a loaded model may require on a GPU when weights are stored at a given precision (for example FP16 or 8-bit quantized formats). That is a hardware and runtime storage question: how many billions of parameters fit in fast on-device memory, plus activations and framework overhead.
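The sizing logic above can be sketched as a few lines of arithmetic. This is a rough illustrative model, not the calculator's exact formula: the function name, the flat overhead factor, and the choice of a single multiplier for activations and framework buffers are all assumptions.

```python
def estimate_vram_gb(params_billion, bytes_per_param, overhead_factor=1.2):
    """Rough VRAM estimate: weight storage times a flat overhead factor.

    The 1.2x overhead is an assumption standing in for activations,
    KV cache, and framework buffers; real overhead varies with batch
    size, context length, and runtime.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return weight_bytes * overhead_factor / 1024**3

# A 7B model in FP16 (2 bytes per parameter):
print(round(estimate_vram_gb(7, 2), 1))   # ~15.6 GB
# The same model 8-bit quantized (1 byte per parameter):
print(round(estimate_vram_gb(7, 1), 1))   # ~7.8 GB
```

This is why quantization matters for consumer GPUs: halving bytes per parameter roughly halves the weight footprint, which is often the difference between fitting and not fitting on a 24 GB card.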
Token counting (context length, prompt size, and API billing) is a different axis. Tokens describe how much text you send through a tokenizer for inference or for cloud APIs — they do not directly tell you whether a 70B model will fit on a 24 GB consumer GPU. Use our Token Calculator for context limits and rough cost estimates, and this RAM/VRAM tool when you are sizing a local or self-hosted GPU build.
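To make the contrast concrete, a token-based cost estimate looks like this. The per-1,000-token prices here are placeholders, not any real provider's rates:

```python
def estimate_cost_usd(prompt_tokens, completion_tokens,
                      price_in_per_1k, price_out_per_1k):
    """Rough API cost: input and output tokens priced separately
    per 1,000 tokens (prices below are hypothetical)."""
    return (prompt_tokens / 1000 * price_in_per_1k
            + completion_tokens / 1000 * price_out_per_1k)

# 1,500 prompt tokens plus 500 completion tokens at made-up rates:
print(round(estimate_cost_usd(1500, 500, 0.01, 0.03), 4))   # 0.03
```

Notice that nothing in this formula involves gigabytes or parameter counts: token math answers "what will this request cost and will it fit in the context window", not "will the model fit on my GPU".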
Token speed (throughput and latency) is yet another dimension: the Token Speed Simulator helps you reason about tokens per second and wait times, independently of whether the weights fit in VRAM.