What is GGUF?
A quantized model file format used for efficient CPU and GPU inference, popularized by llama.cpp.
By Anish · Founder · Vedwix
Definition
GGUF is the standard file format for serving quantized LLMs locally or in resource-constrained environments. A single GGUF file bundles the model weights, tokenizer, and metadata, supports a range of quantization levels (Q4, Q5, Q8, and others), and is the native format for llama.cpp, Ollama, and most other local-inference tools.
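For illustration, here is a minimal sketch of loading and querying a GGUF file with the llama-cpp-python bindings. The file name and parameter values are assumptions for the example, not part of any specific deployment; note how the quantization level is conventionally encoded in the file name suffix.

from llama_cpp import Llama

# Hypothetical local file; "Q4_K_M" in the name marks the quantization level.
llm = Llama(
    model_path="./llama-3-8b-instruct.Q4_K_M.gguf",
    n_ctx=4096,       # context window size
    n_gpu_layers=-1,  # offload all layers to GPU where available; 0 = CPU only
)

out = llm("Q: What is GGUF? A:", max_tokens=64)
print(out["choices"][0]["text"])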
Example
A Q4-quantized GGUF of Llama 3 8B is around 4.7 GB and streams tokens in real time on a consumer MacBook, as sketched below.
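A sketch of what that token streaming looks like with the same llama-cpp-python bindings (the model path is again a hypothetical local file): passing stream=True yields completion chunks as each token is generated instead of waiting for the full response.

from llama_cpp import Llama

# Hypothetical local path; any Q4 GGUF of a small model behaves the same way.
llm = Llama(model_path="./llama-3-8b-instruct.Q4_K_M.gguf", n_ctx=4096)

# Each chunk carries the newly generated tokens; printing them as they
# arrive is the "full token streaming" behavior described above.
for chunk in llm("Explain GGUF in one sentence.", max_tokens=96, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()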
How Vedwix uses GGUF in client work
Default format for any locally served fine-tuned model we ship.
Building with GGUF?
We ship this.
If you're taking GGUF to production, we can help, from architecture review to full implementation.
Brief us