4bit topic

List 4bit repositories

marlin

Stars: 607
Forks: 46
Watchers: (count missing)

FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups at small to medium batch sizes, up to 16-32 tokens.

unsloth-llama3-alpaca-lora

Stars: 31
Forks: 0
Watchers: 31

Advanced 4-bit QLoRA fine-tuning pipeline for LLaMA 3 8B with production-grade optimization. Memory-efficient training on consumer GPUs for instruction-following specialization. Demonstrates cutting-e...
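
Both entries rely on storing model weights in 4 bits instead of 16. The sketch below is a generic illustration of that idea, not the layout or algorithm used by either repository: it quantizes a small group of weights to signed 4-bit integers with a single scale, packs two values per byte, and restores them. All function names are hypothetical. It also shows where a figure like "~4x" can come from: the packed weights occupy a quarter of the bytes of FP16 weights, which bounds the speedup when inference is limited by weight memory traffic.

```python
def quantize_int4(weights):
    """Symmetric quantization of one weight group to signed 4-bit ints in [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0  # one scale for the whole group
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def pack_int4(q):
    """Pack two signed 4-bit values per byte, low nibble first."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    out = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(packed, n):
    """Inverse of pack_int4: recover the first n signed 4-bit values."""
    vals = []
    for b in packed:
        for nib in (b & 0xF, b >> 4):
            vals.append(nib - 16 if nib >= 8 else nib)  # sign-extend the nibble
    return vals[:n]


def dequantize(q, scale):
    """Map 4-bit integers back to approximate real-valued weights."""
    return [v * scale for v in q]


# Illustrative weights (made-up values, not taken from any model).
weights = [0.7, -0.35, 0.1, -0.7, 0.0, 0.2, -0.1, 0.5]
q, scale = quantize_int4(weights)
packed = pack_int4(q)
restored = dequantize(unpack_int4(packed, len(weights)), scale)

fp16_bytes = 2 * len(weights)        # 16 bits per weight
ratio = fp16_bytes / len(packed)     # 4 bits per weight when packed
print(f"FP16: {fp16_bytes} bytes, INT4: {len(packed)} bytes, ratio: {ratio}x")
```

Real kernels typically use per-group scales over much larger weight matrices and hardware-specific packing orders; the per-group scale storage makes the actual ratio slightly lower than 4x.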