Mark Schmidt
> > It would be great if FAIR could provide some guidance on vram requirements
> >
> > See: [memory requirements for each model size](https://github.com/cedrickchee/llama/blob/main/chattyllama/hardware.md#memory-requirements-for-each-model-size).
> >
> > It's a small place...
4-bit for LLaMA is underway: https://github.com/oobabooga/text-generation-webui/issues/177

65B in int4 fits on a single A100 40GB, further reducing the cost of access to this powerful model. Int4 LLaMA VRAM usage is...
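For a rough sense of the numbers, here's a back-of-the-envelope sketch (my own arithmetic, weights only; real usage adds activations, the KV cache, and runtime overhead):

```python
# Back-of-the-envelope weight memory for each LLaMA size at common precisions.
# Parameter counts are the approximate ones from the LLaMA paper; actual VRAM
# use is higher once activations and the KV cache are included.
PARAMS = {"7B": 6.7e9, "13B": 13.0e9, "30B": 32.5e9, "65B": 65.2e9}

def weight_gib(n_params: float, bits: int) -> float:
    """GiB needed for the weights alone at the given precision."""
    return n_params * bits / 8 / 2**30

for name, n in PARAMS.items():
    row = "  ".join(f"{bits:2d}-bit: {weight_gib(n, bits):5.1f} GiB" for bits in (16, 8, 4))
    print(f"{name:>3}: {row}")
```

At 4-bit the 65B weights come out around 30 GiB, which is consistent with the single-40GB-card claim above, though with limited headroom left for context.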
That particular 4-bit implementation is mostly a proof of concept at this point. Bitsandbytes may be getting some 4-bit functionality towards the end of this month. Best to wait for...
https://mobile.twitter.com/Tim_Dettmers/status/1605209177919750147

"Our analysis is extensive, spanning 5 models (BLOOM, BLOOMZ, Pythia, GPT-2, OPT), from 3 to 8-bit precision, and from 19M to 66B scale. We find the same result again..."
Something seems off. LLaMA-30B is ~60GB in fp16. I would expect it to be around 1/4 of that size in 4-bit, i.e., 15GB. 12GB is considerably smaller and about...
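To make that arithmetic explicit, here's a small sketch; the GPTQ-style format, the group size of 128, and the one-fp16-scale-per-group layout are all assumptions, not details from this thread:

```python
# Making the comment's arithmetic explicit. Assumes a GPTQ-style 4-bit
# format with one fp16 scale per group of 128 weights; both the format
# and the group size are assumptions about typical quantizers.
n_params = 32.5e9  # LLaMA-30B actually has ~32.5B parameters
GiB = 2**30

fp16_gib = n_params * 2 / GiB              # 2 bytes per weight
naive_int4_gib = n_params * 0.5 / GiB      # 4 bits = 0.5 bytes per weight
group_size = 128
scales_gib = (n_params / group_size) * 2 / GiB  # one fp16 scale per group

print(f"fp16:          {fp16_gib:5.1f} GiB")
print(f"int4, naive:   {naive_int4_gib:5.1f} GiB")
print(f"int4 + scales: {naive_int4_gib + scales_gib:5.1f} GiB")
```

Since quantization metadata makes the checkpoint slightly *larger* than the naive quarter-size estimate, not smaller, a 12GB figure does look low; it would be worth checking whether some tensors were dropped or quantized below 4 bits.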
+1 Fun fact: Unlike LLaMA, FLAN is actually open source (Apache License).
> For anybody still having trouble, you can try using a newer library: https://github.com/james-things/bitsandbytes-prebuilt-all_arch. Using v37 finally did it for me :)

https://github.com/oobabooga/text-generation-webui/issues/20#issuecomment-1455762694

This may not be the issue, but...
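Before swapping builds, a quick sanity check that torch sees the GPU and that bitsandbytes imports at all can save time; this uses only standard imports, nothing specific to those prebuilt wheels:

```python
# Minimal environment sanity check before swapping bitsandbytes builds.
from importlib.metadata import version

import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))

import bitsandbytes  # noqa: F401 -- fails here if the native library can't load
print("bitsandbytes:", version("bitsandbytes"))
```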
> 122GB.

What would be interesting is to benchmark quality versus memory size, i.e. does, say, an fp16 13B model generate better output than an int4 60GB model?...
> Which led me to wonder where the sweet spots are among two parameters for a given memory footprint?

13B appears to have negligible quality difference at 3-bit. So you'll want...
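To make the sweet-spot question concrete, here's a hedged sketch that enumerates which (model size, bit width) pairs fit a given VRAM budget; the budget value is just an example, and quality still has to come from benchmarks, so this only answers the footprint half:

```python
# Which (model size, precision) pairs fit a given weight-memory budget?
# Weight-only estimate; leave headroom for activations and the KV cache.
PARAMS = {"7B": 6.7e9, "13B": 13.0e9, "30B": 32.5e9, "65B": 65.2e9}

def weight_gib(n_params: float, bits: int) -> float:
    return n_params * bits / 8 / 2**30

budget_gib = 24.0  # example budget, e.g. a 24GB consumer card
fits = [
    (name, bits, weight_gib(n, bits))
    for name, n in PARAMS.items()
    for bits in (16, 8, 4, 3)
    if weight_gib(n, bits) <= budget_gib
]
for name, bits, gib in sorted(fits, key=lambda t: -t[2]):
    print(f"{name} at {bits}-bit: {gib:5.1f} GiB")
```

By the scaling-law result quoted earlier, the largest model that fits is usually the best pick, e.g. a 4-bit 30B over an fp16 7B in a similar footprint.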
Unironically LGTM. People are already doing the training anyway; it's just a waste of energy that could be spared instantly by accepting this PR. Do the right thing.