Mark Schmidt

Results: 95 comments of Mark Schmidt

> > It would be great if FAIR could provide some guidance on vram requirements
>
> See: [memory requirements for each model size](https://github.com/cedrickchee/llama/blob/main/chattyllama/hardware.md#memory-requirements-for-each-model-size).

It's a small place...

4-bit for LLaMA is underway (https://github.com/oobabooga/text-generation-webui/issues/177). 65B in int4 fits on a single V100 40GB, further reducing the cost of accessing this powerful model. Int4 LLaMA VRAM usage is...
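For a rough sense of scale, here is a back-of-the-envelope sketch only; it counts weight bytes and ignores activations, KV cache, and quantization metadata:

```python
# Rough weight-memory estimate: parameter count * bits per weight / 8 bytes.
# Ignores activations, KV cache, and quantization metadata.
def weight_gb(n_params: float, bits: int) -> float:
    return n_params * bits / 8 / 1e9  # decimal GB

for bits in (16, 8, 4):
    print(f"65B @ {bits}-bit: {weight_gb(65e9, bits):.1f} GB")
# 16-bit: 130.0 GB, 8-bit: 65.0 GB, 4-bit: 32.5 GB -> under 40 GB, with room to spare
```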

That particular 4-bit implementation is mostly a proof of concept at this point. Bitsandbytes may be getting some 4-bit functionality towards the end of this month. Best to wait for...

https://mobile.twitter.com/Tim_Dettmers/status/1605209177919750147

> "Our analysis is extensive, spanning 5 models (BLOOM, BLOOMZ, Pythia, GPT-2, OPT), from 3 to 8-bit precision, and from 19M to 66B scale. We find the same result again...

![image](https://user-images.githubusercontent.com/5949853/223291180-07287b36-9321-45fc-96b6-ad75a33e8726.png) Something seems off. LLaMA-30B is ~60GB in fp16. I would expect it to be around 1/4 of that size in 4-bit, i.e. ~15GB. 12GB is considerably smaller and about...
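One thing worth noting (a sketch under assumed GPTQ-style settings, not the actual layout of that checkpoint): group-wise 4-bit quantization stores extra metadata per group, so the file should come out slightly *above* params/4, which makes 12GB look even more suspicious:

```python
# Expected 4-bit checkpoint size with group-wise metadata.
# group_size=128 and an fp16 scale + fp16 zero point per group are
# assumptions (typical GPTQ-style settings), not facts about this repo.
def quantized_gb(n_params: float, bits: int = 4, group_size: int = 128) -> float:
    weight_bytes = n_params * bits / 8
    meta_bytes = (n_params / group_size) * 4  # 2 bytes scale + 2 bytes zero point
    return (weight_bytes + meta_bytes) / 1e9

print(f"{quantized_gb(30e9):.1f} GB")  # ~15.9 GB expected, vs. 12 GB reported
```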

+1

Fun fact: Unlike LLaMA, FLAN is actually open source (Apache License).

> For anybody having troubles still, you can try using newer library - https://github.com/james-things/bitsandbytes-prebuilt-all_arch
>
> Using v37 did it for me finally :)

https://github.com/oobabooga/text-generation-webui/issues/20#issuecomment-1455762694

This may not be the issue, but...
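A quick way to confirm which build actually got picked up (a generic import check, nothing specific to those prebuilt wheels):

```python
import torch
import bitsandbytes as bnb

# Print the version that Python actually imported and whether CUDA is visible;
# a wheel/CUDA-runtime mismatch is a common cause of the errors in this thread.
print("bitsandbytes:", bnb.__version__)
print("CUDA available:", torch.cuda.is_available())
```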

> 122GB.
>
> What would be interesting is to benchmark quality versus memory size, i.e. does, say, an fp16 13B model generate better output than an int4 60GB model?...

> Which led me to wonder where the sweet spots are among two parameters for a given memory footprint?

13B appears to have negligible quality difference at 3-bit. So you'll want...
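To make the tradeoff concrete (illustrative arithmetic only; it ignores metadata overhead and assumes quality follows the scaling-law result quoted above):

```python
# For a fixed weight-memory budget, lower precision buys a larger model.
def max_params_billions(budget_gb: float, bits: int) -> float:
    return budget_gb * 1e9 * 8 / bits / 1e9

for bits in (16, 8, 4, 3):
    print(f"{bits}-bit: ~{max_params_billions(24, bits):.0f}B params in 24 GB")
# 16-bit: ~12B, 8-bit: ~24B, 4-bit: ~48B, 3-bit: ~64B
```

Since the quality drop at 3-4 bits is small past ~13B, the larger low-precision model tends to win for a fixed footprint.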

Unironically LGTM. People are already doing the training anyway; it's just a waste of energy that could be spared simply by accepting this PR. Do the right thing.