Casper

Results: 295 comments by Casper

I quantized Llama 3 70B on 3x A6000 48GB. Did you adjust the calibration dataset?

Ahh, I see the issue. This is a transformers issue: they have a memory leak in their cache. If you look at examples/quantize.py, we pass use_cache: False...
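For reference, a sketch of what that looks like when loading the model for quantization (the model path is a placeholder, and the exact kwargs assume AutoAWQ's `from_pretrained` forwards config overrides the way transformers does):

```python
from awq import AutoAWQForCausalLM

model_path = "meta-llama/Meta-Llama-3-70B"  # placeholder path

# use_cache=False is forwarded to the model config, so no KV cache
# is allocated (or leaked) during the calibration forward passes
model = AutoAWQForCausalLM.from_pretrained(
    model_path,
    low_cpu_mem_usage=True,
    use_cache=False,
)
```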

One last thing that I noticed about your code that can cause OOM. You use `device_map='auto'`, which makes accelerate fill all GPUs with the model weights. It's better to set this...
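A sketch of the alternative (assuming AutoAWQ; the exact kwarg handling may differ by version):

```python
from awq import AutoAWQForCausalLM

# device_map="auto" shards the fp16 weights across every GPU up front,
# leaving no headroom for calibration activations. Keeping the weights
# on CPU instead lets the quantizer move one decoder layer at a time.
model = AutoAWQForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B",  # placeholder path
    device_map="cpu",
)
```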

By the way, this is a known issue. AWQ batches 128 samples through the forward pass of the model at the same time. A fix is being worked on where...
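Until that fix lands, the usual workaround is to push the calibration samples through in smaller chunks rather than all 128 at once. A minimal sketch (the function and chunk size are illustrative, not AWQ's actual API):

```python
def forward_in_chunks(forward_fn, samples, chunk_size=16):
    """Run calibration samples through forward_fn in small chunks.

    Batching all 128 samples at once spikes activation memory;
    chunking trades a little speed for a much lower peak.
    """
    outputs = []
    for start in range(0, len(samples), chunk_size):
        outputs.extend(forward_fn(samples[start:start + chunk_size]))
    return outputs
```

The outputs are identical to a single full-batch pass; only the peak memory differs.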

Hi @ryanshrott, this is not implemented yet. PRs are welcome to enable this. I recommend installing from git until then.
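Until it ships in a release, installing from the repository's main branch looks something like this (assuming pip and the AutoAWQ GitHub repo):

```shell
pip install git+https://github.com/casper-hansen/AutoAWQ.git
```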

I would love to add DBRX support. However, at the moment, I lack the hardware/$ to experiment enough to implement quantization support for this model because of the sheer size...

CC @younesbelkada. Not sure if this would break anything in the transformers integration. WDYT?

> As a sanity check I would run basic inference with transformers after merging this PR just to be sure, but looking at the PR it does not seem to...

> the ppl improvement is really small did you try other scores to see if this is worth it ?

This is how it should have been implemented from the...

This is a vision model, and it would be nice to integrate it with LLaVa and others. Open to PRs that help integrate it!