r4dm
Same problem. Please add support for vocab_size over 32000; in my case I need vocab_size 60256.
`cp libbitsandbytes_cuda117.so libbitsandbytes_cpu.so` worked for me in WSL Ubuntu on Windows 11. I installed Anaconda3 and the CUDA 11.8 build specifically for WSL2. Fine-tuning Vicuna-13B with QLoRA works well on an RTX 4090.
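For reference, a minimal sketch of that workaround. The package location is resolved dynamically here; the `.so` filenames are assumptions, so check which `libbitsandbytes_cuda*.so` your install actually ships before copying:

```bash
# Locate the installed bitsandbytes package without importing it
# (importing may already fail if libbitsandbytes_cpu.so is missing).
BNB_DIR="$(python -c 'import importlib.util; print(importlib.util.find_spec("bitsandbytes").submodule_search_locations[0])')"

# Copy the CUDA build over the CPU fallback name. Adjust cuda117 to
# match the CUDA version your bitsandbytes build targets.
cp "$BNB_DIR/libbitsandbytes_cuda117.so" "$BNB_DIR/libbitsandbytes_cpu.so"
```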
https://github.com/casper-hansen/AutoAWQ/issues/558
The same problem. Version 0.2.6 installs transformers 4.43.3, which gives an error during quantization. In this case it is the quantization code that does not work, but the inference code...
The solution works only with Llama 3, not 3.1.
This also did not help in my case. I quantize the 70B model on a single A100, and with default settings this used to work fine. With the new version of AutoAWQ and...
The temporary solution works only with Llama 3, not 3.1, because support for 3.1 was added in transformers v4.43.0.
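A hedged sketch of that temporary workaround, assuming downgrading is acceptable for your model:

```bash
# Pin transformers below v4.43.0 to sidestep the quantization error.
# This keeps Llama 3 working but drops Llama 3.1, whose support was
# only added in transformers v4.43.0.
pip install "transformers<4.43.0"
```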