mxjmtxrm

Results: 9 issues by mxjmtxrm

### System Info - `transformers` version: 4.41.0.dev0 - Platform: Linux-5.15.0-92-generic-x86_64-with-glibc2.35 - Python version: 3.10.12 - Huggingface_hub version: 0.21.4 - Safetensors version: 0.4.2 - Accelerate version: 0.28.0 - Accelerate config: not...

Quantization

### System Info - `transformers` version: 4.41.0.dev0 - Platform: Linux-5.15.0-92-generic-x86_64-with-glibc2.35 - Python version: 3.10.12 - Huggingface_hub version: 0.21.4 - Safetensors version: 0.4.2 - Accelerate version: 0.28.0 - Accelerate config: not...

Hi, I met the following error when finetuning a llama7b model with FSDP+HQQ: ``` Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 74, in _wrap fn(i, *args) File "/workspace/fsdp_qlora/train.py", line 723,...

Hi, I tried to finetune a llama7b model with HQQ-LoRA using dual GPUs. I found that during "Loading & Quantizing Model Shards", the peak GPU memory usage reached 35 GB. What's...
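This is not the loading path fsdp_qlora actually uses, but as a rough sketch of how per-shard loading can bound peak memory, something like the following keeps only one CPU-resident shard at a time while quantized parameters accumulate on the GPU (the `quantize_fn` callable and the directory layout are assumptions, not the repo's API):

```python
import glob
import torch
from safetensors.torch import load_file

def load_and_quantize_shards(shard_dir: str, quantize_fn, device: str = "cuda"):
    """Load checkpoint shards one at a time on CPU and quantize tensors individually.

    Peak memory is roughly one shard (host RAM) plus the already-quantized
    parameters on the GPU, instead of the whole fp16 checkpoint at once.
    """
    quantized = {}
    for shard in sorted(glob.glob(f"{shard_dir}/*.safetensors")):
        state = load_file(shard, device="cpu")  # shard stays in host memory
        for name in list(state.keys()):
            # Pop so the CPU copy can be freed as soon as it has been quantized.
            quantized[name] = quantize_fn(state.pop(name)).to(device)
        torch.cuda.empty_cache()  # release any transient GPU buffers between shards
    return quantized
```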

Hi, how do I cast a float/bfloat16 tensor to FP8? I want to conduct W8A8 (FP8) quantization, but I didn't find an example of quantizing activations to the FP8 format.
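For reference, a minimal per-tensor sketch, assuming PyTorch ≥ 2.1 where `torch.float8_e4m3fn` is available; the helper name and the simple per-tensor scaling are my own choices, not any particular library's API:

```python
import torch

def to_fp8_e4m3(x: torch.Tensor):
    """Quantize a float/bfloat16 tensor to FP8 (e4m3) with a per-tensor scale.

    Returns the FP8 tensor plus the scale needed to dequantize it later.
    """
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn
    scale = x.abs().max().clamp(min=1e-12) / fp8_max
    x_fp8 = (x / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    return x_fp8, scale

x = torch.randn(4, 4, dtype=torch.bfloat16)
x_fp8, scale = to_fp8_e4m3(x)
x_dequant = x_fp8.to(torch.bfloat16) * scale  # dequantized reference copy
```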

question

Hi, I want to quantize and dequantize a model just to evaluate its accuracy/perplexity. But I found the model is packed after fast quantization. How to get a...
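One way to sidestep the packed format entirely is plain fake quantization (quantize then immediately dequantize in float), so the model stays evaluable with the normal perplexity pipeline. A generic asymmetric per-group sketch, not the library's own API, assuming weight sizes divisible by `group_size`:

```python
import torch

def fake_quantize_weight(w: torch.Tensor, n_bits: int = 4, group_size: int = 128):
    """Quantize then dequantize a weight tensor (asymmetric, per-group).

    The result stays in the original float dtype, so no packed integer
    storage or custom kernels are involved when evaluating accuracy/ppl.
    """
    orig_dtype, orig_shape = w.dtype, w.shape
    w = w.reshape(-1, group_size).float()
    w_min = w.min(dim=1, keepdim=True).values
    w_max = w.max(dim=1, keepdim=True).values
    qmax = 2 ** n_bits - 1
    scale = (w_max - w_min).clamp(min=1e-8) / qmax
    zero = (-w_min / scale).round()
    q = (w / scale + zero).round().clamp(0, qmax)
    w_dq = (q - zero) * scale
    return w_dq.reshape(orig_shape).to(orig_dtype)

w = torch.randn(256, 256, dtype=torch.bfloat16)
w_fq = fake_quantize_weight(w)
print((w - w_fq).abs().mean())  # average quantization error
```

Applying this to every linear weight in place before running the evaluation gives the quantized-model perplexity without ever touching the packed representation.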

Hi, why is down_proj excluded when executing LET, as in `model.mlp.down_proj.temp_weight = model.mlp.down_proj.weight`? I think down_proj can be smoothed with up_proj, e.g. `smooth_fc_fc_temporary(model.mlp.up_proj, model.mlp.down_proj, model.down_smooth_scale, model.down_smooth_shift)`. Am I right?
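For context, the scale half of that transformation is mathematically exact, because a per-channel scale on up_proj's output commutes with the element-wise multiply feeding down_proj. The sketch below is a generic SmoothQuant-style version (hypothetical `smooth_up_down` helper, weight-only scales, ignoring the shift term that `smooth_fc_fc_temporary` also carries), not the repository's actual code:

```python
import torch

@torch.no_grad()
def smooth_up_down(up_proj: torch.nn.Linear, down_proj: torch.nn.Linear, alpha: float = 0.5):
    """Migrate quantization difficulty between up_proj and down_proj.

    down_proj's input is silu(gate_proj(x)) * up_proj(x), so dividing up_proj's
    output channels by a scale and multiplying the matching input columns of
    down_proj by the same scale leaves the MLP output unchanged.
    """
    w_up = up_proj.weight.abs().max(dim=1).values      # [intermediate_size]
    w_down = down_proj.weight.abs().max(dim=0).values  # [intermediate_size]
    scale = (w_up.pow(alpha) / w_down.pow(1 - alpha)).clamp(min=1e-5)

    up_proj.weight.div_(scale.unsqueeze(1))    # scale up_proj's output channels
    down_proj.weight.mul_(scale.unsqueeze(0))  # absorb the scale into down_proj's columns

# Sanity check that the MLP output is preserved by the transformation.
hidden, inter = 16, 64
gate = torch.nn.Linear(hidden, inter, bias=False)
up = torch.nn.Linear(hidden, inter, bias=False)
down = torch.nn.Linear(inter, hidden, bias=False)
x = torch.randn(2, hidden)
ref = down(torch.nn.functional.silu(gate(x)) * up(x))
smooth_up_down(up, down)
out = down(torch.nn.functional.silu(gate(x)) * up(x))
print(torch.allclose(ref, out, atol=1e-5))  # True
```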

Hi, great work! I met some problems during 4-bit weight-only quantization (--lwc). 1. Is there any problem if the norm is nan? 2. What's the best lwc hyper-parameter of Llama2 with...

Hi, I tried to finetune the Llama2-7b-chat model using Megatron. I downloaded the HF checkpoint and converted it to a GPT Megatron checkpoint following https://github.com/NVIDIA/Megatron-LM/blob/fe1640a3cc4866e015bfdb6449f0d1943d2243cb/docs/llama_mistral.md?plain=1#L73. The command I used is: ``` python...