mobicham
Thanks @davidberard98, for now I am using `torch==2.6.0.dev20241101+cu121`, which works fine. This is what I get with **gdb** (`2.6.0.dev20241112+cu121`):

```
(gdb) run
Starting program: /opt/conda/bin/python test_torch.py
warning: Error disabling...
```
Sorry for the delay @davidberard98, I just tried with the nightly build and luckily it's working this time (`2.6.0.dev20241218+cu124`). Really appreciate your support, thanks!
@antiagainst indeed:

```python
dtype = torch.float8_e5m2      # v_mfma_f32_32x32x8_f16      a[0:15], v[4:5],   v[18:19], a[0:15]
dtype = torch.float8_e4m3fnuz  # v_mfma_f32_32x32x16_fp8_fp8 a[0:15], v[12:13], v[0:1],   a[0:15]
dtype = torch.float8_e4m3fn    # ERROR
```

So `float8_e4m3fnuz` seems to...
@antiagainst from the official AMD doc, it says that fp8 has a [-448, 448] range, while Torch/Triton is using `float8_e4m3fnuz`, which has the [-240, 240] range - I am a bit...
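For reference, `torch.finfo` confirms these ranges directly (a minimal check, independent of any specific build):

```python
import torch

# float8_e4m3fn   -> [-448, 448] (the OCP e4m3 format)
# float8_e4m3fnuz -> [-240, 240] (the "nuz" variant used on AMD)
for dt in (torch.float8_e4m3fn, torch.float8_e4m3fnuz):
    info = torch.finfo(dt)
    print(dt, info.min, info.max)
```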
1/3: https://github.com/huggingface/transformers/pull/33141/commits/5cb7d81547908dea660f525be5f77d9065b6edeb

Removed the `check_old_param` hack. The problem, however, is that `HQQLinear.state_dict` is huge, which makes loading extremely slow. So I added `run_expected_keys_check`, which skips those checks for `HQQLinear` params....
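The idea is roughly the following (a sketch only - the helper's actual signature in the PR may differ):

```python
def run_expected_keys_check(module) -> bool:
    # HQQLinear serializes many entries per layer (weights, scales, zeros,
    # meta), so validating each of its state-dict keys against the
    # expected-keys list dominates load time. Returning False here bypasses
    # that validation for HQQLinear only.
    return type(module).__name__ != "HQQLinear"
```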
2/3: Multi-gpu loading

Loading on multi-gpu looks like it's working fine. There's an issue with the BitBlas backend I just reported here. Forcing the input to use the same device...
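The device workaround amounts to something like this (hedged sketch; the helper name is illustrative and the real fix belongs in the backend):

```python
import torch

def to_weight_device(layer: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    # Move the activation onto the same device as the layer's parameters
    # before dispatching, so the kernel never sees mixed devices.
    weight_device = next(layer.parameters()).device
    return x if x.device == weight_device else x.to(weight_device)
```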
@SunMarc

- Reverted back to `if isinstance(module, (torch.nn.Linear, HQQLinear)):`, but we still need that `run_expected_keys_check`, otherwise it breaks
- Updated the default `HqqConfig` params, since `quant_scale`, `quant_zero`, and `offload_meta`...
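With those params out of the default surface, a basic config reduces to the core knobs (hedged example; the values are illustrative, not the PR's defaults):

```python
from transformers import HqqConfig

# quant_scale, quant_zero, and offload_meta are no longer part of the
# defaults; nbits and group_size are the main remaining knobs.
quant_config = HqqConfig(nbits=4, group_size=64)
```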
Regarding this: https://github.com/huggingface/transformers/pull/33141#discussion_r1734388659

The issue is that, to remove that additional check, we would need all the `HQQLinear` state-dict keys for each layer in the list of expected keys....
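In other words, the alternative would be something like this sketch (hypothetical helper name), which is exactly the expensive part for large models:

```python
def expand_expected_keys(model, expected_keys):
    # Enumerate every HQQLinear state-dict key for every layer and append it
    # to the expected keys. HQQLinear serializes many entries per layer, so
    # this scan is what makes the approach slow.
    for name, module in model.named_modules():
        if type(module).__name__ == "HQQLinear":
            expected_keys.extend(f"{name}.{k}" for k in module.state_dict())
    return expected_keys
```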
There are TODOs to be done before merging:

- Check if adding a bias on architectures that don't support the bias by default breaks the hqq model loading.
- Trying...
> There are TODOs to be done before merging:
>
> * Check if adding a bias on architectures that don't support the bias by default breaks the hqq model...