fxmarty
Hi @mszsorondo, looking into the PRs, BLIP has been implemented in https://github.com/huggingface/optimum/pull/1125. I just ticked it in the first post. @rajveer43 For Flava, there is this ongoing PR: https://github.com/huggingface/optimum/pull/907
Oh that's cool, thank you for sharing!
Marlin repacking kernel is integrated in https://github.com/AutoGPTQ/AutoGPTQ/pull/539, thank you @chu-tianxiang for the implementation!
@Qubitium I think `test_mixtral_generation` is flaky. `test_q4.py` is very slow for two reasons: it uses large models (7B, 13B), and more importantly some tests are on CPU only and...
Thank you, is there a way (i.e. non-private model) for me to reproduce & add a better test for this? What model architecture are you using? If llama, it could...
Yes, Yi uses the llama architecture. So likely something breaking in transformers :/
@Qubitium How long does the quantization take to reproduce on your private model? If you have time at hand, something you could try is to look at https://github.com/huggingface/transformers/commits/main/ between 1st...
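In case it helps, here is a rough sketch of the kind of check script I mean for that manual bisection. The model id and prompt are placeholders, not your private model; the idea is to reinstall transformers at a candidate commit, rerun, and compare the printed output across commits:

```python
# Rerun after installing transformers at each candidate commit, e.g.:
#   pip install "git+https://github.com/huggingface/transformers@<commit_sha>"
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/your-quantized-model"  # placeholder, swap in the model that regresses

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
with torch.no_grad():
    # Greedy decoding so the output is deterministic across runs.
    output = model.generate(**inputs, max_new_tokens=32, do_sample=False)

print(transformers.__version__)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```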
@gante do you see any obvious issue in the linked commit? Degraded generation is reported with Transformers 4.39 using `generate` with a model whose linear layers are replaced, compared to...
@Qubitium Do you see the difference before/after 4.39 also with a simple forward call, or only with `generate` calls? In your comparison, if you compare generations, could you make sure...
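For reference, a minimal sketch of the forward-only comparison I have in mind (model id is a placeholder): dump the logits in each environment and compare them offline, which takes sampling and generation config out of the picture entirely.

```python
# Run once with transformers 4.38.x and once with 4.39.x; each run saves a
# version-tagged dump that can be compared offline afterwards.
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/your-quantized-model"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits

torch.save(logits.cpu(), f"logits_{transformers.__version__}.pt")

# Offline, once both dumps exist:
# a = torch.load("logits_4.38.2.pt"); b = torch.load("logits_4.39.0.pt")
# print((a - b).abs().max())
```

If the forward logits match, the difference would be on the `generate` side, and using greedy decoding (`do_sample=False`) with the same generation config in both runs keeps that comparison deterministic.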
> In a worst-case scenario, can I use from_pretrained in my application?

Yes, it is fine to use just this. Unless you need to speed up inference, make things portable,...
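For completeness, and assuming the question is about plain transformers `from_pretrained` as opposed to the optimum/ONNX Runtime classes, the plain path looks like this (the model id is just an example):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Plain transformers usage, no optimum export involved.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("Using from_pretrained directly works fine.", return_tensors="pt")
logits = model(**inputs).logits
print(logits.argmax(-1))
```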