fxmarty

324 comments of fxmarty

Yes, we should probably require the next Optimum version.

Should be ready @sgugger, the documentation has been extended in https://moon-ci-docs.huggingface.co/docs/transformers/pr_21259/en/perf_infer_gpu_one. Let me know if I should add a test - in which case optimum should be added...

Thanks, will do!

> especially to test accelerate compatibility

Isn't this already tested on the Optimum side?

There are daily GPU tests in Optimum, for example https://github.com/huggingface/optimum/blob/main/.github/workflows/test_onnxruntime_train.yml and https://github.com/huggingface/optimum/blob/main/.github/workflows/test_onnxruntime_gpu.yml. In my opinion, thorough tests should be added in Optimum, not Transformers. The test...

> you should finish the work and have it merged sooner rather than later :-)

There is substantial work left in Optimum before this should be merged. Marking as draft...

> Groupsize has a negligible impact on performance, and the extra file size doesn't prevent 33B models from using full context on 24 GB. Act-order has a small impact on...

Thank you! Yes, `s_row(B)` is fine since it is done ahead of time (on the weights), but in the row tensor parallel case (which typically follows a column tensor parallel operation) the activation `A`...

@turboderp You can usually avoid gathering the activation between a column tensor parallel linear and a row tensor parallel linear; see the shapes in the figure here: https://huggingface.co/docs/transformers/v4.30.0/en/perf_train_gpu_many#tensor-parallelism
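To make the shape argument concrete, here is a minimal single-process NumPy sketch of Megatron-style column-then-row tensor parallelism (not the linked implementation; `world_size` and all dimensions are illustrative, and the all-reduce is emulated with a plain sum):

```python
import numpy as np

# Illustrative sizes; each "rank" is just an index in this single-process sketch.
world_size = 2
batch, d_in, d_hidden, d_out = 4, 8, 16, 8

rng = np.random.default_rng(0)
A = rng.standard_normal((batch, d_in))       # replicated input activation
W1 = rng.standard_normal((d_in, d_hidden))   # first linear: column parallel
W2 = rng.standard_normal((d_hidden, d_out))  # second linear: row parallel

# Column parallel: shard W1 along its output dimension.
W1_shards = np.split(W1, world_size, axis=1)  # each (d_in, d_hidden // world_size)
# Row parallel: shard W2 along its input dimension.
W2_shards = np.split(W2, world_size, axis=0)  # each (d_hidden // world_size, d_out)

# Each rank computes its partial result locally. The column parallel output,
# of shape (batch, d_hidden // world_size), is exactly the input shard the
# row parallel linear expects, so no gather of the activation is needed.
partials = [(A @ W1_shards[r]) @ W2_shards[r] for r in range(world_size)]

# A single all-reduce (emulated here by a sum) recovers the full result.
Y = sum(partials)
assert np.allclose(Y, A @ W1 @ W2)
```

The point of the sketch is that each rank's column parallel output shard feeds its row parallel input shard directly, so the only communication required is the final all-reduce of the partial outputs.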

I can reproduce the issue. Feel free to open a PR if you find a fix.