fxmarty

324 comments of fxmarty

Yes, we should probably require the next Optimum version.

Should be ready @sgugger, the documentation has been extended in https://moon-ci-docs.huggingface.co/docs/transformers/pr_21259/en/perf_infer_gpu_one. Let me know if I should add a test - in which case optimum should be added...

Thanks, will do!

> especially to test accelerate compatibility

Isn't this already tested on the Optimum side?

There are daily GPU tests in Optimum, for example https://github.com/huggingface/optimum/blob/main/.github/workflows/test_onnxruntime_train.yml and https://github.com/huggingface/optimum/blob/main/.github/workflows/test_onnxruntime_gpu.yml. In my opinion, thorough tests should be added in Optimum, not Transformers. The test...

> you should finish the work and have it merged sooner rather than later :-)

There is substantial work left in Optimum before this should be merged. Marking as draft...

> Groupsize has a negligible impact on performance, and the extra file size doesn't prevent 33B models from using full context on 24 GB. Act-order has a small impact on...

Thank you! Yes, `s_row(B)` is fine since it is done ahead of time (on the weights), but in the row tensor parallel case (which typically follows a column tensor parallel operation) the activation `A`...

@turboderp You can usually avoid gathering the activation between a column tensor parallel linear and a row tensor parallel linear; see the shapes in the figure here: https://huggingface.co/docs/transformers/v4.30.0/en/perf_train_gpu_many#tensor-parallelism
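To make the shape argument concrete, here is a minimal single-process NumPy sketch of Megatron-style column-then-row tensor parallelism (not the linked implementation; `world_size` and all dimensions are illustrative, and the all-reduce is emulated with a plain sum):

```python
import numpy as np

# Illustrative sizes; each "rank" is just an index in this single-process sketch.
world_size = 2
batch, d_in, d_hidden, d_out = 4, 8, 16, 8

rng = np.random.default_rng(0)
A = rng.standard_normal((batch, d_in))       # replicated input activation
W1 = rng.standard_normal((d_in, d_hidden))   # first linear: column parallel
W2 = rng.standard_normal((d_hidden, d_out))  # second linear: row parallel

# Column parallel: shard W1 along its output dimension.
W1_shards = np.split(W1, world_size, axis=1)  # each (d_in, d_hidden // world_size)
# Row parallel: shard W2 along its input dimension.
W2_shards = np.split(W2, world_size, axis=0)  # each (d_hidden // world_size, d_out)

# Each rank computes its partial result locally. The column parallel output,
# of shape (batch, d_hidden // world_size), is exactly the input shard the
# row parallel linear expects, so no gather of the activation is needed.
partials = [(A @ W1_shards[r]) @ W2_shards[r] for r in range(world_size)]

# A single all-reduce (emulated here by a sum) recovers the full result.
Y = sum(partials)
assert np.allclose(Y, A @ W1 @ W2)
```

The point of the sketch is that each rank's column parallel output shard feeds its row parallel input shard directly, so the only communication required is the final all-reduce of the partial outputs.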

I can reproduce the issue. Feel free to open a PR if you find a fix.