regisss
Thanks. What is the command you use to run this script?
I cannot reproduce it; it works on my side. Could you provide the full logs of your run and the output of `pip list`, please?
Are you using Gaudi1 or Gaudi2?
Can you share the logs of your run please? For a 634M-parameter model, you should be able to fit much bigger batches on Gaudi2.
Hi @rubenCrayon! `deepspeed_reinit` was removed a few versions ago, so you should use a more recent version of Optimum. This may require you to change your script a bit, in that case...
@sgugger All tests passed, so I think this one can be merged :slightly_smiling_face:
@plamb-viso Here is the guide to add ONNX export support for a new architecture in Optimum: https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute Feel free to open a PR there and we'll help you if you...
Hi @hannan72! I recommend that you use Optimum for exporting Whisper to the ONNX format (it will basically be a wrapper around `torch.onnx.export` but it is tested and Whisper is...
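As a sketch, the export described above can be done with the Optimum CLI; the model name and output directory below are illustrative, not taken from this thread:

```shell
# Export Whisper to ONNX with Optimum's CLI.
# "openai/whisper-tiny" and "whisper_onnx/" are example values — substitute
# your own checkpoint and output directory.
optimum-cli export onnx --model openai/whisper-tiny whisper_onnx/
```

The exported model can then be loaded with `ORTModelForSpeechSeq2Seq` from `optimum.onnxruntime` for inference.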
Yes, I see you opened this issue in Optimum: https://github.com/huggingface/optimum/issues/827 I think it is best to wait for @fxmarty to take a look at it. Regarding these warnings, I don't...
`torchrun` is equivalent to `python -m torch.distributed.run`, while `python -m torch.distributed.launch` is deprecated. I think the reason why it is deprecated is just that `torchrun` does the same but also...
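To make the equivalence concrete, here is a quick sketch; the script name `train.py` and the worker count are hypothetical placeholders:

```shell
# These two commands are equivalent: torchrun is a console-script entry
# point for the torch.distributed.run module.
torchrun --nproc_per_node=2 train.py
python -m torch.distributed.run --nproc_per_node=2 train.py

# Deprecated legacy launcher. Note one behavioral difference: by default it
# passes the rank to the script as a --local_rank argument, whereas
# torchrun/torch.distributed.run expose it via the LOCAL_RANK env variable.
python -m torch.distributed.launch --nproc_per_node=2 train.py
```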