fix model names
This PR fixes the model name problems in the Qwen2-related code and docs.
Thanks a lot! @JustinLin610
I will check with our CI runners and come back to you.
Running on our runner (T4)
RUN_SLOW=1 TF_FORCE_GPU_ALLOW_GROWTH=yes python3 -m pytest -v tests/models/qwen2/
Repo id issue
FAILED tests/models/qwen2/test_tokenization_qwen2.py::Qwen2TokenizationTest::test_tokenizer_integration - OSError: Can't load tokenizer for 'Qwen/Qwen1.5-7B'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'Qwe...
Results not matching expected values
(we can simply update the expected values if you think that's the way to go, as long as the model still behaves correctly)
FAILED tests/models/qwen2/test_modeling_qwen2.py::Qwen2IntegrationTest::test_model_450m_logits - AssertionError: Tensor-likes are not close!
FAILED tests/models/qwen2/test_modeling_qwen2.py::Qwen2IntegrationTest::test_model_450m_long_prompt_sdpa - AssertionError: 'My favourite condiment is 100% ketchup. I love it on everything. I’m not a big' != 'My favourite condiment is ____(醋).\n根据提示"醋"可知,这里is单数,主语填'
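If updating the expected values is the way to go, here is a minimal sketch (not the actual test code; the checkpoint, prompt ids, and slice indices are placeholders) of how they could be regenerated on the CI hardware before pasting them into the test:

```python
# Sketch: regenerate an expected logits slice on the target GPU.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-0.5B", torch_dtype=torch.float16
).to("cuda")

# Placeholder input ids; the real test defines its own fixed prompt.
input_ids = torch.tensor([[1, 306, 4658, 278, 6593, 310, 2834, 338]], device="cuda")
with torch.no_grad():
    logits = model(input_ids).logits

# Paste this printed slice into the test's expected-values constant.
print(logits[0, -1, :30])
```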
GPU OOM: maybe use shorter output lengths?
FAILED tests/models/qwen2/test_modeling_qwen2.py::Qwen2IntegrationTest::test_model_450m_generation - torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 594.00 MiB. GPU 0 has a total capacity of 14.76 GiB of which 224.75 MiB is free. Process 2822 has 14.53 GiB memory in use. Of the allocated me...
FAILED tests/models/qwen2/test_modeling_qwen2.py::Qwen2IntegrationTest::test_speculative_generation - torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 14.76 GiB of which 6.75 MiB is free. Process 2822 has 14.75 GiB memory in use. Of the allocated memor...
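For example, something along these lines could keep the generation tests within the T4's 16 GiB (a sketch; the prompt and the max_new_tokens value are illustrative, not the test's actual ones):

```python
# Sketch: shorter output length plus explicit cleanup between tests.
import gc
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-0.5B", torch_dtype=torch.float16
).to("cuda")

inputs = tokenizer("My favourite condiment is", return_tensors="pt").to("cuda")
# Cap the generated length to lower peak memory.
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# Free the model and cached allocator blocks before the next test runs.
del model
gc.collect()
torch.cuda.empty_cache()
```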
SDPA issue: this might be tricky
FAILED tests/models/qwen2/test_modeling_qwen2.py::Qwen2ModelTest::test_eager_matches_sdpa_generate - AssertionError: False is not true
FAILED tests/models/qwen2/test_modeling_qwen2.py::Qwen2ModelTest::test_eager_matches_sdpa_inference_0_float16 - AssertionError: False is not true : padding_side=left, use_mask=False, batch_size=1, enable_kernels=False: mean relative difference: 4.090e+00, torch atol = 0.005, torch rtol = 0.005
FAILED tests/models/qwen2/test_modeling_qwen2.py::Qwen2ModelTest::test_eager_matches_sdpa_inference_2_float32 - AssertionError: False is not true : padding_side=left, use_mask=False, batch_size=1, enable_kernels=False: mean relative difference: 2.817e+00, torch atol = 1e-06, torch rtol = 0.0001
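To help narrow this down, here is a sketch of how the eager vs. SDPA outputs could be compared manually (the checkpoint is a stand-in; the failing test itself uses a tiny randomly initialized model):

```python
# Sketch: load the same checkpoint with both attention backends and diff the logits.
import torch
from transformers import AutoModelForCausalLM

eager = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-0.5B", attn_implementation="eager", torch_dtype=torch.float16
).to("cuda")
sdpa = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-0.5B", attn_implementation="sdpa", torch_dtype=torch.float16
).to("cuda")

input_ids = torch.randint(0, eager.config.vocab_size, (1, 16), device="cuda")
with torch.no_grad():
    diff = (eager(input_ids).logits - sdpa(input_ids).logits).abs().max()

# A mean relative difference around 4e+00, as in the failure above, points at a
# masking/backend bug rather than ordinary float16 noise.
print(diff)
```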
For check_code_quality, let's resolve it toward the end of the PR (before we merge to main).
I think we only have the code quality issue based on the tests? The reported issues are from your internal CI tests, right? I tested manually and fixed the mentioned problems. For SDPA, I did not run into issues, btw.
https://huggingface.co/Qwen/Qwen1.5-7B is the repo id. Anyway, I'll switch to Qwen/Qwen1.5-0.5B for consistency.
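A quick sanity check that the repo id resolves (sketch):

```python
from transformers import AutoTokenizer

# Both repo ids exist on the Hub; the tests will standardize on the 0.5B one.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B")
print(tokenizer("hello world").input_ids)
```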
@ydshieh feel free to send me feedback 🚀
Hi @JustinLin610 Thank you for the update again. OK, I will take care of them, but could you share which GPU you ran the tests on? (You ran with RUN_SLOW=1, right?)
I ran with an A100 80G, Python 3.12, PyTorch 2.2. You mean running the eval with export RUN_SLOW=1 set in the environment? No, I didn't.
The integration tests (i.e. Qwen2IntegrationTest) are only run with export RUN_SLOW=1. Sorry if I didn't make that clear. If you can run them again and see whether there is something you can help us fix, that would be much appreciated: this means making sure the tests pass in your run (with export RUN_SLOW=1).
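For context, the gating looks roughly like this in transformers tests (the class and test names here are placeholders):

```python
# Sketch: integration tests are decorated with @slow, which skips them
# unless RUN_SLOW is set to a truthy value (e.g. RUN_SLOW=1 or RUN_SLOW=yes).
import unittest
from transformers.testing_utils import slow

class ExampleIntegrationTest(unittest.TestCase):
    @slow  # skipped unless RUN_SLOW is set in the environment
    def test_generation(self):
        ...  # the real tests load the checkpoint and compare outputs
```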
(And if something is only failing on our T4, I could fix it on my side.)
Although it's just internal CI, it's an important part of the ecosystem that allows us to monitor whether something has been broken by newly merged PRs. I believe Qwen2 would benefit from having a working Qwen2IntegrationTest 🤗.
@JustinLin610 Could you run the following on your machine
RUN_SLOW=1 python -m pytest -v tests/models/qwen2/test_modeling_qwen2.py
and share the logs? Let's see how it goes: fix whatever can be done on your side, merge it, and I will take care of the rest (if any) 🚀
Thank you in advance 🙏