Add GPTQ and AWQ INT4 support on Intel platforms
What does this PR do?
Fixes # (issue)
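As a usage sketch (not part of this PR's diff): with the image built from Dockerfile_intel, the existing `--quantize gptq` / `--quantize awq` launcher options should now work on Intel hardware. The model IDs below are placeholders.

```sh
# Sketch: launching TGI with an INT4-quantized model on an Intel host.
# Assumes the image built from Dockerfile_intel; model IDs are placeholders.
text-generation-launcher \
    --model-id TheBloke/Llama-2-7B-Chat-GPTQ \
    --quantize gptq

# AWQ variant (placeholder model ID):
text-generation-launcher \
    --model-id TheBloke/Llama-2-7B-Chat-AWQ \
    --quantize awq
```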
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@Narsil @danieldk please help review.
@ErikKaum could you help review the PR?
Hi @sywangyi 👋
Yes, let me run the tests in a separate branch so that we don't get the permission errors 👍 I should have time to do it today or tomorrow 👍
@ErikKaum @Narsil uploaded a fix for CI, please rerun the CI.
@sywangyi there still seems to be an error in the Dockerfile:
Dockerfile_intel:154
--------------------
152 | RUN git clone https://github.com/intel/intel-extension-for-pytorch && cd intel-extension-for-pytorch && git checkout f86e93e4890dc2c989024d148d415c9aa8a1649f
153 | RUN git clone https://github.com/intel/torch-ccl.git && cd torch-ccl && git checkout v2.4.0+cpu+rc0
154 | >>> RUN cd intel-extension-for-pytorch && git submodule sync && git submodule update --init --recursive && python setup.py install
155 | RUN cd torch-ccl && git submodule sync && git submodule update --init --recursive && pip install .
156 |
--------------------
ERROR: failed to solve: process "/bin/sh -c cd intel-extension-for-pytorch && git submodule sync && git submodule update --init --recursive && python setup.py install" did not complete successfully: exit code: 1
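For anyone debugging: a minimal sketch of reproducing the failing step outside Docker, reusing the pinned commit from Dockerfile_intel above.

```sh
# Sketch: reproduce the failing IPEX source build locally, with the same
# pinned commit as Dockerfile_intel; `python setup.py install` is the
# command that exits with code 1 in CI.
git clone https://github.com/intel/intel-extension-for-pytorch
cd intel-extension-for-pytorch
git checkout f86e93e4890dc2c989024d148d415c9aa8a1649f
git submodule sync && git submodule update --init --recursive
python setup.py install
```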
@ErikKaum could you help retrigger the CI build for intel-cpu? We did not see this build error in the previous CI run, and I have not made any changes to Dockerfile_intel in the new commits.
Will rework it after https://github.com/huggingface/text-generation-inference/pull/2517 is merged, since Python is upgraded from 3.10 to 3.11.
@ErikKaum rebase done, please retrigger the CI, review and merge it.
Seems the failure is not related to the PR:

ERROR integration-tests/models/test_flash_medusa.py::test_flash_medusa_simple - RuntimeError: Launcher crashed
ERROR integration-tests/models/test_flash_medusa.py::test_flash_medusa_all_params - RuntimeError: Launcher crashed
ERROR integration-tests/models/test_flash_medusa.py::test_flash_medusa_load - RuntimeError: Launcher crashed
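(A sketch for rerunning just those tests locally, assuming the integration-test environment is already set up:)

```sh
# Sketch: rerun only the medusa integration tests named above; assumes
# the repo's integration-test prerequisites are installed.
pytest integration-tests/models/test_flash_medusa.py
```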
@ErikKaum could you help retrigger it?
This PR is also needed to make mllama output correct on ipex-cpu, since it upgrades IPEX. Could anyone help merge it?
@ErikKaum @Narsil, please help. @yao-matrix
It's merged via an updated PR I prepared for CI (https://github.com/huggingface/text-generation-inference/pull/2665); only minor fixes were made to the control flow, plus a few added comments.