CI for adding GPTQ and AWQ INT4 support on the Intel platform
Run CI for PR #2444
I checked the 3 failure cases and found they are related to the bug fix in https://github.com/huggingface/text-generation-inference/pull/2444/files#diff-d8aff332cf9104dd7460d2f53575239dc1f4bcdd374e575b8a504568bfc2e078R325, which causes "Narsil/starcoder-gptq" at 2 TP to stop using the exllama kernel. If you check this model at 1 TP, which does use the exllama kernel, the generation result is close to the current 2 TP result that was produced with exllama.
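For context, here is a minimal sketch of why tensor parallelism can change the kernel path. The function name, parameters, and the divisibility rule below are hypothetical illustrations, not TGI's actual logic:

```python
# Hypothetical sketch, not TGI's actual code: shows how sharding a GPTQ
# layer across TP ranks can invalidate a fast-kernel precondition, so the
# same model takes the exllama path at 1 TP but falls back at 2 TP.
def can_use_exllama(in_features: int, group_size: int, tp_world_size: int) -> bool:
    # Each rank holds a 1/tp_world_size slice of the weight's input dimension.
    shard_in_features = in_features // tp_world_size
    # Assumed requirement: the per-shard input dimension must split evenly
    # into quantization groups for the fused kernel to be usable.
    return shard_in_features % group_size == 0


# A layer that qualifies at 1 TP may no longer qualify at 2 TP.
print(can_use_exllama(in_features=384, group_size=128, tp_world_size=1))  # True
print(can_use_exllama(in_features=384, group_size=128, tp_world_size=2))  # False
```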
@Narsil to comment.
Seems the failure is not related to the PR:

```
ERROR integration-tests/models/test_flash_medusa.py::test_flash_medusa_simple - RuntimeError: Launcher crashed
ERROR integration-tests/models/test_flash_medusa.py::test_flash_medusa_all_params - RuntimeError: Launcher crashed
ERROR integration-tests/models/test_flash_medusa.py::test_flash_medusa_load - RuntimeError: Launcher crashed
```
Merged in https://github.com/huggingface/text-generation-inference/pull/2665