[Misc] `compressed-tensors` code reuse
- Reuse existing implementations of compression and quantization configs defined by compressed-tensors
- Adds compressed-tensors as a dependency to vllm
The reused classes are
CompressionFormatQuantizationArgsQuantizationStrategyQuantizationType
👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which consists a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of default ones by unblocking the steps in your fast-check build on Buildkite UI.
Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).
To run full CI, you can do one of these:
- Comment
/readyon the PR - Add
readylabel to the PR - Enable auto-merge.
🚀
/ready
@kylesayrs you're missing https://github.com/vllm-project/vllm/blob/5923532e15b12832febe91ef5e9a4cef2786cefc/requirements-test.txt#L20
Current state looks good so far.
Biggest piece of feedback is that we are still rewriting the logic associated with parsing the config. Specifically, the get_scheme function in compressed-tensors.py has this duplicated code
- https://github.com/vllm-project/vllm/pull/7277/files#diff-db60d9d797517c6bdf2ce051b532fe6fa6bceab04516c6d8c2d80a2ae803abe9L263
It will be tricky to fix this (because the vLLM state_dict is not a 1:1 map with the transformers state_dict), so feel free to reach out if you need any pointers.
@robertgshaw2-neuralmagic I think updating the get_scheme function is beyond this scope of this PR. I'd like to first land using compressed-tensors without any dependency conflicts. Refactoring get_scheme should be a follow-up
These test failures seem unrelated to this PR? The a few seem to be cuda errors and one is complaining about bad llm metrics measurements
@robertgshaw2-neuralmagic I think updating the
get_schemefunction is beyond this scope of this PR. I'd like to first land using compressed-tensors without any dependency conflicts. Refactoringget_schemeshould be a follow-up
Sounds good.
@kylesayrs im just running this by simon but we should be good to go