
[Misc] `compressed-tensors` code reuse

Open kylesayrs opened this pull request 1 year ago • 7 comments

The reused classes are

  1. CompressionFormat
  2. QuantizationArgs
  3. QuantizationStrategy
  4. QuantizationType
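For context, here is a rough sketch of the shape of the four reused definitions. This is illustrative only: the member values and fields are assumptions for this sketch, and the real `QuantizationArgs` in `compressed-tensors` is a validated model rather than a plain dataclass.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Sketch of the reused enums; member values are assumptions for illustration.
class CompressionFormat(str, Enum):
    DENSE = "dense"
    INT_QUANTIZED = "int-quantized"

class QuantizationType(str, Enum):
    INT = "int"
    FLOAT = "float"

class QuantizationStrategy(str, Enum):
    TENSOR = "tensor"
    CHANNEL = "channel"
    GROUP = "group"
    TOKEN = "token"

# Stand-in for the QuantizationArgs model: the knobs that describe
# how a set of weights or activations is quantized.
@dataclass
class QuantizationArgs:
    num_bits: int = 8
    type: QuantizationType = QuantizationType.INT
    strategy: QuantizationStrategy = QuantizationStrategy.TENSOR
    symmetric: bool = True
    group_size: Optional[int] = None
```

Reusing these shared definitions from the `compressed-tensors` package, rather than redefining them in vLLM, keeps the two codebases from drifting apart as new formats and strategies are added.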

kylesayrs avatar Aug 07 '24 17:08 kylesayrs

👋 Hi! Thank you for contributing to the vLLM project. Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment `/ready` on the PR
  • Add the `ready` label to the PR
  • Enable auto-merge

🚀

github-actions[bot] avatar Aug 07 '24 17:08 github-actions[bot]

/ready

dsikka avatar Aug 08 '24 18:08 dsikka

@kylesayrs you're missing https://github.com/vllm-project/vllm/blob/5923532e15b12832febe91ef5e9a4cef2786cefc/requirements-test.txt#L20

dsikka avatar Aug 08 '24 21:08 dsikka

Current state looks good so far.

The biggest piece of feedback is that we are still rewriting the logic associated with parsing the config. Specifically, the `get_scheme` function in `compressed_tensors.py` duplicates this code:

  • https://github.com/vllm-project/vllm/pull/7277/files#diff-db60d9d797517c6bdf2ce051b532fe6fa6bceab04516c6d8c2d80a2ae803abe9L263

It will be tricky to fix this (because the vLLM state_dict is not a 1:1 map with the transformers state_dict), so feel free to reach out if you need any pointers.
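To make the duplication concrete, here is a minimal sketch of the config-parsing shape being discussed. Every name here is a hypothetical stand-in, not vLLM's actual `get_scheme` implementation: the idea is simply matching a layer name against the `targets` of each quantization config group.

```python
from typing import Optional

def get_scheme(config_groups: dict, layer_name: str) -> Optional[dict]:
    """Hypothetical stand-in for the duplicated parsing logic:
    return the quantization group whose targets match this layer,
    or None if the layer is not quantized."""
    for group in config_groups.values():
        if any(target in layer_name for target in group.get("targets", [])):
            return group
    return None

# Example config: quantize all Linear layers to 8-bit weights.
config_groups = {
    "group_0": {"targets": ["Linear"], "weights": {"num_bits": 8}},
}
```

The tricky part mentioned above is that this matching happens against vLLM's fused module names (e.g. merged QKV projections), which do not map one-to-one onto the transformers `state_dict` names the config was written against.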

robertgshaw2-redhat avatar Aug 09 '24 14:08 robertgshaw2-redhat

@robertgshaw2-neuralmagic I think updating the `get_scheme` function is beyond the scope of this PR. I'd like to first land using `compressed-tensors` without any dependency conflicts. Refactoring `get_scheme` should be a follow-up.

dsikka avatar Aug 09 '24 15:08 dsikka

These test failures seem unrelated to this PR? A few appear to be CUDA errors, and one complains about bad LLM metrics measurements.

kylesayrs avatar Aug 09 '24 16:08 kylesayrs

> @robertgshaw2-neuralmagic I think updating the `get_scheme` function is beyond the scope of this PR. I'd like to first land using `compressed-tensors` without any dependency conflicts. Refactoring `get_scheme` should be a follow-up.

Sounds good.

@kylesayrs I'm just running this by Simon, but we should be good to go.

robertgshaw2-redhat avatar Aug 09 '24 16:08 robertgshaw2-redhat