🚨 Support dequantization for most GGML types

Open Isotr0py opened this issue 6 months ago • 4 comments

What does this PR do?

This PR is still a work in progress and is waiting on a gguf package version update.

  • This PR aims to add dequantization support for the remaining ggml_types and to clean up the current GGML dequantization implementation.
  • Since llama.cpp added a numpy dequantization implementation to gguf-py in https://github.com/ggerganov/llama.cpp/pull/8939, we can dequantize most GGML tensors easily in one line.
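
For readers unfamiliar with what dequantization involves, here is a minimal sketch for a single GGML type, Q8_0, written in pure Python. It assumes the usual Q8_0 block layout (one little-endian fp16 scale followed by 32 signed 8-bit values). The function name `dequantize_q8_0` is hypothetical and is not code from this PR; with gguf-py the same job is presumably done by its numpy-based helper, assumed here to be `gguf.quants.dequantize(data, qtype)`.

```python
import struct

QK8_0 = 32               # elements per Q8_0 block (assumed layout)
BLOCK_BYTES = 2 + QK8_0  # 2-byte fp16 scale + 32 int8 quantized values


def dequantize_q8_0(raw: bytes) -> list:
    """Illustrative only: expand Q8_0 blocks into a flat list of floats."""
    if len(raw) % BLOCK_BYTES != 0:
        raise ValueError("buffer length is not a multiple of the Q8_0 block size")
    out = []
    for offset in range(0, len(raw), BLOCK_BYTES):
        # Each block starts with its fp16 scale factor.
        (scale,) = struct.unpack_from("<e", raw, offset)
        # The scale is followed by 32 signed 8-bit quantized values.
        quants = struct.unpack_from(f"<{QK8_0}b", raw, offset + 2)
        # Dequantized value = scale * quantized value.
        out.extend(scale * q for q in quants)
    return out


# Build one synthetic block: scale 0.5, quants -16..15.
block = struct.pack("<e", 0.5) + struct.pack(f"<{QK8_0}b", *range(-16, 16))
values = dequantize_q8_0(block)
```

Other GGML types pack their scales and quants differently (for example 4-bit values with per-block minimums), which is why delegating to a shared implementation in gguf-py is attractive.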

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [x] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

Isotr0py, Aug 12 '24 12:08