v0 add autoquant

Open michaelfeil opened this issue 1 year ago • 1 comment

This pull request adds support for a new autoquant data type to the infinity_emb library. It adds the data type itself, updates the CLI documentation, modifies the quantization logic to handle autoquant, and adds unit tests for autoquant quantization.

New Features:

  • Added autoquant data type to Dtype enum in libs/infinity_emb/infinity_emb/primitives.py.
  • Updated quantization logic to handle autoquant in libs/infinity_emb/infinity_emb/transformer/quantization/interface.py and libs/infinity_emb/infinity_emb/transformer/quantization/quant.py.
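
To illustrate what adding a member to a data-type enum and branching on it can look like, here is a minimal sketch. It is not the code from this pull request: the `Dtype` class below is a stand-in whose members other than `autoquant` are assumptions, and `parse_dtype` and `needs_torchao` are hypothetical helpers invented for the example. That only `autoquant` needs torchao is likewise an assumption based on the dependency being added alongside it.

```python
from enum import Enum


class Dtype(str, Enum):
    """Illustrative stand-in for a dtype enum; not the library's definition."""

    float32 = "float32"      # assumed member
    float16 = "float16"      # assumed member
    int8 = "int8"            # assumed member
    autoquant = "autoquant"  # the kind of member this PR describes adding


def parse_dtype(value: str) -> Dtype:
    """Hypothetical helper: map a CLI string to a Dtype.

    Raises ValueError for names that are not enum values.
    """
    return Dtype(value)


def needs_torchao(dtype: Dtype) -> bool:
    """Hypothetical helper: assume only autoquant uses the optional torchao extra."""
    return dtype is Dtype.autoquant
```

Keeping the check in one helper means the optional import only has to be attempted on the code path that actually requests `autoquant`.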

Documentation Updates:

  • Updated CLI documentation to include autoquant in docs/docs/cli_v2.md.

Codebase Improvements:

  • Modified Makefile to use poetry run for generating OpenAPI and CLI v2 documentation in libs/infinity_emb/Makefile.

Dependency Updates:

  • Added torchao as an optional dependency in libs/infinity_emb/pyproject.toml.

Testing Enhancements:

  • Added unit tests for autoquant quantization in libs/infinity_emb/tests/unit_test/transformer/quantization/test_interface.py.

michaelfeil, Oct 07 '24 05:10

:warning: Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 21.42857% with 11 lines in your changes missing coverage. Please review.

Project coverage is 73.24%. Comparing base (0f1b786) to head (55a8b0e).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...infinity_emb/transformer/quantization/interface.py | 14.28% | 6 Missing :warning: |
| ...emb/infinity_emb/transformer/quantization/quant.py | 0.00% | 5 Missing :warning: |

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

:exclamation: There is a different number of reports uploaded between BASE (0f1b786) and HEAD (55a8b0e).

HEAD has 1 upload less than BASE

| Flag | BASE (0f1b786) | HEAD (55a8b0e) |
|---|---|---|
| | 2 | 1 |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #402      +/-   ##
==========================================
- Coverage   79.01%   73.24%   -5.77%     
==========================================
  Files          40       40              
  Lines        3173     3184      +11     
==========================================
- Hits         2507     2332     -175     
- Misses        666      852     +186     
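
The percentages in the diff above can be rechecked from the hit and line counts. The sketch below does that arithmetic. The helper name `coverage_pct` is made up for this example, and the patch total of 14 lines is inferred from the reported 21.42857% with 11 lines missing rather than stated in the report.

```python
def coverage_pct(hits: int, lines: int, digits: int = 2) -> float:
    """Coverage as a percentage of lines that were hit, rounded to `digits`."""
    return round(100 * hits / lines, digits)


# Project coverage, using the Hits and Lines rows of the diff.
base = coverage_pct(2507, 3173)   # BASE (0f1b786)
head = coverage_pct(2332, 3184)   # HEAD (55a8b0e)
delta = round(head - base, 2)

# Patch coverage: 11 missing lines out of an inferred 14 leaves 3 covered.
patch_lines = 14                  # inferred, not stated in the report
patch_missing = 11
patch = coverage_pct(patch_lines - patch_missing, patch_lines, digits=5)

print(base, head, delta, patch)
```

Most of the drop in hits (-175) exceeds the 11 lines the patch adds, which is consistent with the report's note that HEAD has one upload less than BASE.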

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter, Oct 07 '24 05:10