
[WIP] Quartet QAT support

Open BlackSamorez opened this issue 7 months ago • 1 comments

This PR adds support for the Quartet QAT method.

The goal of this PR is to integrate inference and training support for the Quartet QAT method. That would allow both the forward and backward passes to run in MXFP4, enabling very fast training on Blackwell GPUs.
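
As a rough sketch, usage could look like this once merged (the FPQuantConfig name and its defaults are assumptions based on this PR, and the checkpoint is a placeholder):

    # Hedged sketch, not the final API: quantize a causal LM to MXFP4 with
    # the FP-Quant backend added in this PR. A Blackwell GPU is assumed for
    # the real (non-emulated) kernels.
    import torch
    from transformers import AutoModelForCausalLM, FPQuantConfig

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.1-8B-Instruct",   # placeholder checkpoint
        quantization_config=FPQuantConfig(),  # defaults assumed to select MXFP4
        torch_dtype=torch.bfloat16,
        device_map="cuda",
    )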

Currently, we're working on the kernels here, here, and here (some of the libraries aren't public yet). We're planning to release the first version of the kernels this week and to have optimized performance by the end of June.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

BlackSamorez avatar Jun 09 '25 12:06 BlackSamorez

cc @mekkcyber

Rocketknight1 avatar Jun 09 '25 14:06 Rocketknight1

Hi @BlackSamorez, I'm really looking forward to experimenting with this.

When can we expect the kernels to be public so we can begin testing, even if they're still WIP?

kooshi avatar Jun 30 '25 21:06 kooshi

@MekkCyber Hi, thanks for reviewing this! It took us a while, but all the kernels necessary for inference have been published: I've updated the PR description. May I ask you to do another pass? Your previous comments mostly don't apply anymore because of refactoring.

BlackSamorez avatar Jul 14 '25 16:07 BlackSamorez

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc added docs, improved docstring, cleaned the code where you asked.

BlackSamorez avatar Jul 18 '25 14:07 BlackSamorez

Actually, give me a minute. I'm adding Triton pseudo-quantization kernels for people without Blackwell GPUs to be able to evaluate the models (although without speedups).

BlackSamorez avatar Jul 18 '25 15:07 BlackSamorez

Added pseudo-quantization and updated the requirements so the method can run with it (qutlass isn't required in that mode). Added pseudoquant tests and updated the documentation.
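
To evaluate without a Blackwell GPU, enabling the fallback would look roughly like this (the pseudoquantization flag name and the checkpoint are assumptions):

    # Sketch: Triton pseudo-quantization emulates MXFP4 numerics on
    # non-Blackwell GPUs. No speedups, and qutlass is not required.
    from transformers import AutoModelForCausalLM, FPQuantConfig

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.1-8B-Instruct",  # placeholder checkpoint
        quantization_config=FPQuantConfig(pseudoquantization=True),  # assumed flag
        device_map="cuda",
    )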

BlackSamorez avatar Jul 18 '25 16:07 BlackSamorez

@SunMarc please take a look at the newly raised errors and warnings in quantizer_fp_quant.py.
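
As a rough illustration of the kind of checks in question (hypothetical function and messages, not the PR's actual code):

    # Hypothetical sketch: error out when real MXFP4 kernels are unavailable,
    # warn when falling back to Triton pseudo-quantization.
    import importlib.util
    import logging

    logger = logging.getLogger(__name__)

    def validate_environment(pseudoquantization: bool) -> None:
        has_qutlass = importlib.util.find_spec("qutlass") is not None
        if not has_qutlass and not pseudoquantization:
            raise ImportError(
                "FP-Quant needs qutlass for real quantization kernels. "
                "Install qutlass or set pseudoquantization=True."
            )
        if pseudoquantization:
            logger.warning(
                "Using pseudo-quantization: results are emulated and no "
                "MXFP4 speedups will be observed."
            )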

BlackSamorez avatar Jul 18 '25 16:07 BlackSamorez

Should be good

BlackSamorez avatar Jul 22 '25 12:07 BlackSamorez

One last nit: the PR documentation build is not passing:

        raise RuntimeError(
    RuntimeError: The following files are not present in the table of contents:
    - quantization/fp_quant
    Add them to ../transformers/docs/source/en/_toctree.yml.

SunMarc avatar Jul 22 '25 15:07 SunMarc

Added it to the toctree.
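
The entry looks roughly like this (exact placement and title inside _toctree.yml are assumptions):

    # Hypothetical fragment of docs/source/en/_toctree.yml
    - local: quantization/fp_quant
      title: FP-Quant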

BlackSamorez avatar Jul 22 '25 15:07 BlackSamorez

@SunMarc the CI hit a job cancellation somehow and might need a restart. Otherwise it should be good.

BlackSamorez avatar Jul 23 '25 08:07 BlackSamorez

[For maintainers] Suggested jobs to run (before merge)

run-slow: fp_quant_integration

github-actions[bot] avatar Jul 23 '25 09:07 github-actions[bot]

Merged! Thanks for your work.

SunMarc avatar Jul 23 '25 09:07 SunMarc

Hey @BlackSamorez, is there a way to make fp_quant compatible with Python 3.9? Our CI runs on that version, but fp_quant requires 3.11.

SunMarc avatar Jul 24 '25 15:07 SunMarc

I guess I'll have to remove the match-case constructions and it'll work. Why run on 3.9 in 2025, though?
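
For example, the rewrite would look something like this (hypothetical names; Python < 3.10 lacks structural pattern matching):

    # Sketch: replace a match-case with if/elif so the code runs on 3.9.
    def select_backend(forward_dtype: str) -> str:
        # Python >= 3.10 version:
        #   match forward_dtype:
        #       case "mxfp4": return "qutlass"
        #       case "bf16":  return "pseudoquant"
        #       case _: raise ValueError(forward_dtype)
        if forward_dtype == "mxfp4":
            return "qutlass"
        elif forward_dtype == "bf16":
            return "pseudoquant"
        raise ValueError(f"Unsupported forward dtype: {forward_dtype}")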

BlackSamorez avatar Jul 24 '25 15:07 BlackSamorez

We want to make sure that the minimum maintained version of Python runs transformers correctly. When it reaches EOL, we switch to the next version.

SunMarc avatar Jul 24 '25 15:07 SunMarc