fix convert_weights not working for Qwen2.5 HF checkpoints
Summary: In Qwen2.5, the attention's linear projection layers have bias=True, but torchtune.convert_weights does not yet support bias=True. This diff adds support for that.
Differential Revision: D67880222
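For context, a minimal sketch of the kind of key-mapping addition the summary describes. The HF-side key patterns follow standard Qwen2 checkpoint naming, but the dict name and the torchtune-side templates here are assumptions for illustration, not the actual contents of torchtune/models/convert_weights.py:

```python
# Hypothetical additions to an HF -> torchtune key map so that attention
# projection biases are carried over alongside the weights. The exact dict
# and key templates in torchtune may differ.
_FROM_HF_BIAS_KEYS = {
    "model.layers.{}.self_attn.q_proj.bias": "layers.{}.attn.q_proj.bias",
    "model.layers.{}.self_attn.k_proj.bias": "layers.{}.attn.k_proj.bias",
    "model.layers.{}.self_attn.v_proj.bias": "layers.{}.attn.v_proj.bias",
}
```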
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2233
:x: 1 New Failure
As of commit 08811335566ab0bd50674753e249efe8f361a665 with merge base 213f38605ff0b7b1e20f85a9e032710be04c82c9:
NEW FAILURE - The following job has failed:
- Lint / lint (3.10)
Process completed with exit code 1.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D67880222
@calvinpelletier can you review?
Hi @zhangtemplar, you're changing the generic convert_weights function. Qwen2.5 already has a specific convert-weights function here which handles the biases of the linear projections.
In our Qwen2.5 configs, we specify the model type as "QWEN2" here, which causes the checkpointer to call the Qwen-specific convert-weights function here.
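Roughly, the dispatch works like the sketch below. This is illustrative only: the function names and key remapping are simplified stand-ins for torchtune's actual converters, not its real API.

```python
# Simplified sketch of a checkpointer's model-type dispatch.
def qwen2_hf_to_tune(state_dict: dict) -> dict:
    # Qwen-specific converter: remaps weights *and* q/k/v projection biases.
    return {
        k.replace("model.layers", "layers").replace("self_attn", "attn"): v
        for k, v in state_dict.items()
    }

def hf_to_tune(state_dict: dict) -> dict:
    # Generic converter: remaps weight keys; no attention biases expected.
    return {k.replace("model.layers", "layers"): v for k, v in state_dict.items()}

def convert_checkpoint(state_dict: dict, model_type: str) -> dict:
    if model_type == "QWEN2":
        return qwen2_hf_to_tune(state_dict)
    return hf_to_tune(state_dict)
```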
This raises a good point, though: we don't currently tell the user if their model type is wrong. We should probably allow either QWEN2_5 or QWEN2 to point to the same conversion function, as in the sketch below.
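One lightweight way to do both (a sketch assuming a string-keyed registry; the real checkpointer dispatches on a ModelType enum), reusing the stand-in qwen2_hf_to_tune from the sketch above:

```python
# Hypothetical registry: both Qwen identifiers resolve to the same converter,
# and an unknown model type fails loudly instead of silently mis-converting.
_CONVERTERS = {
    "QWEN2": qwen2_hf_to_tune,
    "QWEN2_5": qwen2_hf_to_tune,  # alias: identical conversion logic
}

def get_converter(model_type: str):
    try:
        return _CONVERTERS[model_type]
    except KeyError:
        raise ValueError(
            f"Unknown model type {model_type!r}; expected one of {sorted(_CONVERTERS)}"
        )
```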
No actual bug here - closing.