
fix convert_weights not working for Qwen2.5 HF checkpoints

Open zhangtemplar opened this issue 1 year ago • 4 comments

Summary: In Qwen2.5, the attention's linear projection layers have bias=True, but torchtune.convert_weights does not yet support bias=True. This diff adds support for that.

Differential Revision: D67880222
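For illustration, a bias-aware key remap might look like the sketch below. The key templates and function name are assumptions for this example, not torchtune's actual mapping tables; the point is that one template per projection can cover both the `.weight` and the `.bias` entry that bias=True adds:

```python
import re

# Sketch of an HF -> torchtune key remapper that carries ".bias" entries
# through alongside ".weight" -- the entries that Qwen2.5's bias=True
# attention projections add. Key templates are illustrative assumptions.
_HF_TO_TUNE = {
    "model.layers.{}.self_attn.q_proj": "layers.{}.attn.q_proj",
    "model.layers.{}.self_attn.k_proj": "layers.{}.attn.k_proj",
    "model.layers.{}.self_attn.v_proj": "layers.{}.attn.v_proj",
}

def convert_keys(hf_state_dict):
    """Remap checkpoint keys; one template covers both .weight and .bias."""
    converted = {}
    for key, value in hf_state_dict.items():
        base, _, suffix = key.rpartition(".")        # split off weight/bias
        template = re.sub(r"\.\d+\.", ".{}.", base)  # abstract the layer index
        layer = re.search(r"\.(\d+)\.", base)
        if template in _HF_TO_TUNE and layer:
            new_base = _HF_TO_TUNE[template].format(layer.group(1))
            converted[f"{new_base}.{suffix}"] = value
        else:
            converted[key] = value                   # pass unknown keys through
    return converted
```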

zhangtemplar avatar Jan 06 '25 23:01 zhangtemplar

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2233

Note: Links to docs will display an error until the docs builds have been completed.

:x: 1 New Failure

As of commit 08811335566ab0bd50674753e249efe8f361a665 with merge base 213f38605ff0b7b1e20f85a9e032710be04c82c9:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot[bot] avatar Jan 06 '25 23:01 pytorch-bot[bot]

This pull request was exported from Phabricator. Differential Revision: D67880222

facebook-github-bot avatar Jan 06 '25 23:01 facebook-github-bot

@calvinpelletier can you review?

ebsmothers avatar Jan 07 '25 01:01 ebsmothers

Hi @zhangtemplar, you're changing the generic convert_weights function. Qwen2.5 already has a specific convert-weights function here, which handles the biases of the linear projections.

In our Qwen2.5 configs, we specify the model type as "QWEN2" here, which causes the checkpointer to call the Qwen-specific convert-weights function here.

calvinpelletier avatar Jan 08 '25 03:01 calvinpelletier

> Hi @zhangtemplar, you're changing the generic convert_weights function. Qwen2.5 already has a specific convert-weights function here, which handles the biases of the linear projections.
>
> In our Qwen2.5 configs, we specify the model type as "QWEN2" here, which causes the checkpointer to call the Qwen-specific convert-weights function here.

This raises a good point, though: we don't tell the user if their model type is wrong. We should probably allow either QWEN2_5 or QWEN2 to point to the same conversion function.
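One way to do that, sketched below with assumed names (the real dispatch lives in torchtune's checkpointer, and these identifiers are illustrative, not its actual API), is a model-type table where both strings alias the same converter and unknown types fail loudly:

```python
# Illustrative sketch only: the dict, function names, and error message are
# assumptions, not torchtune's actual checkpointer implementation.
def qwen2_hf_to_tune(state_dict):
    # Placeholder for the bias-aware Qwen2 conversion.
    return dict(state_dict)

_MODEL_TYPE_TO_CONVERTER = {
    "QWEN2": qwen2_hf_to_tune,
    "QWEN2_5": qwen2_hf_to_tune,  # alias: both types use the same converter
}

def get_converter(model_type):
    """Look up the conversion function, failing loudly on unknown types."""
    try:
        return _MODEL_TYPE_TO_CONVERTER[model_type]
    except KeyError:
        supported = ", ".join(sorted(_MODEL_TYPE_TO_CONVERTER))
        raise ValueError(
            f"Unknown model type {model_type!r}; expected one of: {supported}"
        ) from None
```

Aliasing in the table (rather than normalizing the string at each call site) keeps a single source of truth for which model types are supported, and the explicit error tells the user their config's model type is wrong instead of silently falling through to the generic converter.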

joecummings avatar Jan 09 '25 21:01 joecummings

No actual bug here - closing.

joecummings avatar Jan 14 '25 18:01 joecummings