
fix convert_weights not working for Qwen2.5 HF checkpoints

Open zhangtemplar opened this issue 1 year ago • 4 comments

Summary: In Qwen2.5, the attention's linear projection layers have bias=True, but torchtune.convert_weights does not yet support bias=True. This diff adds support for that.

Differential Revision: D67880222
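For illustration, a bias-aware key remap might look like the sketch below. The key templates and function name are assumptions for this example, not torchtune's actual mapping tables; the point is that one template per projection can cover both the `.weight` and the `.bias` entry that bias=True adds:

```python
import re

# Sketch of an HF -> torchtune key remapper that carries ".bias" entries
# through alongside ".weight" -- the entries that Qwen2.5's bias=True
# attention projections add. Key templates are illustrative assumptions.
_HF_TO_TUNE = {
    "model.layers.{}.self_attn.q_proj": "layers.{}.attn.q_proj",
    "model.layers.{}.self_attn.k_proj": "layers.{}.attn.k_proj",
    "model.layers.{}.self_attn.v_proj": "layers.{}.attn.v_proj",
}

def convert_keys(hf_state_dict):
    """Remap checkpoint keys; one template covers both .weight and .bias."""
    converted = {}
    for key, value in hf_state_dict.items():
        base, _, suffix = key.rpartition(".")        # split off weight/bias
        template = re.sub(r"\.\d+\.", ".{}.", base)  # abstract the layer index
        layer = re.search(r"\.(\d+)\.", base)
        if template in _HF_TO_TUNE and layer:
            new_base = _HF_TO_TUNE[template].format(layer.group(1))
            converted[f"{new_base}.{suffix}"] = value
        else:
            converted[key] = value                   # pass unknown keys through
    return converted
```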

zhangtemplar avatar Jan 06 '25 23:01 zhangtemplar

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2233

Note: Links to docs will display an error until the docs builds have been completed.

:x: 1 New Failure

As of commit 08811335566ab0bd50674753e249efe8f361a665 with merge base 213f38605ff0b7b1e20f85a9e032710be04c82c9:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot[bot] avatar Jan 06 '25 23:01 pytorch-bot[bot]

This pull request was exported from Phabricator. Differential Revision: D67880222

facebook-github-bot avatar Jan 06 '25 23:01 facebook-github-bot

@calvinpelletier can you review?

ebsmothers avatar Jan 07 '25 01:01 ebsmothers

Hi @zhangtemplar, you're changing the generic convert_weights function. Qwen2.5 already has a specific convert-weights function here, which handles the biases of the linear projections.

In our Qwen2.5 configs, we specify the model type as "QWEN2" here, which causes the checkpointer to call the Qwen-specific convert-weights function here.

calvinpelletier avatar Jan 08 '25 03:01 calvinpelletier

> Hi @zhangtemplar, you're changing the generic convert_weights function. Qwen2.5 already has a specific convert-weights function here, which handles the biases of the linear projections.
>
> In our Qwen2.5 configs, we specify the model type as "QWEN2" here, which causes the checkpointer to call the Qwen-specific convert-weights function here.

This raises a good point, though: we don't tell the user if their model type is wrong. We should probably allow either QWEN2_5 or QWEN2 to point to the same conversion function.
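One way to do that, sketched below with assumed names (the real dispatch lives in torchtune's checkpointer, and these identifiers are illustrative, not its actual API), is a model-type table where both strings alias the same converter and unknown types fail loudly:

```python
# Illustrative sketch only: the dict, function names, and error message are
# assumptions, not torchtune's actual checkpointer implementation.
def qwen2_hf_to_tune(state_dict):
    # Placeholder for the bias-aware Qwen2 conversion.
    return dict(state_dict)

_MODEL_TYPE_TO_CONVERTER = {
    "QWEN2": qwen2_hf_to_tune,
    "QWEN2_5": qwen2_hf_to_tune,  # alias: both types use the same converter
}

def get_converter(model_type):
    """Look up the conversion function, failing loudly on unknown types."""
    try:
        return _MODEL_TYPE_TO_CONVERTER[model_type]
    except KeyError:
        supported = ", ".join(sorted(_MODEL_TYPE_TO_CONVERTER))
        raise ValueError(
            f"Unknown model type {model_type!r}; expected one of: {supported}"
        ) from None
```

Aliasing in the table (rather than normalizing the string at each call site) keeps a single source of truth for which model types are supported, and the explicit error tells the user their config's model type is wrong instead of silently falling through to the generic converter.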

joecummings avatar Jan 09 '25 21:01 joecummings

No actual bug here - closing.

joecummings avatar Jan 14 '25 18:01 joecummings