feat: support indivisible shards for TP model loading and TPlizing.
What does this PR do?
Fixes https://github.com/huggingface/transformers/issues/37051
The approach is to support uneven sharding by seeking segments of data that mimic `torch.chunk`, since `torch.chunk` is the sharding style adopted by the torch `Shard` placement API for both even and uneven sharding. Finally, we pass the stride and shape to `from_local` to allow for uneven sharding.
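The `torch.chunk` split rule that the uneven sharding follows can be sketched in plain Python (a minimal illustration of the chunking semantics, not the PR's actual implementation; `chunk_bounds` is a hypothetical helper name):

```python
import math

def chunk_bounds(dim_size: int, num_shards: int):
    """Mimic torch.chunk along one dimension: every shard gets
    ceil(dim_size / num_shards) elements except the last, which takes
    whatever remains (and may be smaller, or even empty)."""
    step = math.ceil(dim_size / num_shards)
    bounds = []
    for rank in range(num_shards):
        start = min(rank * step, dim_size)
        end = min(start + step, dim_size)
        bounds.append((start, end))
    return bounds

# A dimension of 10 split across 3 ranks yields shards of 4, 4, and 2 rows,
# which is why the segments to load from the checkpoint are uneven.
print(chunk_bounds(10, 3))  # [(0, 4), (4, 8), (8, 10)]
```

Because the last shard can be smaller, the loader cannot assume equal-sized segments, which is why the stride and shape must be passed explicitly when reconstructing the DTensor.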
```python
from transformers import AutoModelForCausalLM
import torch

m2 = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0", tp_plan=None)
m = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0", tp_plan="auto")

print(m.model.layers[0].self_attn.q_proj.weight)
print(m.model.layers[0].self_attn.q_proj.weight.shape)

# Gather the sharded weight; it should match the unsharded baseline.
ft = m.model.layers[0].self_attn.q_proj.weight.full_tensor().to("cpu")
assert torch.equal(ft, m2.model.layers[0].self_attn.q_proj.weight.to("cpu"))
# assert should pass
```
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. https://github.com/huggingface/transformers/issues/37051
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@ArthurZucker @SunMarc @muellerzr