Better support of tp checkpoint loading

Open yinghai opened this issue 10 months ago • 0 comments

Motivation

We support use_presharded_weights in various places, but there was no way to tell each TP rank to load a different TP checkpoint file.

Modifications

This PR adds a flag, --tp-checkpoint-name-pattern, that lets each TP rank worker load a different TP checkpoint file based on a filename pattern. It also plumbs the flag through a LLaMA example:

USE_PRESHARDED_WEIGHTS=1 uv run python third_party/sglang/examples/runtime/engine/offline_batch_inference.py --model-path /tmp/hf/ --tp-size 2 --tp-checkpoint-name-pattern "rank-"

The files in /tmp/hf/ look like this:

-rw-rw---- 1 yinghai default  885 Feb  6 18:58 config.json
-rw-rw---- 1 yinghai default 2.8G Feb  6 18:58 model-rank-0-part-0.safetensors
-rw-rw---- 1 yinghai default 2.8G Feb  6 18:58 model-rank-1-part-0.safetensors
-rw-rw---- 1 yinghai default  301 Feb  6 18:58 special_tokens_map.json
-rw-rw---- 1 yinghai default  50K Feb  6 18:58 tokenizer_config.json
-rw-rw---- 1 yinghai default  17M Feb  6 18:58 tokenizer.json
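
To illustrate how a per-rank filename pattern could pick out the right shard from a listing like the one above, here is a minimal sketch. It is not the code in this PR: the helper name select_rank_files is hypothetical, and it assumes the pattern (here "rank-") is immediately followed by the rank index in the filename.

```python
import re
from typing import List


def select_rank_files(filenames: List[str], pattern: str, tp_rank: int) -> List[str]:
    """Hypothetical helper: return the safetensors files that belong to one TP rank.

    Assumption: a file belongs to a rank when its name contains
    "<pattern><rank>" (for example "rank-1") not followed by another digit.
    """
    # The negative lookahead keeps rank 1 from also matching "rank-10".
    rank_re = re.compile(re.escape(pattern) + str(tp_rank) + r"(?!\d)")
    return sorted(
        name
        for name in filenames
        if name.endswith(".safetensors") and rank_re.search(name)
    )


# Filenames taken from the directory listing above.
files = [
    "config.json",
    "model-rank-0-part-0.safetensors",
    "model-rank-1-part-0.safetensors",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
]

for rank in range(2):
    print(rank, select_rank_files(files, "rank-", rank))
```

With --tp-size 2, each rank in this sketch ends up with exactly one shard file, and the tokenizer and config files are ignored for weight loading.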

Checklist

[x] Format your code according to the Code Formatting with Pre-Commit.
[x] Add unit tests as outlined in the Running Unit Tests.
[x] Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
[x] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
[x] For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.

Feb 07 '25 08:02 yinghai