sglang: Tiny refactor of DeepSeek V3/R1 NextN shared experts fusion

Motivation

Related PRs:
https://github.com/sgl-project/sglang/pull/4918
https://github.com/sgl-project/sglang/pull/5707
https://github.com/sgl-project/sglang/pull/5793

Modifications

Extract compute_shared_experts_fusion_weights as a public method and place it in deepseek_v2.py as a first step.
Add the necessary unit tests.
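Roughly speaking, shared experts fusion treats DeepSeek's always-active shared expert as one or more extra routed experts, so a single fused MoE kernel can handle both. The sketch below covers only the weight-renaming step a loader might perform for this. It is not the code of compute_shared_experts_fusion_weights. The helper name clone_shared_expert_weights, its parameters, and the choice to number replicas starting at n_routed_experts are assumptions for illustration. Plain strings stand in for tensors, and the names follow the `mlp.shared_experts.*` / `mlp.experts.{id}.*` layout of DeepSeek V3 checkpoints.

```python
from typing import Dict, List, Set, Tuple

# Stand-in for a tensor type; a real loader would pass torch.Tensor objects.
Weight = object


def clone_shared_expert_weights(
    weights: List[Tuple[str, Weight]],
    layer_prefix: str,
    n_routed_experts: int,
    n_share_experts_fusion: int,
    suffixes: List[str],
) -> List[Tuple[str, Weight]]:
    """Hypothetical helper: append n_share_experts_fusion copies of the shared
    expert as extra routed experts, then drop the original shared-expert entries."""
    if n_share_experts_fusion <= 0:
        return list(weights)
    weights_dict: Dict[str, Weight] = dict(weights)
    out = list(weights)
    names_to_remove: Set[str] = set()
    for suffix in suffixes:
        shared_name = f"{layer_prefix}.mlp.shared_experts.{suffix}"
        if shared_name not in weights_dict:
            continue  # e.g. a scale tensor that this quantization scheme lacks
        for i in range(n_share_experts_fusion):
            # Assumption: replicas take the expert ids right after the routed ones.
            new_name = f"{layer_prefix}.mlp.experts.{n_routed_experts + i}.{suffix}"
            out.append((new_name, weights_dict[shared_name]))
        names_to_remove.add(shared_name)
    return [(name, w) for name, w in out if name not in names_to_remove]


# Usage: one routed expert plus a shared expert, fused as two replicas.
weights = [
    ("model.layers.0.mlp.shared_experts.gate_proj.weight", "W_gate"),
    ("model.layers.0.mlp.experts.0.gate_proj.weight", "E0_gate"),
]
fused = clone_shared_expert_weights(
    weights, "model.layers.0", n_routed_experts=256,
    n_share_experts_fusion=2, suffixes=["gate_proj.weight"],
)
```

Keeping the renaming as a pure function over (name, weight) pairs is what makes it easy to share between the main model and the NextN draft model loaders and to unit-test without loading real checkpoints.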

Accuracy on A800

python3 benchmark/gsm8k/bench_sglang.py --num-questions 200 --parallel 128 --num-shots 8 

Accuracy: 0.960
Invalid: 0.000
Latency: 14.804 s
Output throughput: 1451.247 token/s

Benchmark on A800

# max concurrency 16 (request rate: inf)
python3 -m sglang.bench_serving --backend sglang --num-prompts 200 --dataset-name random --max-concurrency 16 --random-input 256 --random-output 256 --seed 42

============ Serving Benchmark Result ============
Backend:                                 sglang
Traffic request rate:                    inf
Max request concurrency:                 16
Successful requests:                     200
Benchmark duration (s):                  57.65
Total input tokens:                      26096
Total generated tokens:                  26874
Total generated tokens (retokenized):    26763
Request throughput (req/s):              3.47
Input token throughput (tok/s):          452.70
Output token throughput (tok/s):         466.20
Total token throughput (tok/s):          918.90
Concurrency:                             15.77
Accept length:                           2.60
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   4546.43
Median E2E Latency (ms):                 4602.09
---------------Time to First Token----------------
Mean TTFT (ms):                          207.83
Median TTFT (ms):                        174.89
P99 TTFT (ms):                           476.63
---------------Inter-Token Latency----------------
Mean ITL (ms):                           32.54
Median ITL (ms):                         19.18
P95 ITL (ms):                            90.16
P99 ITL (ms):                            168.08
Max ITL (ms):                            389.73
==================================================
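As a sanity check on the results above, the throughput figures follow from the token and request totals divided by the benchmark duration, and the reported concurrency agrees with Little's law (mean requests in flight ≈ request throughput × mean end-to-end latency). The snippet recomputes them from the printed values. Small differences from the table are expected because the printed duration is rounded to two decimals.

```python
# Figures copied from the serving benchmark output above.
duration_s = 57.65
successful_requests = 200
total_input_tokens = 26096
total_generated_tokens = 26874
mean_e2e_latency_s = 4546.43 / 1000  # reported in ms

request_throughput = successful_requests / duration_s
input_tok_throughput = total_input_tokens / duration_s
output_tok_throughput = total_generated_tokens / duration_s
total_tok_throughput = input_tok_throughput + output_tok_throughput

# Little's law: average requests in flight = completion rate x mean time in system.
concurrency = request_throughput * mean_e2e_latency_s

print(
    f"req/s={request_throughput:.2f} in={input_tok_throughput:.1f} "
    f"out={output_tok_throughput:.1f} total={total_tok_throughput:.1f} "
    f"concurrency={concurrency:.2f}"
)
```

A measured concurrency of 15.77 against a cap of 16 indicates the client kept the server close to fully loaded for the whole run.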

Checklist

[x] Format your code according to the Code Formatting with Pre-Commit.
[ ] Add unit tests as outlined in the Running Unit Tests.
[ ] Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
[x] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
[ ] For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
[ ] Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.