symhsym issues

Results: 2 issues by symhsym

Question about large-scale data training and packing algorithm for MOSS-style inputs

Thank you for your great work on MOSS—it’s been very inspiring! I believe the model couldn't have been trained on individual samples sequentially due to efficiency concerns. Given MOSS's unique...

[DDP] Parallel training problem / Multi-head shared-backbone model triggers “Expected to mark a variable ready only once”: how to parallelize training?

In multi-GPU DDP training, the model has a shared backbone (LLM) and multiple output heads (8 channels, each computing a different loss). In a single forward pass, all heads use...