Tong Zhu (朱桐) issues

Results: 19 issues of Tong Zhu (朱桐)

Read Me First! Read this before opening an issue about an error!

For toolkit usage errors, you must strictly follow the `Toolkit usage` issue template when opening a new issue. Otherwise, your issue may be closed directly...

Default random seed

From [u/biadelatrixyaska @ reddit](https://www.reddit.com/r/MachineLearning/comments/rkewa3/d_what_are_your_machine_learning_superstitions/?utm_source=share&utm_medium=web2x&context=3), 42 is a good choice for a default setting. Maybe there are more *best default random seeds*, and we should add these seeds as a default...

enhancement
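
To make the request above concrete, here is a minimal sketch of what a default seed could look like in practice. The names `DEFAULT_SEED`, `set_seed`, and `sample` are hypothetical and not taken from the toolkit. The sketch seeds only Python's `random` module, plus NumPy if it happens to be installed.

```python
import random

# Hypothetical default, following the suggestion in the issue.
DEFAULT_SEED = 42


def set_seed(seed: int = DEFAULT_SEED) -> int:
    """Seed the available random number generators and return the seed used."""
    random.seed(seed)
    try:
        # NumPy is optional here; seed it only when it can be imported.
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    return seed


def sample(n: int = 3, seed: int = DEFAULT_SEED) -> list:
    """Draw n pseudo-random floats after re-seeding, to show reproducibility."""
    set_seed(seed)
    return [random.random() for _ in range(n)]


if __name__ == "__main__":
    # Two runs with the same seed produce identical draws.
    print(sample() == sample())
```

A real training toolkit would likely also need to seed its deep learning framework and any data loader workers, which this sketch leaves out.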

mixtral branch: dimension mismatch in `cheap_embed`

Here, the tensors in `cheap_embed` are 4-dimensional: https://github.com/cg123/mergekit/blob/d55f654c2e70d3ac4ad6532de96e266aff2de931/mergekit/scripts/mixtral_moe.py#L87 However, `gate_vec` receives a 3-dimensional tensor: https://github.com/cg123/mergekit/blob/d55f654c2e70d3ac4ad6532de96e266aff2de931/mergekit/scripts/mixtral_moe.py#L158-L161

loss weight mismatch

fix an arxiv link

Loss does not go down

[Feature Request]: move global accelerator into local ones of instantiated tasks

[Feature Request]: decompose history states and models into separate state files

[Feature Request]: save k best ckpts

[Feature Request]: Migrate from `OmegaConf/DictConfig` objects to `DefaultBaseConfig` object