pkumc

Results: 5 issues by pkumc

![image](https://user-images.githubusercontent.com/9345057/56113021-383d1500-5f8f-11e9-85eb-18eea76741da.png)

Change the goal to MAXIMIZE in advisor_client/examples/scikitlearn_mnist/config.json.
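For reference, the edit in question would set the study's optimization direction in the JSON study configuration. A minimal sketch only; field names other than `goal` are illustrative and may not match the actual example file:

```json
{
  "name": "scikitlearn_mnist",
  "goal": "MAXIMIZE",
  "maxTrials": 10
}
```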

To accelerate evaluation, I want to generate with multiple prompts rather than only one prompt, but I got the following CUDA error. Can someone help with this? **error**: 131 ../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block:...
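An `indexSelectSmallIndex` assert during batched generation is often an out-of-bounds token id, commonly because the tokenizer has no pad token and batching pads with an id outside the embedding table. A library-free sketch of the usual remedy for decoder-only models (left-pad every prompt to the batch max length with a valid pad id); the token ids and pad id below are made up for illustration:

```python
def left_pad_batch(prompts, pad_id):
    """Left-pad variable-length token-id lists to a common length.

    For decoder-only models, left padding keeps the last real token of
    every prompt aligned at the final position, which is the position
    generation reads from. Returns (padded_ids, attention_mask).
    """
    max_len = max(len(p) for p in prompts)
    padded, mask = [], []
    for p in prompts:
        n_pad = max_len - len(p)
        padded.append([pad_id] * n_pad + list(p))
        mask.append([0] * n_pad + [1] * len(p))
    return padded, mask


# Hypothetical token ids for three prompts of different lengths.
batch = [[101, 7592, 102], [101, 102], [101, 2023, 2003, 102]]
ids, mask = left_pad_batch(batch, pad_id=0)
# Every row now has the same length, so embedding lookups stay in bounds.
assert all(len(row) == 4 for row in ids)
```

With every row the same length and every id guaranteed valid, the embedding `index_select` on the GPU can no longer read past the table.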

@JustinLin610 The [blog post](https://qwenlm.github.io/zh/blog/qwen-moe/) mentions: "We first leverage the existing Qwen-1.8B and transform it into Qwen1.5-MoE-A2.7B. Moreover, introducing randomness during initialization can significantly speed up convergence and yield better overall performance throughout pretraining." I have two questions: 1. Is the procedure to first split Qwen1.5-1.8B's intermediate_size of 5504 into 4 small experts of 1376 dimensions each, then add randomness by appending 32 randomly initialized dimensions to reach 1408? And are the remaining non-MoE parameters inherited directly from Qwen1.5-1.8B? 2. After initialization, the blog also says: "Thanks to our initialization method, we do not need to train on the same number of tokens to reach good model quality, which also significantly reduces training cost." Roughly how many tokens were used for this continued training?
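The dimension bookkeeping in question 1 can be sketched as follows. This is only my reading of the arithmetic in the question, not a confirmed description of Qwen's actual upcycling code; the function name, init scale, and toy hidden size are all illustrative:

```python
import random

# Figures assumed from the question: Qwen1.5-1.8B FFN intermediate size,
# split into 4 experts, each widened by 32 randomly initialized dims.
INTERMEDIATE = 5504
N_EXPERTS = 4
EXTRA_RANDOM = 32


def upcycle_ffn(up_proj_rows):
    """Split a dense FFN's rows across experts and append random rows.

    up_proj_rows: list of INTERMEDIATE row vectors of the dense
    up-projection. Returns one widened row list per expert.
    """
    base = INTERMEDIATE // N_EXPERTS  # 5504 / 4 = 1376 rows per expert
    hidden = len(up_proj_rows[0])
    experts = []
    for e in range(N_EXPERTS):
        rows = list(up_proj_rows[e * base:(e + 1) * base])
        # Randomness at init: append freshly initialized rows.
        rows += [[random.gauss(0.0, 0.02) for _ in range(hidden)]
                 for _ in range(EXTRA_RANDOM)]
        experts.append(rows)
    return experts


dense = [[0.0] * 8 for _ in range(INTERMEDIATE)]  # toy hidden size of 8
experts = upcycle_ffn(dense)
assert len(experts) == 4
assert all(len(e) == 1376 + 32 for e in experts)  # 1408 dims per expert
```

Under this reading, all non-MoE weights would be copied unchanged, and only the FFN rows are partitioned and then widened with random rows.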

@YixinSong-e @ZeyuMi Excellent work! By the way, have you compared the inference speedup against non-ReLU LLMs, such as the original Mistral-7B/LLaMA-7B? If non-ReLU LLMs are also sparse to a degree (Figure...