pkumc

Results: 5 issues by pkumc

![image](https://user-images.githubusercontent.com/9345057/56113021-383d1500-5f8f-11e9-85eb-18eea76741da.png)

Change the goal to MAXIMIZE in advisor_client/examples/scikitlearn_mnist/config.json.
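For reference, the edit in question would set the study's optimization direction in the JSON study configuration. A minimal sketch only; field names other than `goal` are illustrative and may not match the actual example file:

```json
{
  "name": "scikitlearn_mnist",
  "goal": "MAXIMIZE",
  "maxTrials": 10
}
```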

To accelerate evaluation, I want to generate with multiple prompts rather than only one prompt, but I got the following CUDA error. Can someone help with this? **error**: 131 ../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block:...
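An `indexSelectSmallIndex` assert during batched generation is often an out-of-bounds token id, commonly because the tokenizer has no pad token and batching pads with an id outside the embedding table. A library-free sketch of the usual remedy for decoder-only models (left-pad every prompt to the batch max length with a valid pad id); the token ids and pad id below are made up for illustration:

```python
def left_pad_batch(prompts, pad_id):
    """Left-pad variable-length token-id lists to a common length.

    For decoder-only models, left padding keeps the last real token of
    every prompt aligned at the final position, which is the position
    generation reads from. Returns (padded_ids, attention_mask).
    """
    max_len = max(len(p) for p in prompts)
    padded, mask = [], []
    for p in prompts:
        n_pad = max_len - len(p)
        padded.append([pad_id] * n_pad + list(p))
        mask.append([0] * n_pad + [1] * len(p))
    return padded, mask


# Hypothetical token ids for three prompts of different lengths.
batch = [[101, 7592, 102], [101, 102], [101, 2023, 2003, 102]]
ids, mask = left_pad_batch(batch, pad_id=0)
# Every row now has the same length, so embedding lookups stay in bounds.
assert all(len(row) == 4 for row in ids)
```

With every row the same length and every id guaranteed valid, the embedding `index_select` on the GPU can no longer read past the table.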

@JustinLin610 The [blog post](https://qwenlm.github.io/zh/blog/qwen-moe/) mentions: "We first leverage the existing Qwen-1.8B and transform it into Qwen1.5-MoE-A2.7B. Moreover, introducing randomness during initialization can significantly speed up convergence and yield better overall performance throughout pretraining." I have two questions: 1. Is the procedure to first split Qwen1.5-1.8B's intermediate_size of 5504 into 4 small experts of 1376 dimensions each, then add randomness by appending 32 randomly initialized dimensions to reach 1408? And are the remaining non-MoE parameters inherited directly from Qwen1.5-1.8B? 2. After initialization, the blog also says: "Thanks to our initialization method, we do not need to train on the same number of tokens to reach good model quality, which also significantly reduces training cost." Roughly how many tokens were used for this continued training?
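The dimension bookkeeping in question 1 can be sketched as follows. This is only my reading of the arithmetic in the question, not a confirmed description of Qwen's actual upcycling code; the function name, init scale, and toy hidden size are all illustrative:

```python
import random

# Figures assumed from the question: Qwen1.5-1.8B FFN intermediate size,
# split into 4 experts, each widened by 32 randomly initialized dims.
INTERMEDIATE = 5504
N_EXPERTS = 4
EXTRA_RANDOM = 32


def upcycle_ffn(up_proj_rows):
    """Split a dense FFN's rows across experts and append random rows.

    up_proj_rows: list of INTERMEDIATE row vectors of the dense
    up-projection. Returns one widened row list per expert.
    """
    base = INTERMEDIATE // N_EXPERTS  # 5504 / 4 = 1376 rows per expert
    hidden = len(up_proj_rows[0])
    experts = []
    for e in range(N_EXPERTS):
        rows = list(up_proj_rows[e * base:(e + 1) * base])
        # Randomness at init: append freshly initialized rows.
        rows += [[random.gauss(0.0, 0.02) for _ in range(hidden)]
                 for _ in range(EXTRA_RANDOM)]
        experts.append(rows)
    return experts


dense = [[0.0] * 8 for _ in range(INTERMEDIATE)]  # toy hidden size of 8
experts = upcycle_ffn(dense)
assert len(experts) == 4
assert all(len(e) == 1376 + 32 for e in experts)  # 1408 dims per expert
```

Under this reading, all non-MoE weights would be copied unchanged, and only the FFN rows are partitioned and then widened with random rows.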

@YixinSong-e @ZeyuMi Excellent work! By the way, have you compared the inference speedup against non-ReLU LLMs, such as the original Mistral-7B/LLaMA-7B? If non-ReLU LLMs are also sparse to a degree (Figure...