spectrum comments

Results: 6 comments of spectrum

RuntimeError: invalid multinomial distribution (sum of probabilities <= 0)

I have the same issue when I run 'Pretrain.py'.

RuntimeError: invalid multinomial distribution (sum of probabilities <= 0)

> Hi, you can try to add a small positive number to the weights as done here:
>
> https://github.com/salesforce/ALBEF/blob/fb384204472feab2a85bd4f5790d7889c31672c9/models/model_retrieval.py#L120
>
> Batch_size=1 will not work because there needs to...
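For anyone landing here from the error message, the suggestion above can be sketched roughly as follows. This is a minimal illustration, not the ALBEF code: the helper name `sample_negative` and the default `eps=1e-4` are assumptions made for the example.

```python
import torch


def sample_negative(weights: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Sample one index per row of a (rows, candidates) weight matrix.

    `sample_negative` and `eps` are hypothetical names for this sketch.
    """
    # torch.multinomial raises "invalid multinomial distribution
    # (sum of probabilities <= 0)" when a row sums to zero, so a small
    # positive constant is added to keep every row sum strictly positive.
    return torch.multinomial(weights + eps, num_samples=1)


# The first row is all zeros (for example after masking) and would
# trigger the error without the added constant.
weights = torch.tensor([[0.0, 0.0, 0.0],
                        [0.2, 0.0, 0.8]])
indices = sample_negative(weights)
print(indices.shape)  # torch.Size([2, 1])
```

One trade-off to keep in mind: after adding the constant, entries whose weight was exactly zero can still be drawn with a small probability.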

RuntimeError: Step 1 exited with non-zero status 1

Solved. Check the log for more error info.

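How to improve the results

Generally you need to fine-tune the GLM6B model yourselves. For example, if you want to work on legal tasks, prepare legal-domain data and fine-tune on that. The GLM team currently uses P-tuning, which does not take much GPU memory and is fairly hardware-friendly.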

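How to improve the results

> If you are after high quality, the questions really have to be written by yourself one by one. The answers can take a shortcut and be generated with GPT-4. There is also the shortcut of having GPT-4 generate different questions for a document, a bit like letting the large model ask and answer its own questions. These QA pairs are then used to fine-tune your own "small" model.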
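As a rough illustration of the last step, collected QA pairs could be written out as JSON Lines before fine-tuning. This is only a sketch: the function name `qa_pairs_to_jsonl` and the field names `prompt` and `response` are assumptions, and the format your fine-tuning script expects may differ.

```python
import json


def qa_pairs_to_jsonl(pairs):
    """Turn (question, answer) tuples into a JSON Lines string.

    The keys "prompt" and "response" are hypothetical; rename them to
    whatever your fine-tuning script reads.
    """
    lines = []
    for question, answer in pairs:
        record = {"prompt": question.strip(), "response": answer.strip()}
        # ensure_ascii=False keeps Chinese text readable in the output.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


pairs = [
    ("What does the document say about X?", "An answer drafted by a larger model."),
]
print(qa_pairs_to_jsonl(pairs))
```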

Fine-tuned Qwen-14B-Chat model + fastchat 0.2.29: inference on 2x4090 is much slower than other 13B models

Is 7 tokens/s a normal inference speed on a single 4090? On the same card, qwen-7b-chat reaches 60+ tokens/s. This gap is much larger than the difference between 7B-chat and 14B-chat-int4 listed [here](https://github.com/QwenLM/Qwen/blob/main/README_CN.md#%E6%8E%A8%E7%90%86%E6%80%A7%E8%83%BD). Also, installing flash-attention made no noticeable difference to the inference speed.

- Environment: python3.8+torch2.0.0+cuda11.8+transformers4.36+flash-attn2.4.0
- Hardware: 10700K+28G+4090/24G
- Code:

```
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
from peft import AutoPeftModelForCausalLM
from tqdm...
```