mynewstart comments

Results: 27 comments of mynewstart

[feature request] unable to override `dist.init_process_group` timeout in under `zero.Init`

Same error here. How can it be solved?

[BUG] Inference fail with "mat1 and mat2 shapes cannot be multiplied" for Llama model.

Same problem. How can I split-load the checkpoint using the meta device when using `deepspeed.init_inference`?

Why is inference slower after quantizing ChatGLM?

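> When quantizing ChatGLM, we ran into the activation outliers problem.
>
> So the chatglm-int8 approach quantizes only the model parameters and still uses fp16 precision for activation values (that is, the intermediate computations).
>
> This does save GPU memory, but inference speed drops.

If the intermediate results use fp16 precision, shouldn't the inference speed be about the same as the original fp16 model?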

[Question] Pretraining time and pretraining data

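A question: this code loads all the data into memory at once. If the dataset is very large, say 1.4T tokens (roughly 5 TB of data), won't it fail to fit in memory?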
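For readers with the same concern, one common way around loading everything at once is to read pre-tokenized data lazily in fixed-size batches. The sketch below is only an illustration under stated assumptions: it is not the repository's loader, the flat binary file of unsigned 16-bit token ids is an assumed format, and `write_tokens` and `iter_token_batches` are hypothetical names.

```python
import array


def write_tokens(path, token_ids):
    """Store token ids as a flat binary file of unsigned 16-bit integers (assumed format)."""
    with open(path, "wb") as f:
        array.array("H", token_ids).tofile(f)


def iter_token_batches(path, batch_tokens, itemsize=2):
    """Yield lists of token ids, holding only one batch in memory at a time."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(batch_tokens * itemsize)
            if not chunk:
                break
            # Ignore a trailing partial id if the file was truncated mid-value.
            usable = len(chunk) - len(chunk) % itemsize
            batch = array.array("H")
            batch.frombytes(chunk[:usable])
            yield batch.tolist()
```

Peak memory then depends on `batch_tokens` rather than on the total size of the corpus.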

Questions about the model vocabulary

Same question. Also, why not train directly with BBPE, that is, convert all 20 million texts to bytes and then run BPE?
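To make the question concrete, here is a minimal sketch of what "convert to bytes, then run BPE" could look like. It is a toy trainer written for illustration, not the tokenizer used by the project; `train_byte_bpe` and `merge_pair` are hypothetical names, and a production tokenizer would add pre-tokenization and far more efficient pair counting.

```python
from collections import Counter


def merge_pair(seq, a, b):
    """Replace every adjacent occurrence of (a, b) in seq with the merged symbol a + b."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def train_byte_bpe(texts, num_merges):
    """Learn BPE merges over UTF-8 bytes; returns (merges, segmented corpus)."""
    # Every text starts as a sequence of single-byte symbols.
    corpus = [[bytes([b]) for b in t.encode("utf-8")] for t in texts]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in corpus:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so further merges would not compress anything
        merges.append((a, b))
        corpus = [merge_pair(seq, a, b) for seq in corpus]
    return merges, corpus
```

Because the base alphabet is the 256 possible byte values, any input can be encoded without an unknown token, at the cost of longer initial sequences for multi-byte characters such as Chinese.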

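> > Being slow is expected; it is currently implemented with mixed precision. The main goal is to save GPU memory. If you are short on memory, try adjusting the swap area and see whether that helps.
> >
> > @jameswu2014 Thank you very much, now I understand. Are there plans to compute directly in int8 later, or to adopt another acceleration scheme such as FasterTransformer?

Is it slow because the model's intermediate computation is still stored in fp16 and only the model parameters were changed to int8? And if the intermediate results are stored in fp16, why can't the speed be about the same as the model before quantization? Where exactly is the time lost? The time is lost in the int8->fp16 dequantization. We will keep iterating, so please stay tuned. Thanks.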
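The int8->fp16 dequantization cost mentioned in this thread can be illustrated with a small weight-only quantization sketch. This is an assumption-laden toy in plain Python, not the actual ChatGLM or P40 kernel: it uses symmetric per-row scaling, Python floats stand in for fp16, and the function names are hypothetical.

```python
def quantize_row(weights):
    """Symmetric per-row int8 quantization: returns (int values in [-127, 127], scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate floating-point weights from the stored int8 values."""
    return [v * scale for v in q]


def linear_weight_only_int8(x, q_rows, scales):
    """Compute y = W x where W is stored as int8 and the activations x stay in floating point."""
    y = []
    for q, scale in zip(q_rows, scales):
        # Extra step that a plain fp16 layer does not have: every stored weight
        # is converted back to floating point before it can be multiplied.
        w = dequantize_row(q, scale)
        y.append(sum(wi * xi for wi, xi in zip(w, x)))
    return y
```

The multiply-accumulate work is the same as in the unquantized layer, so storing weights in int8 saves memory while adding a dequantization pass on top of the usual arithmetic, which is consistent with the slowdown described above.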

mynewstart

[feature request] unable to override `dist.init_process_group` timeout in under `zero.Init`

[BUG] Inference fail with "mat1 and mat2 shapes cannot be multiplied" for Llama model.

Why is inference slower after quantizing ChatGLM?

[Question] Pretraining time and pretraining data

Questions about the model vocabulary

int8 inference on P40 is too slow

int8 inference on P40 is too slow

int8 inference on P40 is too slow

Mixtral 8x7B full finetune with DS zero3: Assertion error

About preprocessing of parallel corpora