Results 130 comments of ldwang

@tridao If causal_conv1d_fn is not available, how does the plain [conv1d](https://github.com/state-spaces/mamba/blob/main/mamba_ssm/modules/mamba_simple.py#L64-L72) remain causal? Thanks. https://github.com/state-spaces/mamba/blob/main/mamba_ssm/modules/mamba_simple.py#L168
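For context on the question above: in the linked lines, the plain `nn.Conv1d` is constructed with `padding=d_conv - 1` and its output is truncated back to the sequence length, which makes it causal without a fused kernel. A minimal NumPy sketch of that left-pad-and-truncate idea (the function name and shapes here are illustrative, not Mamba's actual implementation):

```python
import numpy as np

def causal_conv1d(x, w):
    """Causal 1D convolution: y[t] depends only on x[0..t].

    x: input sequence of shape (seqlen,)
    w: kernel of shape (k,), w[0] multiplies the current timestep
    """
    k = len(w)
    # Left-pad with k-1 zeros so the window never looks into the future;
    # this mirrors padding=d_conv-1 followed by truncating to seqlen.
    xp = np.concatenate([np.zeros(k - 1), x])
    # y[t] = sum_i w[i] * x[t - i]; reverse the kernel for a plain dot product
    return np.array([np.dot(xp[t:t + k], w[::-1]) for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, 0.25])
y = causal_conv1d(x, w)  # each y[t] uses only x[<=t]
```

The fused `causal_conv1d_fn` from the `causal-conv1d` package computes the same thing more efficiently on GPU; the padded `nn.Conv1d` path is the fallback.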

https://github.com/state-spaces/mamba/issues/108

You can manually update the FlagAI sources: https://github.com/FlagAI-Open/FlagAI/commit/0fc0cc2fb70cbc151119929cb440922802b241d5 We plan to make a new release later.

I tested it, but int4 takes twice as long as FP16. Is anything wrong?

> Some zero data caused it! @980202006 Could you explain this root cause and what the zero data looks like? Thanks.

> Great work! Looking forward to a script to convert Megatron Mixtral checkpoints to HF format.

> I'm also interested in this capability. Has it been specifically strengthened? For enterprise applications it is critical. Which domestic models are strongest at it? Thanks.

The model files on the BAAI (智源) community hub are identical to those on Hugging Face; download from whichever source you prefer.

https://modelscope.cn/models/BAAI/AquilaChat2-7B