Mars

14 comments by Mars

Use this, and pip install scipy PyYAML matplotlib librosa==0.8.0 tqdm pandas numba==0.53.1 numpy==1.19.2 #scipy==1.3 #PyYAML==5.3.1 tensorboardX pyloudnorm setuptools>=41.0.0 g2p_en resemblyzer==0.1.1.dev0 webrtcvad tensorboard==2.6.0 scikit-learn==0.24.1 scikit-image==0.16.2 textgrid jiwer pycwt PyWavelets praat-parselmouth==0.3.3 jieba...
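The same list reads more clearly as a requirements file; this is a sketch assuming these are pip packages (the `==` pins are pip syntax, not apt), and the original comment is truncated after jieba, so the list is incomplete:

```
# requirements.txt sketch — reconstructed from the pins in the comment above
scipy            # scipy==1.3 appears commented out in the original
PyYAML           # PyYAML==5.3.1 appears commented out in the original
matplotlib
librosa==0.8.0
tqdm
pandas
numba==0.53.1
numpy==1.19.2
tensorboardX
pyloudnorm
setuptools>=41.0.0
g2p_en
resemblyzer==0.1.1.dev0
webrtcvad
tensorboard==2.6.0
scikit-learn==0.24.1
scikit-image==0.16.2
textgrid
jiwer
pycwt
PyWavelets
praat-parselmouth==0.3.3
jieba
# ... (truncated in the original comment)
```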

> Thanks for the reply. Does the nums_updates in the log mean steps? If so, it takes 2 hours for every 100 steps in the picture, which means it will...

> Thanks for the reply, I will try to use more GPUs. Another question: during pretraining, num_workers is 0; why not set it to a higher number...
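For context, num_workers is the standard PyTorch DataLoader knob being asked about here; below is a minimal sketch of the difference (the dataset and the numbers are hypothetical, not taken from the repo in question):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the repo's pretraining dataset.
dataset = TensorDataset(torch.randn(1024, 80), torch.randint(0, 10, (1024,)))

# num_workers=0 loads every batch in the main process, which can leave
# the GPU idle while data is prepared; num_workers>0 forks worker
# processes that prefetch batches in parallel.
loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,    # rough rule of thumb: a few workers per GPU
    pin_memory=True,  # faster host-to-GPU transfers
)

for features, labels in loader:
    pass  # training step goes here
```

One common reason a repo pins num_workers=0 is to sidestep multiprocessing problems with datasets that are not picklable or that cache heavily in memory, so raising it is worth testing rather than assuming it is safe.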

> It is related to the fine-tuning block_size parameter; 2048 is already long enough. If you need longer answers, you can pass in the multi-turn conversation history. But when decoding past 2048, the output turns to garbage, and if no end-of-sequence token is detected it keeps generating until memory simply blows up. Is there a configuration that stops automatically once the maximum length is reached?
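On stopping at a maximum length: in Hugging Face transformers, which Llama-2 fine-tuning setups like this one typically use, generation can be capped explicitly even when no EOS token ever appears. A hedged sketch (the model path is illustrative, not the fine-tuned checkpoint from this thread):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; substitute the actual fine-tuned model path.
model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("你好", return_tensors="pt")

# max_new_tokens bounds generation even if no EOS token is emitted,
# so decoding cannot run past the usable context and exhaust memory;
# eos_token_id still lets it stop early when an end marker appears.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,  # Llama has no pad token by default
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```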

> It is related to the fine-tuning block_size parameter; 2048 is already long enough. If you need longer answers, you can pass in the multi-turn conversation history. By "passing in the history", do you mean taking the previous step's unfinished output and feeding it back in as the input?

> There may simply be no end-of-sequence token, and there is no good way around that. Put the previous input and output into the history argument and use "继续" ("continue") as the current prompt. That example was fine-tuned from Llama-2-7b-chat and the results are mediocre; I later fine-tuned the original model once and it works somewhat better: https://github.com/git-cloner/Llama2-chinese My understanding is that the length of the history also counts toward the 2048, since it is just concatenated in front of the current input. If the previous step already went over the limit, the next step probably cannot generate anything either. ![image](https://github.com/git-cloner/llama2-lora-fine-tuning/assets/24523981/96e7f709-81e1-40e7-bd28-4c7dd6ea6ad2)
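A minimal sketch of the concatenation described above, i.e. why history counts toward the same 2048-token budget (it is simply prepended); the prompt template and function name here are hypothetical, not taken from the repo:

```python
def build_prompt(history, user_input, tokenizer, block_size=2048):
    """Concatenate past (input, output) turns in front of the current input.

    Because history is plainly prepended, its tokens share the same
    block_size budget as the new input, which is why an oversized
    previous turn starves the next generation.
    """
    # Hypothetical template; each repo uses its own chat format.
    parts = []
    for past_in, past_out in history:
        parts.append(f"User: {past_in}\nAssistant: {past_out}\n")
    parts.append(f"User: {user_input}\nAssistant: ")
    prompt = "".join(parts)

    ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Keep only the most recent tokens if the budget is exceeded.
    if ids.shape[1] > block_size:
        ids = ids[:, -block_size:]
    return ids

# Usage, following the "continue" trick from the comment above:
# ids = build_prompt([(last_input, last_partial_output)], "继续", tokenizer)
```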

> Configure the env var NCCL_DEBUG=INFO and NCCL will print detailed logs. Maybe you should also make sure that your NCCL is installed correctly by using nccl-tests.

msm-h200-2:4009:4009 [0] NCCL INFO cudaDriverVersion 12040...
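The variable can be exported in the shell before launching, or set from Python itself; a sketch assuming a PyTorch distributed job (the init call is illustrative, and the launcher normally supplies the rank/world-size environment variables):

```python
import os

# NCCL reads these when the first communicator is created,
# so set them before init_process_group.
os.environ["NCCL_DEBUG"] = "INFO"
os.environ["NCCL_DEBUG_SUBSYS"] = "INIT,NET"  # optional: limit log volume

import torch.distributed as dist

dist.init_process_group(backend="nccl", init_method="env://")
```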

shortk8snode10:171:171 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:171:171 [0] NCCL INFO Bootstrap : Using eth0:192.168.128.8
shortk8snode10:171:171 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file:...

> Can you provide more information, such as the compile environment (CUDA version) and hardware info?
>
> We support FP8. Don't know why it fails.

Same error on H200...
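Since the failure involves FP8 on an H200, one quick sanity check is the device's compute capability: hardware FP8 generally requires Ada (sm_89) or Hopper (sm_90) class GPUs. A small sketch:

```python
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"GPU 0 compute capability: sm_{major}{minor}")
# H100/H200 report sm_90; FP8 tensor-core support generally starts
# at compute capability 8.9 (Ada) / 9.0 (Hopper).
print("FP8-capable hardware:", (major, minor) >= (8, 9))
```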