Mars

14 comments by Mars

Use this, and pip install scipy PyYAML matplotlib librosa==0.8.0 tqdm pandas numba==0.53.1 numpy==1.19.2 #scipy==1.3 #PyYAML==5.3.1 tensorboardX pyloudnorm setuptools>=41.0.0 g2p_en resemblyzer==0.1.1.dev0 webrtcvad tensorboard==2.6.0 scikit-learn==0.24.1 scikit-image==0.16.2 textgrid jiwer pycwt PyWavelets praat-parselmouth==0.3.3 jieba...
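The same list reads more clearly as a requirements file; this is a sketch assuming these are pip packages (the `==` pins are pip syntax, not apt), and the original comment is truncated after jieba, so the list is incomplete:

```
# requirements.txt sketch — reconstructed from the pins in the comment above
scipy            # scipy==1.3 appears commented out in the original
PyYAML           # PyYAML==5.3.1 appears commented out in the original
matplotlib
librosa==0.8.0
tqdm
pandas
numba==0.53.1
numpy==1.19.2
tensorboardX
pyloudnorm
setuptools>=41.0.0
g2p_en
resemblyzer==0.1.1.dev0
webrtcvad
tensorboard==2.6.0
scikit-learn==0.24.1
scikit-image==0.16.2
textgrid
jiwer
pycwt
PyWavelets
praat-parselmouth==0.3.3
jieba
# ... (truncated in the original comment)
```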

> Thanks for the reply. Does the nums_updates in the log mean steps? If so, it takes 2 hours for every 100 steps in the picture, which means it will...

> Thanks for the reply, I will try to use more GPUs. Another question: during pretraining, num_workers is 0; why not set it to a higher number...
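For context, num_workers is the standard PyTorch DataLoader knob being asked about here; below is a minimal sketch of the difference (the dataset and the numbers are hypothetical, not taken from the repo in question):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the repo's pretraining dataset.
dataset = TensorDataset(torch.randn(1024, 80), torch.randint(0, 10, (1024,)))

# num_workers=0 loads every batch in the main process, which can leave
# the GPU idle while data is prepared; num_workers>0 forks worker
# processes that prefetch batches in parallel.
loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,    # rough rule of thumb: a few workers per GPU
    pin_memory=True,  # faster host-to-GPU transfers
)

for features, labels in loader:
    pass  # training step goes here
```

One common reason a repo pins num_workers=0 is to sidestep multiprocessing problems with datasets that are not picklable or that cache heavily in memory, so raising it is worth testing rather than assuming it is safe.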

> It is related to the fine-tuning block_size parameter; 2048 is already long enough. If you need longer answers, you can pass in the multi-turn conversation history. But when decoding past 2048, the output turns to garbage, and if no end-of-sequence token is detected it keeps generating until memory simply blows up. Is there a configuration that stops automatically once the maximum length is reached?
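On stopping at a maximum length: in Hugging Face transformers, which Llama-2 fine-tuning setups like this one typically use, generation can be capped explicitly even when no EOS token ever appears. A hedged sketch (the model path is illustrative, not the fine-tuned checkpoint from this thread):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; substitute the actual fine-tuned model path.
model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("你好", return_tensors="pt")

# max_new_tokens bounds generation even if no EOS token is emitted,
# so decoding cannot run past the usable context and exhaust memory;
# eos_token_id still lets it stop early when an end marker appears.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,  # Llama has no pad token by default
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```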

> It is related to the fine-tuning block_size parameter; 2048 is already long enough. If you need longer answers, you can pass in the multi-turn conversation history. By "passing in the history", do you mean taking the previous step's unfinished output and feeding it back in as the input?

> There may simply be no end-of-sequence token, and there is no good way around that. Put the previous input and output into the history argument and use "继续" ("continue") as the current prompt. That example was fine-tuned from Llama-2-7b-chat and the results are mediocre; I later fine-tuned the original model once and it works somewhat better: https://github.com/git-cloner/Llama2-chinese My understanding is that the length of the history also counts toward the 2048, since it is just concatenated in front of the current input. If the previous step already went over the limit, the next step probably cannot generate anything either. ![image](https://github.com/git-cloner/llama2-lora-fine-tuning/assets/24523981/96e7f709-81e1-40e7-bd28-4c7dd6ea6ad2)
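A minimal sketch of the concatenation described above, i.e. why history counts toward the same 2048-token budget (it is simply prepended); the prompt template and function name here are hypothetical, not taken from the repo:

```python
def build_prompt(history, user_input, tokenizer, block_size=2048):
    """Concatenate past (input, output) turns in front of the current input.

    Because history is plainly prepended, its tokens share the same
    block_size budget as the new input, which is why an oversized
    previous turn starves the next generation.
    """
    # Hypothetical template; each repo uses its own chat format.
    parts = []
    for past_in, past_out in history:
        parts.append(f"User: {past_in}\nAssistant: {past_out}\n")
    parts.append(f"User: {user_input}\nAssistant: ")
    prompt = "".join(parts)

    ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Keep only the most recent tokens if the budget is exceeded.
    if ids.shape[1] > block_size:
        ids = ids[:, -block_size:]
    return ids

# Usage, following the "continue" trick from the comment above:
# ids = build_prompt([(last_input, last_partial_output)], "继续", tokenizer)
```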

> Configure the env var NCCL_DEBUG=INFO and NCCL will print detailed logs. Maybe you should also make sure that your NCCL is installed correctly by using nccl-tests.

msm-h200-2:4009:4009 [0] NCCL INFO cudaDriverVersion 12040...
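The variable can be exported in the shell before launching, or set from Python itself; a sketch assuming a PyTorch distributed job (the init call is illustrative, and the launcher normally supplies the rank/world-size environment variables):

```python
import os

# NCCL reads these when the first communicator is created,
# so set them before init_process_group.
os.environ["NCCL_DEBUG"] = "INFO"
os.environ["NCCL_DEBUG_SUBSYS"] = "INIT,NET"  # optional: limit log volume

import torch.distributed as dist

dist.init_process_group(backend="nccl", init_method="env://")
```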

shortk8snode10:171:171 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:171:171 [0] NCCL INFO Bootstrap : Using eth0:192.168.128.8
shortk8snode10:171:171 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file:...

> Can you provide more information, such as the compile environment (CUDA version) and hardware info?
>
> We support FP8. Don't know why it fails.

Same error on H200...
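Since the failure involves FP8 on an H200, one quick sanity check is the device's compute capability: hardware FP8 generally requires Ada (sm_89) or Hopper (sm_90) class GPUs. A small sketch:

```python
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"GPU 0 compute capability: sm_{major}{minor}")
# H100/H200 report sm_90; FP8 tensor-core support generally starts
# at compute capability 8.9 (Ada) / 9.0 (Hopper).
print("FP8-capable hardware:", (major, minor) >= (8, 9))
```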