vall-e
Cannot run python3 -m vall_e.train yaml=config/test/nar.yml
python3 -m vall_e.train yaml=config/test/nar.yml --debug
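Running this raised an error. ChatGPT-4 said the original file might be the problem but could not give any concrete advice, so I can only ask the author.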
trainer.train(
File "/sam/vall-e/vall_e/utils/trainer.py", line 150, in train
for batch in _make_infinite_epochs(train_dl):
File "/sam/vall-e/vall_e/utils/trainer.py", line 103, in _make_infinite_epochs
yield from dl
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 634, in next
data = self._next_data()
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
return self._process_data(data)
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
data.reraise()
Loaded tensor from /sam/vall-e/data/train/one.qnt.pt with shape: torch.Size([])
File "/usr/local/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
Added tensor with shape: torch.Size([])
Converted path: /sam/vall-e/data/train/one.qnt.pt -> /sam/vall-e/data/train/one.qnt.pt
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in
Loaded tensor from /sam/vall-e/data/train/one.qnt.pt with shape: torch.Size([])
Added tensor with shape: torch.Size([])
Converted path: /sam/vall-e/data/train/one.qnt.pt -> /sam/vall-e/data/train/one.qnt.pt
Loaded tensor from /sam/vall-e/data/train/one.qnt.pt with shape: torch.Size([])
Added tensor with shape: torch.Size([])
A script that ChatGPT-4 wrote for me:
root@CH-202203180108:/sam/vall-e/data# cat 1.py
import torch

train_qnt = torch.load('/sam/vall-e/data/train/one.qnt.pt')
print("Train qnt shape:", train_qnt.shape)
val_qnt = torch.load('/sam/vall-e/data/val/test.qnt.pt')
print("Val qnt shape:", val_qnt.shape)
root@CH-202203180108:/sam/vall-e/data# python3 1.py
Train qnt shape: torch.Size([3])
Val qnt shape: torch.Size([1, 8, 149])
Directory structure of data:
root@CH-202203180108:/sam/vall-e/data# ll
total 24
drwxr-xr-x 5 root root 4096 Mar 30 21:07 ./
drwxr-xr-x 8 root root 4096 Mar 30 23:45 ../
-rw-r--r-- 1 root root  216 Mar 30 21:07 1.py
drwxr-xr-x 2 root root 4096 Mar 28 14:27 test/
drwxr-xr-x 2 root root 4096 Mar 30 23:34 train/
drwxr-xr-x 2 root root 4096 Mar 28 14:55 val/
Files in the train directory:
root@CH-202203180108:/sam/vall-e/data# ll train/
total 408
drwxr-xr-x 2 root root   4096 Mar 30 23:34 ./
drwxr-xr-x 5 root root   4096 Mar 30 21:07 ../
-rw-r--r-- 1 root root    159 Mar 28 14:53 1.py
-rw-r--r-- 1 root root     37 Mar 28 14:49 one.phn.txt
-rw-r--r-- 1 root root    747 Mar 28 14:54 one.qnt.pt
-rw-r--r-- 1 root root     26 Mar 28 14:38 test.phn.txt
-rw-r--r-- 1 root root  10286 Mar 28 14:38 test.qnt.pt
-rw-r--r-- 1 root root 380750 Mar 30 23:34 test.wav
root@CH-202203180108:/sam/vall-e/data#
It errors out and I don't know how to fix it.
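The advice GPT gave you is right. .pt files produced by Encodec encoding all have the shape [1, 8, time_step]. /sam/vall-e/data/train/one.qnt.pt has only one dimension, which is not right. Check whether something went wrong when you encoded it with qnt.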
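The shape rule described above can be checked before starting a training run. The following is only a minimal sketch, assuming the [1, 8, time_step] layout quoted in this thread; check_qnt_shape and check_qnt_file are hypothetical helper names and are not part of vall-e.

```python
def check_qnt_shape(shape, n_quantizers=8):
    """Return None if shape looks like [1, n_quantizers, time_step], else a reason string."""
    shape = tuple(shape)
    if len(shape) != 3:
        return f"expected 3 dimensions, got {len(shape)}: {shape}"
    if shape[0] != 1 or shape[1] != n_quantizers:
        return f"expected [1, {n_quantizers}, time_step], got {shape}"
    if shape[2] < 1:
        return f"empty time axis: {shape}"
    return None


def check_qnt_file(path):
    """Load a .qnt.pt file and validate its shape (assumes the file holds a single tensor)."""
    import torch  # imported lazily so the pure shape check has no dependency on PyTorch

    tensor = torch.load(path, map_location="cpu")
    return check_qnt_shape(tensor.shape)
```

With the shapes reported in this thread, check_qnt_shape((1, 8, 149)) returns None, while check_qnt_shape((3,)) returns a message explaining the mismatch.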
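Did you get it running? Even after a lot of back-and-forth and debugging with GPT-4, I still can't. Is some of the training data not provided with the project, or is something missing? It simply will not run once I reach the step python3 -m vall_e.train yaml=config/test/nar.yml --debug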
Is something missing?
'NoneType' object has no attribute 'optimizer_name'; self._config is None.
I encountered the same problem: vall_e.train stopped working. At first glance it seems that a change was made to Microsoft's DeepSpeed code. When Microsoft's module is initialized, it looks for a config object that contains the attribute optimizer_name.
vall_e uses DeepSpeed and initializes it as part of the class 'Engine' in utils/engines.py, but it does not pass the required config parameter. I am not familiar with this code, but I could see that other classes in utils/engines.py (e.g. the 'Engines' class) do use a config object that probably has the necessary information.
Can anyone help?
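To illustrate the failure mode described above, here is a toy reproduction. It is not DeepSpeed's or vall-e's code: FakeEngine and FakeConfig are hypothetical stand-ins that only show how reading optimizer_name from a config that was never passed in produces this kind of AttributeError.

```python
class FakeConfig:
    """Hypothetical config object carrying the attribute the engine expects."""

    optimizer_name = "adam"


class FakeEngine:
    """Hypothetical stand-in for an engine that reads settings from a config object."""

    def __init__(self, config=None):
        # When the caller passes nothing, the stored config stays None.
        self._config = config

    def optimizer_name(self):
        # Raises AttributeError if self._config is None.
        return self._config.optimizer_name


try:
    FakeEngine().optimizer_name()
except AttributeError as err:
    message = str(err)
    print(message)

# Passing a config object avoids the error.
with_config = FakeEngine(FakeConfig()).optimizer_name()
```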
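It runs fine for me. I think the dimensions of one.qnt.pt are the problem. You could try deleting the pt and txt files for one and running with only the bundled test pt and txt files to see whether it still errors. If it runs normally, that shows something went wrong when Encodec encoded one.wav. Re-encode it and see whether you get a pt with shape [1, 8, x].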
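The isolation step suggested above can be done without permanently deleting anything. This is a rough sketch, assuming the files for one sample share a common stem such as one; set_aside is a hypothetical helper, and the demo runs in a temporary directory rather than the real data folder.

```python
import shutil
import tempfile
from pathlib import Path


def set_aside(data_dir, stem, backup_dir):
    """Move every file named <stem>.* from data_dir into backup_dir; return the moved names."""
    data_dir, backup_dir = Path(data_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(data_dir.glob(f"{stem}.*")):
        if path.is_file():
            shutil.move(str(path), str(backup_dir / path.name))
            moved.append(path.name)
    return moved


# Demo on throwaway files that mimic the train directory listed in this thread.
with tempfile.TemporaryDirectory() as tmp:
    train = Path(tmp) / "train"
    train.mkdir()
    for name in ["one.phn.txt", "one.qnt.pt", "test.phn.txt", "test.qnt.pt"]:
        (train / name).touch()
    moved = set_aside(train, "one", Path(tmp) / "backup")
    remaining = sorted(p.name for p in train.iterdir())
```

Moving the files instead of deleting them makes it easy to restore them after re-encoding.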
See the discussion here: https://github.com/enhuiz/vall-e/issues/87
thanks
thx
I opened a pull request that deals with this issue. Make sure mpi4py is installed correctly, as I use the default initialization of distributed training, which might search for MPI.
Running !pip install deepspeed==0.8.3 fixes it.
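To confirm which DeepSpeed version is actually installed after pinning, something like the following could be used. It is a small sketch built on the standard-library importlib.metadata; installed_version and matches_pin are hypothetical helper names, and 0.8.3 is simply the version mentioned in this thread.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pinned="0.8.3"):
    """True only when the installed version equals the pinned one exactly."""
    return installed is not None and installed == pinned


found = installed_version("deepspeed")
print("deepspeed:", found, "| matches 0.8.3:", matches_pin(found))
```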
Awesome, thx. That solved the train problem.