Feature File Path and Database File paths not being written after training
Open · mr-segfault opened this issue 2 years ago · 16 comments
Using the current version of RVC (I pulled the latest just before writing this report): training generates the weights but does not generate the feature file or database file required for inference.
**No index file is created**
I'm using an Ubuntu system and installed RVC in a venv to avoid conflicts. This is the tail end of the training output and the crash log:
INFO:model3:Saving model and optimizer state at epoch 200 to ./logs/model3/G_200.pth
INFO:model3:Saving model and optimizer state at epoch 200 to ./logs/model3/D_200.pth
INFO:model3:====> Epoch: 200
INFO:model3:Training is done. The program is closed.
INFO:model3:saving final ckpt:Success.
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 534, in
main()
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 50, in main
mp.spawn(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 239, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
while not context.join():
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 149, in join
raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 149
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/routes.py", line 401, in run_predict
output = await app.get_blocks().process_api(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/blocks.py", line 1302, in process_api
result = await self.call_function(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/blocks.py", line 1039, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/utils.py", line 491, in async_iteration
return next(iterator)
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/infer-web.py", line 844, in train1key
big_npy = np.concatenate(npys, 0)
File "<array_function internals>", line 180, in concatenate
ValueError: need at least one array to concatenate
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 20 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
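For what it's worth, the final ValueError above is exactly what NumPy raises when np.concatenate is handed an empty list, so my read is that the npys list in train1key ended up empty (no feature .npy files were picked up) after the training process died with exit code 149. A minimal sketch of that behavior:

```python
import numpy as np

# What train1key presumably ends up with when no feature .npy files are found:
npys = []
try:
    big_npy = np.concatenate(npys, 0)
except ValueError as e:
    print(e)  # "need at least one array to concatenate", matching the traceback above
```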
I updated to the current version, which no longer needs the total_fea.npy output, and the error still persists:
INFO:train5:Saving model and optimizer state at epoch 20 to ./logs/train5/G_20.pth
INFO:train5:Saving model and optimizer state at epoch 20 to ./logs/train5/D_20.pth
INFO:train5:====> Epoch: 20
INFO:train5:Training is done. The program is closed.
INFO:train5:saving final ckpt:Success.
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 534, in
main()
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 50, in main
mp.spawn(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 239, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
while not context.join():
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 149, in join
raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 149
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/routes.py", line 401, in run_predict
output = await app.get_blocks().process_api(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/blocks.py", line 1302, in process_api
result = await self.call_function(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/blocks.py", line 1039, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/utils.py", line 491, in async_iteration
return next(iterator)
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/infer-web.py", line 852, in train1key
big_npy = np.concatenate(npys, 0)
File "<array_function internals>", line 180, in concatenate
ValueError: need at least one array to concatenate
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 20 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Tried again (4/27) with a current build; it looks like the Train Feature Index step is still broken (no index file is prepared).
The logs are slightly different because the code has been changing, but the underlying issue is still present.
INFO:train6:====> Epoch: 20
INFO:train6:Training is done. The program is closed.
INFO:train6:saving final ckpt:Success.
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 534, in
main()
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 50, in main
mp.spawn(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 239, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
while not context.join():
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 149, in join
raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 149
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/routes.py", line 401, in run_predict
output = await app.get_blocks().process_api(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/blocks.py", line 1302, in process_api
result = await self.call_function(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/blocks.py", line 1039, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/gradio/utils.py", line 491, in async_iteration
return next(iterator)
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/infer-web.py", line 859, in train1key
big_npy = np.concatenate(npys, 0)
File "<array_function internals>", line 180, in concatenate
ValueError: need at least one array to concatenate
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 20 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
I was worried that my lower-powered PC was causing the issue, so I tried it on my more powerful one. The logs are shorter but still show the bug; maybe this makes it easier to chase down?
INFO:test2:====> Epoch: 20
INFO:test2:Training is done. The program is closed.
INFO:test2:saving final ckpt:Success.
Traceback (most recent call last):
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 534, in
main()
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 50, in main
mp.spawn(
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 239, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
while not context.join():
File "/home/user/Retrieval-based-Voice-Conversion-WebUI/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 149, in join
raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 149
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 20 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
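If anyone else wants to check whether they are in the same state: the concatenate step reads the per-utterance feature .npy files written during feature extraction, so counting them is a quick sanity check. The directory names below are assumptions based on the default layout (3_feature256 for v1 models, 3_feature768 for v2):

```python
import os

exp_dir = "logs/test2"                               # adjust to your experiment name
feature_dir = os.path.join(exp_dir, "3_feature256")  # assumed default; "3_feature768" for v2

npy_files = [f for f in os.listdir(feature_dir) if f.endswith(".npy")]
print(f"{len(npy_files)} feature file(s) in {feature_dir}")
# Zero files here would explain "need at least one array to concatenate".
```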
Same here: the file is missing, and then I have trouble generating the model.
Colab free GPU.
INFO:model:Training is done. The program is closed.
INFO:model:saving final ckpt:Traceback (most recent call last):
File "/content/Retrieval-based-Voice-Conversion-WebUI/train/process_ckpt.py", line 79, in savee
torch.save(opt, "weights/%s.pth" % name)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 440, in save
with _open_zipfile_writer(f) as opened_zipfile:
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 315, in _open_zipfile_writer
return container(name_or_buffer)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 288, in init
super().init(torch._C.PyTorchFileWriter(str(name)))
RuntimeError: Parent directory weights/content/dataset does not exist.
Traceback (most recent call last):
File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 534, in
main()
File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 50, in main
mp.spawn(
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 239, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
while not context.join():
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 149, in join
raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 149
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 399, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1299, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1036, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 488, in async_iteration
return next(iterator)
File "/content/Retrieval-based-Voice-Conversion-WebUI/infer-web.py", line 899, in train1key
big_npy = np.concatenate(npys, 0)
File "<array_function internals>", line 180, in concatenate
ValueError: need at least one array to concatenate
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 20 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Keyboard interruption in main thread... closing server.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1984, in block_thread
time.sleep(0.1)
KeyboardInterrupt
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/content/Retrieval-based-Voice-Conversion-WebUI/infer-web.py", line 1535, in
app.queue(concurrency_count=511, max_size=1022).launch(share=True)
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1901, in launch
self.block_thread()
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1987, in block_thread
self.server.close()
File "/usr/local/lib/python3.10/dist-packages/gradio/networking.py", line 43, in close
self.thread.join()
File "/usr/lib/python3.10/threading.py", line 1096, in join
self._wait_for_tstate_lock()
File "/usr/lib/python3.10/threading.py", line 1116, in _wait_for_tstate_lock
if lock.acquire(block, timeout):
KeyboardInterrupt
Save model Zip to Drive
cp: cannot stat '/content/Retrieval-based-Voice-Conversion-WebUI/logs/model/added_*.index': No such file or directory
cp: cannot stat '/content/Retrieval-based-Voice-Conversion-WebUI/logs/model/total_*.npy': No such file or directory
cp: cannot stat '/content/Retrieval-based-Voice-Conversion-WebUI/weights/model.pth': No such file or directory
/content/zips/model
zip warning: name not matched: *
zip error: Nothing to do! (try: zip -r model.zip . -i *)
mv: cannot stat 'model.zip': No such file or directory
/content/Retrieval-based-Voice-Conversion-WebUI
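Side note on the Colab log above: the RuntimeError comes from process_ckpt.py calling torch.save(opt, "weights/%s.pth" % name) with a name that apparently contains a path ("content/dataset"), so the final checkpoint never lands in weights/ either. If I read that right, a guard along these lines would avoid it (save_final_ckpt below is just an illustrative stand-in, not the actual function):

```python
import os
import torch

def save_final_ckpt(opt, name):
    # If the experiment name was given as a path (e.g. "/content/dataset"),
    # keep only its last component so the checkpoint lands under weights/.
    name = os.path.basename(os.path.normpath(name))
    path = os.path.join("weights", f"{name}.pth")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    torch.save(opt, path)
```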
+1 facing the exact same issue.
When I train for 50 epochs on Colab, it produces the .index file and I can copy it to Google Drive,
but when I train for 200 epochs, the .index file is missing and can't be copied to Google Drive.
This issue is still present in the current version (as of this writing), though the behavior is different -- a file is written, but the application still throws a torch exception:
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 149
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 20 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d
I have the same errors in the log as the author of this thread. The application version is the latest at the moment. Training finishes and the .pth model is available, but the index file is not created.
Has anyone been able to find a solution to this problem?
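Not a fix, but a possible stopgap while this is open: if training completed and feature extraction did write .npy files, the index can be built by hand with faiss, roughly mirroring what the WebUI's train-index step does. Everything below (directory names, the cluster-count heuristic, the output filename) is an assumption about the default setup, so adjust as needed:

```python
import os
import faiss
import numpy as np

exp_dir = "logs/model3"                               # adjust to your experiment name
feature_dir = os.path.join(exp_dir, "3_feature256")   # assumed default; "3_feature768" for v2

# Gather the per-utterance feature files the extraction step wrote.
npys = [
    np.load(os.path.join(feature_dir, name))
    for name in sorted(os.listdir(feature_dir))
    if name.endswith(".npy")
]
big_npy = np.concatenate(npys, 0).astype(np.float32)

# Rough IVF cluster count; the real heuristic in infer-web.py may differ.
n_ivf = max(1, big_npy.shape[0] // 39)
index = faiss.index_factory(big_npy.shape[1], f"IVF{n_ivf},Flat")
index.train(big_npy)
index.add(big_npy)
faiss.write_index(index, os.path.join(exp_dir, "manual_added.index"))
```

The resulting file can then be pointed to manually in the inference tab; it just sidesteps the broken one-click step rather than fixing it.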