Trainer
[Bug] ValueError: not allowed to raise maximum limit (rlimit)
Describe the bug
Training fails with the error below.
- Running with sudo gives the same error.
- I am using the Docker image nvidia/cuda:11.7.0-base-ubuntu22.04.
- Inside the container, resource.getrlimit(resource.RLIMIT_NOFILE) returns (1048576, 1048576) by default.
| > stats_path:None
2023-06-14T07:29:43.025431079Z | > base:10
2023-06-14T07:29:43.025437149Z | > hop_length:256
2023-06-14T07:29:43.025444429Z | > win_length:1024
2023-06-14T07:29:43.025450699Z > initialization of speaker-embedding layers.
2023-06-14T07:29:43.025462919Z Traceback (most recent call last):
2023-06-14T07:29:43.025469199Z File "/workspace/coqui-tts/train.py", line 320, in <module>
2023-06-14T07:29:43.025476859Z trainer = Trainer(
2023-06-14T07:29:43.025484659Z File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 405, in __init__
2023-06-14T07:29:43.025494939Z self.use_cuda, self.num_gpus = self.setup_training_environment(args=args, config=config, gpu=gpu)
2023-06-14T07:29:43.025500099Z File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 632, in setup_training_environment
2023-06-14T07:29:43.025543959Z resource.setrlimit(resource.RLIMIT_NOFILE, (4096, rlimit[1]))
2023-06-14T07:29:43.025560229Z ValueError: not allowed to raise maximum limit
The error is caused by these lines: https://github.com/coqui-ai/Trainer/blob/9879d3ded5322d634879a5dfd0515e0dfcf732bc/trainer/trainer.py#L653-L660
To Reproduce
- Install coqui-tts in an nvidia/cuda:11.7.0-base-ubuntu22.04 Docker container.
- Try to train a VITS model.
- The error is thrown (even with sudo).
Expected behavior
No errors
Logs
Identical to the traceback shown under "Describe the bug" above.
Environment
{
    "CUDA": {
        "GPU": [
            "Tesla V100-FHHL-16GB"
        ],
        "available": true,
        "version": "11.7"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.0.1+cu117",
        "Trainer": "v0.0.20",
        "numpy": "1.22.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.6",
        "version": "#46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020"
    }
}
Additional context
No response
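In /usr/local/lib/python3.10/dist-packages/trainer/trainer.py (line 632), change
resource.setrlimit(resource.RLIMIT_NOFILE, (4096, rlimit[1]))
to:
resource.setrlimit(resource.RLIMIT_NOFILE, (4096, 4096))
Note that this also lowers the hard limit to 4096 for the process, and an unprivileged process cannot raise its hard limit again afterwards.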
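If editing the installed package by hand is undesirable, the same call could be made more defensive instead. The following is only a sketch under stated assumptions: ensure_nofile_soft_limit is a hypothetical helper name, not part of Trainer, and it uses nothing beyond the standard resource module. Unlike the line quoted above, it never lowers the soft limit, leaves the hard limit untouched, and reports a refused change instead of aborting training.

```python
import resource


def ensure_nofile_soft_limit(target=4096):
    """Try to make the soft RLIMIT_NOFILE at least `target`.

    Hypothetical helper for illustration; not part of the Trainer package.
    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY

    # Nothing to do if the soft limit is already high enough.
    if soft == unlimited or soft >= target:
        return soft, hard

    # Never ask for more than the hard limit allows.
    new_soft = target if hard == unlimited else min(target, hard)
    try:
        # Pass the hard limit back unchanged; only the soft limit moves.
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    except (ValueError, OSError) as exc:
        # Some environments reject the call; continue with the current limits.
        print(f"Could not change RLIMIT_NOFILE: {exc}")
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(ensure_nofile_soft_limit())
```

With the container default of (1048576, 1048576) reported above, this helper would return early without calling setrlimit at all, because the soft limit already exceeds 4096.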