[Bug] ValueError: not allowed to raise maximum limit (rlimit)

Open iamkhalidbashir opened this issue 1 year ago • 1 comment

Describe the bug

Error while training:

  • I tried with sudo and got the same error.
  • I am using the docker image nvidia/cuda:11.7.0-base-ubuntu22.04.
  • Inside the container, resource.getrlimit(resource.RLIMIT_NOFILE) returns (1048576, 1048576) by default.
| > stats_path:None
2023-06-14T07:29:43.025431079Z  | > base:10
2023-06-14T07:29:43.025437149Z  | > hop_length:256
2023-06-14T07:29:43.025444429Z  | > win_length:1024
2023-06-14T07:29:43.025450699Z  > initialization of speaker-embedding layers.
2023-06-14T07:29:43.025462919Z Traceback (most recent call last):
2023-06-14T07:29:43.025469199Z   File "/workspace/coqui-tts/train.py", line 320, in <module>
2023-06-14T07:29:43.025476859Z     trainer = Trainer(
2023-06-14T07:29:43.025484659Z   File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 405, in __init__
2023-06-14T07:29:43.025494939Z     self.use_cuda, self.num_gpus = self.setup_training_environment(args=args, config=config, gpu=gpu)
2023-06-14T07:29:43.025500099Z   File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 632, in setup_training_environment
2023-06-14T07:29:43.025543959Z     resource.setrlimit(resource.RLIMIT_NOFILE, (4096, rlimit[1]))
2023-06-14T07:29:43.025560229Z ValueError: not allowed to raise maximum limit

This is caused by these lines: https://github.com/coqui-ai/Trainer/blob/9879d3ded5322d634879a5dfd0515e0dfcf732bc/trainer/trainer.py#L653-L660
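To see what limits the training process actually starts with, a short diagnostic like the one below can be run in the same container and under the same user. It is only a sketch: the helper name `describe_nofile_limit` and the `requested_soft` parameter are made up for illustration and are not part of Trainer. It uses only the standard `resource` module.

```python
import resource


def describe_nofile_limit(requested_soft=4096):
    """Report the current RLIMIT_NOFILE values (hypothetical helper, not part of Trainer)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    hard_is_unlimited = hard == resource.RLIM_INFINITY
    return {
        "soft": soft,
        "hard": hard,
        "hard_is_unlimited": hard_is_unlimited,
        # A soft limit can never be set above the hard limit.
        "requested_soft_fits": hard_is_unlimited or requested_soft <= hard,
    }


if __name__ == "__main__":
    print(describe_nofile_limit())
```

If the values printed here differ from (1048576, 1048576), the process is probably being started with different limits than an interactive shell in the container.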

To Reproduce

  1. Install coqui-tts in the nvidia/cuda:11.7.0-base-ubuntu22.04 docker container
  2. Try to train a VITS model
  3. The error is thrown (even with sudo)

Expected behavior

No errors

Logs

Same traceback as under "Describe the bug" above.

Environment

{
    "CUDA": {
        "GPU": [
            "Tesla V100-FHHL-16GB"
        ],
        "available": true,
        "version": "11.7"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.0.1+cu117",
        "Trainer": "v0.0.20",
        "numpy": "1.22.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.6",
        "version": "#46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020"
    }
}

Additional context

No response

iamkhalidbashir • Jun 14 '23 10:06

In /usr/local/lib/python3.10/dist-packages/trainer/trainer.py (line 632), change

resource.setrlimit(resource.RLIMIT_NOFILE, (4096, rlimit[1]))

to

resource.setrlimit(resource.RLIMIT_NOFILE, (4096, 4096))

This sets both the soft and the hard limit to 4096 instead of reusing the existing hard limit.
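A more defensive variant of this change could look like the sketch below. It only raises the soft limit when it is below the target, never touches the hard limit, and keeps the existing limits if the kernel refuses the change. The helper name `ensure_nofile_soft_limit` and its `minimum` parameter are hypothetical; this is not Trainer's code, just one way the call could be guarded.

```python
import resource


def ensure_nofile_soft_limit(minimum=4096):
    """Raise the soft RLIMIT_NOFILE to at least `minimum` if possible.

    Hypothetical helper for illustration; not part of Trainer.
    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or soft >= minimum:
        return soft, hard

    # Never request more than the hard limit allows.
    target = minimum if hard == resource.RLIM_INFINITY else min(minimum, hard)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        # The change was refused; keep whatever limits are in effect.
        pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(ensure_nofile_soft_limit(4096))
```

With the limits reported in this issue, (1048576, 1048576), this version would return early without calling `setrlimit` at all, since the soft limit is already above 4096.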

SilvioGuedes • Dec 24 '23 06:12