darts
[BUG] semaphore or lock released too many times
Describe the bug
I am learning darts and Optuna hyperparameter optimization from the guide: https://unit8co.github.io/darts/userguide/hyperparameter_optimization.html#hyperparameter-optimization-with-optuna. I trained the model using a GPU and 4 workers and got the following error:
Metric val_loss improved by 0.010 >= min_delta = 0.001. New best score: 0.651
Epoch 0: 100%|██████████████████████████████████████████████████████████████████| 3/3 [02:20<00:00, 46.78s/it, train_loss=1.830]
Exception in thread QueueFeederThread:
Exception in thread QueueFeederThread:
Traceback (most recent call last):
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/queues.py", line 239, in _feed
reader_close()
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/connection.py", line 177, in close
self._close()
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/connection.py", line 361, in _close
_close(self._handle)
OSError: [Errno 9] Bad file descriptor
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/dev/miniconda3/envs/pf/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/home/dev/miniconda3/envs/pf/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/queues.py", line 271, in _feed
queue_sem.release()
ValueError: semaphore or lock released too many times
Exception ignored in: <function _ConnectionBase.__del__ at 0x7fc62620c820>
Traceback (most recent call last):
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/connection.py", line 132, in __del__
self._close()
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/connection.py", line 361, in _close
_close(self._handle)
OSError: [Errno 9] Bad file descriptor
GPU available: True (cuda), used: True
To Reproduce
The code is from https://unit8co.github.io/darts/userguide/hyperparameter_optimization.html#hyperparameter-optimization-with-optuna
Expected behavior
No error.
System (please complete the following information):
- Python version: 3.10.12
- darts version: 0.24.0
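For context, the ValueError at the bottom of the traceback is the stdlib's own error for over-releasing a bounded multiprocessing semaphore (the queue feeder thread calls queue_sem.release() during teardown). A minimal sketch that reproduces just that message, independent of darts:

```python
# The ValueError in the trace comes from multiprocessing's bounded
# semaphore: releasing it more often than it was acquired is an error.
import multiprocessing as mp

sem = mp.BoundedSemaphore(1)
sem.acquire()
sem.release()          # back to the initial value: fine
try:
    sem.release()      # one release too many
except ValueError as exc:
    print(exc)         # -> semaphore or lock released too many times
```

This only shows where the message originates; the actual over-release happens inside the DataLoader worker queues during shutdown.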
Hi @jacktang, thank you for writing.
Can you please indicate which cell of the notebook raises the error? It seems like it comes from one of darts' dependencies...
Also, can you try upgrading to darts 0.25.0?
OK. I upgraded to 0.25.0 and converted the notebook to a Python script, but the error still exists. The OS is Ubuntu 20.04.4 LTS.
Best value: 29.46530282497406, Best params: {'kernel_size': 3, 'num_filters': 4, 'weight_norm': False, 'dilation_base': 2, 'dropout': 0.017801282281381472, 'lr': 8.169771024932909e-05, 'year': False}
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
----------------------------------------------------
0 | criterion | MSELoss | 0
1 | train_metrics | MetricCollection | 0
2 | val_metrics | MetricCollection | 0
3 | dropout | MonteCarloDropout | 0
4 | res_blocks | ModuleList | 166
----------------------------------------------------
166 Trainable params
0 Non-trainable params
166 Total params
0.001 Total estimated model params size (MB)
Epoch 0: 100%|██████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 42.49it/s, train_loss=8.210]
[I 2023-08-08 17:42:53,616] Trial 16 pruned. Trial was pruned at epoch 0.
Current value: 5.6125102043151855, Current params: {'kernel_size': 3, 'num_filters': 3, 'weight_norm': False, 'dilation_base': 2, 'dropout': 0.10989051943366332, 'lr': 0.0008949513735868809, 'year': False}
Best value: 29.46530282497406, Best params: {'kernel_size': 3, 'num_filters': 4, 'weight_norm': False, 'dilation_base': 2, 'dropout': 0.017801282281381472, 'lr': 8.169771024932909e-05, 'year': False}
Epoch 1: 100%|██████████████████████████████████████████████████| 3/3 [00:00<00:00, 5.10it/s, train_loss=1.000, val_loss=0.859]
Epoch 0: 100%|██████████████████████████████████████████████████████████████████| 3/3 [00:15<00:00, 5.07s/it, train_loss=8.210]
Exception in thread QueueFeederThread:
Traceback (most recent call last):
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/queues.py", line 239, in _feed
reader_close()
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/connection.py", line 177, in close
self._close()
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/connection.py", line 361, in _close
_close(self._handle)
OSError: [Errno 9] Bad file descriptor
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/dev/miniconda3/envs/pf/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/home/dev/miniconda3/envs/pf/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/queues.py", line 271, in _feed
queue_sem.release()
ValueError: semaphore or lock released too many times
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
----------------------------------------------------
0 | criterion | MSELoss | 0
1 | train_metrics | MetricCollection | 0
2 | val_metrics | MetricCollection | 0
3 | dropout | MonteCarloDropout | 0
4 | res_blocks | ModuleList | 68
----------------------------------------------------
68 Trainable params
0 Non-trainable params
68 Total params
0.000 Total estimated model params size (MB)
Epoch 7: 100%|██████████████████████████████████████████████████| 3/3 [01:04<00:00, 21.54s/it, train_loss=0.794, val_loss=1.220]
Exception in thread QueueFeederThread:
Traceback (most recent call last):
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/queues.py", line 239, in _feed
reader_close()
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/connection.py", line 177, in close
self._close()
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/connection.py", line 361, in _close
_close(self._handle)
OSError: [Errno 9] Bad file descriptor
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/dev/miniconda3/envs/pf/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/home/dev/miniconda3/envs/pf/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/dev/miniconda3/envs/pf/lib/python3.10/multiprocessing/queues.py", line 271, in _feed
queue_sem.release()
ValueError: semaphore or lock released too many times
Epoch 0: 100%|██████████████████████████████████████████████████████████████████| 3/3 [01:32<00:00, 30.92s/it, train_loss=1.360]
Epoch 7: 100%|██████████████████████████████████████████████████| 3/3 [01:15<00:00, 25.29s/it, train_loss=0.903, val_loss=0.999]
Epoch 0: 100%|██████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 48.06it/s, train_loss=1.370]
[I 2023-08-08 17:44:09,206] Trial 17 pruned. Trial was pruned at epoch 0.
Current value: 1.354859471321106, Current params: {'kernel_size': 4, 'num_filters': 2, 'weight_norm': False, 'dilation_base': 3, 'dropout': 0.045057036646966524, 'lr': 7.765323102891736e-05, 'year': False}
Best value: 29.46530282497406, Best params: {'kernel_size': 3, 'num_filters': 4, 'weight_norm': False, 'dilation_base': 2, 'dropout': 0.017801282281381472, 'lr': 8.169771024932909e-05, 'year': False}
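As an aside: one classic source of multiprocessing teardown errors when notebook code is converted to a plain script is running the study at module level instead of under a main guard (this matters for the spawn/forkserver start methods, which re-import the script in each worker; whether it applies to this particular trace is a guess). A minimal sketch, where run_study() is a hypothetical stand-in for the study.optimize(...) call from the guide:

```python
import multiprocessing as mp


def run_study():
    # Hypothetical stand-in for building the Optuna study and calling
    # study.optimize(...) as in the darts hyperparameter guide.
    return "done"


if __name__ == "__main__":
    # Without this guard, a worker process that re-imports the script
    # (spawn/forkserver start methods) would re-execute the training
    # code at module level.
    print(run_study())  # -> done
```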