spirl
How to speed up the training process?
I found that GPU utilization is very low when running the training script:
python3 spirl/train.py --path=spirl/configs/skill_prior_learning/kitchen/hierarchical_cl --val_data_size=160
Could you suggest ways to make full use of the GPU and speed up training? I have tried increasing num_worker, but it doesn't seem to help, and when I increase batch_size I get an error like the following:
len val dataset 160
Running Testing
Traceback (most recent call last):
  File "spirl/spirl/train.py", line 390, in <module>
    ModelTrainer(args=get_args())
  File "spirl/spirl/train.py", line 76, in __init__
    self.train(start_epoch)
  File "spirl/spirl/train.py", line 105, in train
    self.val()
  File "spirl/spirl/train.py", line 199, in val
    self.evaluator.dump_results(self.global_step)
  File "/home/lyf/Videos/bin/skild/skild/spirl/spirl/components/evaluator.py", line 66, in dump_results
    self.dump_metrics(it)
  File "/home/lyf/Videos/bin/skild/skild/spirl/spirl/components/evaluator.py", line 72, in dump_metrics
    best_idxs = 0 if self._top_of_n == 1 else self._get_best_idxs(self.full_eval_buffer[self._top_comp_metric])
TypeError: 'NoneType' object is not subscriptable
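One possible reading of the traceback, which is an assumption and not something confirmed in the spirl code: full_eval_buffer is still None because validation never processed a batch. That could happen if the validation loader drops incomplete batches and batch_size is larger than the 160 validation samples. The sketch below only does the batch-count arithmetic; num_batches is a hypothetical helper, not a spirl function, and the batch sizes are illustrative.

```python
# Hypothetical helper (not part of spirl): how many batches a loader yields
# for a dataset of a given size, with and without dropping the last
# incomplete batch.
def num_batches(dataset_size: int, batch_size: int, drop_last: bool) -> int:
    full, remainder = divmod(dataset_size, batch_size)
    if drop_last or remainder == 0:
        return full
    return full + 1  # keep the final, smaller batch


val_data_size = 160  # from --val_data_size=160 in the command above

# Illustrative batch sizes; the actual values used in the run are not known.
for batch_size in (16, 128, 160, 256):
    dropped = num_batches(val_data_size, batch_size, drop_last=True)
    kept = num_batches(val_data_size, batch_size, drop_last=False)
    print(f"batch_size={batch_size}: drop_last=True -> {dropped}, drop_last=False -> {kept}")
```

If this is the cause, a batch_size above 160 would leave the evaluator with nothing to aggregate, and raising --val_data_size to at least the batch size would be one thing to try.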
Thank you very much!