How to speed up the training process?

Open BrightMoonStar opened this issue 11 months ago • 0 comments

I found that GPU utilization is very low when running the training script python3 spirl/train.py --path=spirl/configs/skill_prior_learning/kitchen/hierarchical_cl --val_data_size=160. Could you provide some suggestions for making full use of the GPU to speed up training? I have tried increasing num_worker, but it doesn't seem to help, and when I increase batch_size, I get errors like the following:

len val dataset 160
Running Testing
Traceback (most recent call last):
  File "spirl/spirl/train.py", line 390, in <module>
    ModelTrainer(args=get_args())
  File "spirl/spirl/train.py", line 76, in __init__
    self.train(start_epoch)
  File "spirl/spirl/train.py", line 105, in train
    self.val()
  File "spirl/spirl/train.py", line 199, in val
    self.evaluator.dump_results(self.global_step)
  File "/home/lyf/Videos/bin/skild/skild/spirl/spirl/components/evaluator.py", line 66, in dump_results
    self.dump_metrics(it)
  File "/home/lyf/Videos/bin/skild/skild/spirl/spirl/components/evaluator.py", line 72, in dump_metrics
    best_idxs = 0 if self._top_of_n == 1 else self._get_best_idxs(self.full_eval_buffer[self._top_comp_metric])
TypeError: 'NoneType' object is not subscriptable
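
One possible explanation for the error, which the traceback alone does not confirm: if the validation loader drops incomplete batches and batch_size exceeds the 160 validation samples, the loader yields no batches at all, so the evaluation buffer would never be filled and could still be None when dump_metrics reads it. The sketch below only illustrates that arithmetic. The helper num_batches, the drop_last behaviour, and the batch sizes tried are assumptions for illustration, not code from spirl.

```python
import math


def num_batches(dataset_size: int, batch_size: int, drop_last: bool) -> int:
    """Number of batches one pass over the dataset yields (hypothetical helper)."""
    if drop_last:
        # Incomplete final batch is discarded.
        return dataset_size // batch_size
    # Incomplete final batch is kept.
    return math.ceil(dataset_size / batch_size)


val_size = 160  # from --val_data_size=160 in the command above

# Hypothetical batch sizes; drop_last=True is an assumption about the loader.
for batch_size in (16, 128, 160, 256):
    n = num_batches(val_size, batch_size, drop_last=True)
    print(f"batch_size={batch_size}: {n} validation batch(es)")
```

If this is what happens, keeping batch_size at or below the validation set size, or raising --val_data_size together with batch_size, would leave at least one validation batch.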

Thank you very much!

BrightMoonStar, Mar 12 '24 08:03