KittiSeg

Memory issue with multiclass training

Open • jinhangw opened this issue 7 years ago • 2 comments

Thank you so much for the guidance on multiclass training in #15 and #29. I was able to start training on my dataset (470 images in total, the same size as KITTI). My computer has an 8 GB GPU and 8 GB of RAM. The process gets killed every time at the evaluation at step 1900/12000, and no error shows up in the log files either. Could anyone advise on this issue, please?

Thanks!

jinhangw • Jun 12 '17 17:06

Are you sure it is a memory issue? Can you share the stack trace, or redirect stderr to a log file? Errors may have been printed to stderr rather than to the output.log files.
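A minimal Python sketch of the stderr suggestion: it runs a command, appends its stderr to a log file, and reports whether the child was terminated by a signal. On POSIX, `subprocess` reports death by signal as a negative return code, and the Linux OOM killer uses SIGKILL, so a `SIGKILL` result with an empty stderr log is consistent with (though not proof of) running out of host memory. The function name `run_and_log` and the log path are hypothetical.

```python
import signal
import subprocess


def run_and_log(cmd, log_path):
    """Run cmd, appending its stderr to log_path.

    Returns (returncode, signal_name). On POSIX a negative return code
    means the child was terminated by that signal; signal_name is then
    e.g. "SIGKILL", otherwise None.
    """
    with open(log_path, "a") as log:
        # stdout still goes to the console; only stderr is captured.
        proc = subprocess.run(cmd, stderr=log)
    signal_name = None
    if proc.returncode < 0:
        try:
            signal_name = signal.Signals(-proc.returncode).name
        except ValueError:
            signal_name = "signal %d" % -proc.returncode
    return proc.returncode, signal_name
```

For example, `run_and_log(["python", "train.py", "--hypes", "hypes/KittiSeg.json"], "train_stderr.log")`, assuming the usual KittiSeg invocation; adjust the command to your setup. If it reports `SIGKILL`, the kernel log (`dmesg`) usually records whether the OOM killer was responsible.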

You also need to write your own evaluation code, because TensorVision currently only handles the binary case. See the FAQ (https://github.com/MarvinTeichmann/KittiSeg/blob/master/docu/FAQ.md) for more details. In particular:

In addition, you will need to write new evaluation code. The current evaluator file computes kitti scores which are only defined on binary segmentation problems.
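Since the KITTI scores only apply to two classes, a common multiclass replacement is per-class intersection over union (IoU) computed from a confusion matrix. The NumPy sketch below is one possible starting point, assuming labels are integer class IDs in `0..num_classes-1` and predictions are already per-pixel class IDs (for example, the argmax of the softmax output). The function names are hypothetical, and wiring them into the evaluator file that TensorVision calls is not shown.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes, ignore_label=None):
    """Count (ground truth, prediction) pairs into a num_classes x num_classes matrix.

    Rows index the ground-truth class, columns the predicted class.
    Pixels whose labels fall outside 0..num_classes-1, or whose ground
    truth equals ignore_label, are skipped.
    """
    gt = np.asarray(gt, dtype=np.int64).ravel()
    pred = np.asarray(pred, dtype=np.int64).ravel()
    valid = (gt >= 0) & (gt < num_classes) & (pred >= 0) & (pred < num_classes)
    if ignore_label is not None:
        valid &= gt != ignore_label
    flat = num_classes * gt[valid] + pred[valid]
    counts = np.bincount(flat, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def per_class_iou(cm):
    """IoU = TP / (TP + FP + FN) per class; NaN for classes absent from both."""
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class but labelled otherwise
    fn = cm.sum(axis=1) - tp  # labelled as the class but predicted otherwise
    denom = tp + fp + fn
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, tp / denom, np.nan)


def pixel_accuracy(cm):
    """Fraction of counted pixels whose predicted class matches the label."""
    cm = np.asarray(cm)
    total = cm.sum()
    return float(np.trace(cm) / total) if total else float("nan")
```

Summing the matrices from `confusion_matrix` over all validation images before calling `per_class_iou` gives dataset-level scores; `np.nanmean` of the result is the usual mean IoU.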

The training script calls the evaluator during training at the steps defined in your hypes JSON file (KittiSeg.json by default). See:

"logging": {
  "display_iter": 20,
  "eval_iter": 100,
  "write_iter": 100,
  "save_iter": 2000,
  "image_iter": 20000
},

With the settings above, for example, the evaluation script is called every 100 training steps.
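To see which steps trigger each action, a small Python sketch can read the `logging` block and list the steps. It assumes an action fires whenever the step is a positive multiple of its interval and takes `max_steps` as a parameter (12000 in the run above); TensorVision's exact rule, such as whether it also evaluates at the final step, may differ.

```python
import json

# The "logging" block from the hypes file, wrapped in braces so it parses on its own.
HYPES_FRAGMENT = """
{
  "logging": {
    "display_iter": 20,
    "eval_iter": 100,
    "write_iter": 100,
    "save_iter": 2000,
    "image_iter": 20000
  }
}
"""


def steps_for(interval, max_steps):
    """Steps in 1..max_steps that are multiples of interval (assumed trigger rule)."""
    return list(range(interval, max_steps + 1, interval))


def schedule(hypes, max_steps):
    """Map each logging key (e.g. "eval_iter") to the steps at which it fires."""
    logging_cfg = hypes["logging"]
    return {name: steps_for(every, max_steps) for name, every in logging_cfg.items()}
```

Under that assumption, `schedule(json.loads(HYPES_FRAGMENT), 12000)` shows that the crash at step 1900 falls on the 19th evaluation call, and that `image_iter` (20000) never fires within 12000 steps.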

To let the training script run without your own evaluation code, comment out the following lines in TensorVision's train.py:

  • L261-L316 https://github.com/TensorVision/TensorVision/blob/master/tensorvision/train.py#L261-L316
  • L328-L330 https://github.com/TensorVision/TensorVision/blob/master/tensorvision/train.py#L328-L330

villanuevab • Jun 12 '17 18:06

@jinhangw I also want to do multiclass training. Could you tell me which places I need to modify? Thank you.

NingNingL • Jan 09 '18 03:01