rail_marking
Unable to train the network
Hi, thank you for your awesome project. I downloaded the dataset and tried to train the network myself using the train script, but I encountered this error:
File "/export/tmp/ebrahimi/rail_marking/scripts/segmentation/./../../rail_marking/segmentation/models/ohem_ce_loss.py", line 29, in forward
    loss_hard = loss[loss > self.thresh.to(device)]
RuntimeError: CUDA error: an illegal memory access was encountered
Also, for the dataset, I merged all of the JPEGs, PNGs, and JSONs into a single folder and set it as the --data_path argument of the script. Is that OK?
@AmirAliEbrahimi Can you please clarify how many label classes you are training? For this repo, I already modified the original dataset into a new one with only 3 classes.
If you train with a different number of classes, you need to create a new cfg file in the cfg directory and change the num_classes value to match.
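One common cause of a CUDA "illegal memory access" inside a cross-entropy style loss is a label id that falls outside the range 0 to num_classes - 1; the thread does not confirm that this was the cause here. A minimal sketch of a CPU-side check, assuming the mask pixels are available as plain integers and that 255 is used as an ignore value (both the function name and the ignore value are assumptions, not part of this repo):

```python
from collections import Counter


def find_out_of_range_labels(label_ids, num_classes, ignore_index=255):
    """Count label ids that a num_classes-way loss could not index.

    label_ids: any iterable of integer pixel labels (e.g. a flattened mask).
    ignore_index: hypothetical "ignore" value; adjust to your own setup.
    """
    bad = Counter()
    for value in label_ids:
        if value == ignore_index:
            continue  # ignored pixels are never used as class indices
        if not 0 <= value < num_classes:
            bad[value] += 1
    return bad


# Made-up pixel values standing in for one flattened ground-truth mask.
mask_pixels = [0, 1, 2, 2, 7, 18, 255]
print(find_out_of_range_labels(mask_pixels, num_classes=3))
```

A non-empty result means the masks contain more classes than the cfg file declares, so either num_classes or the masks would need to change before training.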
Here is the logic for the data loader. In my dataset, the images are in jpg format and the ground truths are in png format only, so I differentiate them by file format: https://github.com/xmba15/rail_marking/blob/master/rail_marking/segmentation/data_loader/railsem_mask_dataset.py#L51-L55
If your dataset is organized differently, you need to modify the data loader accordingly.
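The extension-based split described above can be sketched roughly as follows. This is not the repository's code; it assumes a flat folder in which each jpg image and its png ground truth share the same file stem, and `pair_images_and_masks` is a hypothetical helper name:

```python
from pathlib import Path


def pair_images_and_masks(data_path):
    """Pair *.jpg images with *.png masks that share a file stem.

    Returns (pairs, unpaired_stems). Other files, such as *.json
    annotations, are simply ignored by this sketch.
    """
    root = Path(data_path)
    images = {p.stem: p for p in root.glob("*.jpg")}
    masks = {p.stem: p for p in root.glob("*.png")}

    common = sorted(images.keys() & masks.keys())
    # Stems present on only one side point to a missing image or mask.
    unpaired = sorted(images.keys() ^ masks.keys())

    pairs = [(images[stem], masks[stem]) for stem in common]
    return pairs, unpaired
```

Calling it on the folder passed as --data_path and checking that the unpaired list is empty is a quick way to see whether a merged folder matches what an extension-based loader expects.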
Thanks for the reply. I am currently using the original RailSem19 and trying to train on all of its classes, so I will try a new cfg file. For the ground truths, should I use the 8UC1 label map images provided by the dataset, or the images annotated by the JSON files?
@AmirAliEbrahimi Sorry for the late reply. The ground truth should be the 8UC1 label map. Please try it; if you still have problems with the training, I may add scripts to train on the original (unmodified) dataset.
@xmba15 Thank you for your response, I would appreciate it if you could add the scripts to train the original dataset.
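Thanks for the reply. I am currently using the original RailSem19 and trying to train on all of its classes, so I will try a new cfg file. For the ground truths, should I use the 8UC1 label map images provided by the dataset, or the images annotated by the JSON files?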
My dear friend, I am sorry to disturb you, but I am curious whether you have finished all the training. I would be honored if I could learn from your work.
@AmirAliEbrahimi Can you please clarify how many label classes you are training? For this repo, I already modified the original dataset into a new one with only 3 classes.
If you train with a different number of classes, you need to create a new cfg file in the cfg directory and change the num_classes value to match.
Hi, I am curious how you modified the original dataset into the new one with 3 classes. I would be very honored if you replied.
@lmcggg @Zyjhubei
Hi. Unfortunately, I didn't modify the dataset or the code at that time, and I no longer work on this project. Sorry about that.
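On the question of collapsing the original labels into 3 classes: the thread never states which original ids were mapped to which new class, so the following is only a generic sketch of the idea. The mapping values are made up for illustration and are not taken from RailSem19 or from this repo, and the mask is a plain nested list rather than a loaded 8UC1 image:

```python
def remap_label_map(label_map, id_mapping, default=255):
    """Rewrite every pixel id through id_mapping.

    Ids missing from id_mapping become `default` (here a hypothetical
    ignore value; a background class id would work the same way).
    """
    return [[id_mapping.get(value, default) for value in row]
            for row in label_map]


# Hypothetical mapping: original id -> new id in {0, 1, 2}.
HYPOTHETICAL_MAPPING = {3: 0, 5: 1, 9: 2}

original_mask = [
    [3, 5, 9],
    [9, 4, 3],
]
print(remap_label_map(original_mask, HYPOTHETICAL_MAPPING))
# → [[0, 1, 2], [2, 255, 0]]
```

With real masks, the same lookup would be applied to each 8UC1 label map and the result saved as a new png, with num_classes in the cfg file set to the number of new ids.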