
Trained yolt stopped working?!

Open admarrs opened this issue 5 years ago • 5 comments

I'm scratching my head with this one. Back in March I had successfully trained Yolt using the COWC data and got some good test results on a separate data set.

Coming back a month later, I've tried to re-run the same config and can't get the same results! Probabilities are very low (<0.01). The only thing that changed was swapping out the graphics card to upgrade to a Titan. Could this make a difference?

I was wondering if this is in any way related to #26.

admarrs avatar Apr 08 '19 10:04 admarrs

Some additional info: running on the COWC test data I get the same low-probability result, but if I set the threshold to 0.01 I see the following. All the "detections" seem to be in rows at the bottom of each 544-pixel slice.

image

admarrs avatar Apr 08 '19 10:04 admarrs

I experienced a similar problem. I checked lines 9-14 in /simrdwn/yolt/Makefile. My GPU did not match any of the architectures listed there, so I added one. I also changed the versions of CUDA and TensorFlow in /simrdwn/docker/Dockerfile and reinstalled SIMRDWN. You can find more information at "https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/".
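To find which architecture a card corresponds to, one option is to read the "compute capability" that TensorFlow 1.x reports in its device description and turn it into an SM number. The sketch below only parses such a string; `sm_from_description` is a hypothetical helper, and the sample description (including the card name) is an assumed illustration, not output from this thread.

```python
import re
from typing import Optional


def sm_from_description(desc: str) -> Optional[int]:
    """Convert 'compute capability: X.Y' in a device description to the SM number XY."""
    match = re.search(r"compute capability:\s*(\d+)\.(\d+)", desc)
    if match is None:
        return None  # no compute capability found in the text
    major, minor = int(match.group(1)), int(match.group(2))
    return major * 10 + minor


# Assumed example of a device description; your own GPU will report different values.
sample = "device: 0, name: TITAN V, pci bus id: 0000:01:00.0, compute capability: 7.0"
print(sm_from_description(sample))
```

The resulting number is what you would look for in the `-gencode` lines of the Makefile.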

h10public avatar Apr 12 '19 08:04 h10public

@ghghgh777 I ran into a similar problem and your answer is very helpful! Can you share more details on how you modified the /simrdwn/docker/Dockerfile? Thanks!

wendyzzzw avatar May 30 '19 06:05 wendyzzzw

@wendyzzzw Sorry for the late reply. I have only checked my answer against commit b275a35, so it may not work for the current commit.

Check "https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/". If your GPU corresponds to SM62, SM70, or SM75, you need to add a line like "-gencode arch=compute_62,code=[sm_62,compute_62]" after lines 9-13 in /simrdwn/yolt2/Makefile and /simrdwn/yolt3/Makefile.
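As a rough illustration of that Makefile change, the following sketch builds the `-gencode` line in the form quoted above and checks whether a Makefile's text already lists the architecture. The helper names and the sample ARCH block are hypothetical; the actual contents of the yolt2 and yolt3 Makefiles may differ.

```python
import re


def gencode_flag(sm: int) -> str:
    """Build an nvcc -gencode flag in the form quoted in the thread (e.g. sm=62)."""
    return f"-gencode arch=compute_{sm},code=[sm_{sm},compute_{sm}]"


def makefile_has_arch(makefile_text: str, sm: int) -> bool:
    """Return True if the Makefile text already has a -gencode entry for this architecture."""
    pattern = rf"-gencode\s+arch=compute_{sm}\b"
    return re.search(pattern, makefile_text) is not None


# Hypothetical ARCH block, used only to exercise the helpers.
sample_makefile = """ARCH= -gencode arch=compute_30,code=sm_30 \\
      -gencode arch=compute_52,code=[sm_52,compute_52]
"""

for sm in (52, 62):
    if makefile_has_arch(sample_makefile, sm):
        print(f"sm_{sm}: already present")
    else:
        print(f"sm_{sm}: add {gencode_flag(sm)}")
```

To check a real file, read its text (for example with `open(path).read()`) and pass that in place of `sample_makefile`.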

For /simrdwn/docker/Dockerfile, the code has since been updated and now uses CUDA 9.0. I think CUDA 10 would be required if your GPU corresponds to SM75. If you need to use CUDA 10, you can change line 2 and lines 25-26. The current version of SIMRDWN uses tensorflow-gpu 1.13.1, so I think it would be OK.

After that, I reinstalled SIMRDWN, starting from "0-3. Build docker file".
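The CUDA version question can be sketched as a simple lookup. The table below lists the first CUDA toolkit release that can target each of the three architectures mentioned above (CUDA 8.0 for SM62, 9.0 for SM70, 10.0 for SM75); `MIN_CUDA` and `toolkit_supports` are hypothetical names introduced for illustration.

```python
from typing import Tuple

# First CUDA toolkit release able to compile for each architecture.
MIN_CUDA = {
    62: (8, 0),   # Pascal
    70: (9, 0),   # Volta
    75: (10, 0),  # Turing
}


def toolkit_supports(sm: int, cuda_version: Tuple[int, int]) -> bool:
    """Return True if the given CUDA toolkit version can target the architecture."""
    return cuda_version >= MIN_CUDA[sm]


# CUDA 9.0, the version the Dockerfile is said to use, against each architecture.
for sm in sorted(MIN_CUDA):
    status = "ok" if toolkit_supports(sm, (9, 0)) else "needs a newer CUDA"
    print(f"sm_{sm} with CUDA 9.0: {status}")
```

This only covers whether the compiler can target the card; the TensorFlow build in the image must also match the CUDA version chosen.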

h10public avatar Jun 06 '19 04:06 h10public

As noted by @ghghgh777, this seems to be a GPU architecture issue, and it has been observed in YOLO as well: https://github.com/pjreddie/darknet/issues/486. I'm still digging into it, but there may be a compatibility problem with weights trained on older versions of CUDA. As painful as it seems, retraining the model with the new hardware/drivers worked for me to get around this issue.

avanetten avatar Jun 14 '19 00:06 avanetten