Yuwen Xiong
If you don't know how to run Horovod properly, please use the 4gpu config file. You can ask Horovod-related questions here: https://github.com/horovod/horovod
Please make sure you are not using Horovod if you haven’t set it up correctly.
Some environment configurations seem to cause this issue, and the solution remains unknown. To suppress the log, please comment out L283 and L382 in upsnet/operators/src/deform_conv_kernel.cu and recompile it.
In our experience, it won't affect the results as long as the loss decreases normally.
Using a newer nvcc (e.g. 9.1 or 10.0) might solve this issue.
Please try rebuilding all Cython modules manually, and make sure the Python version used for the build matches the one you will run inference with (e.g., use `python3.6 setup.py build_ext --inplace`).
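One way to spot a version mismatch is to compare the compiled extension filenames against the suffixes that the running interpreter accepts. The following is a minimal sketch using only the standard library; the helper name `stale_extensions` is hypothetical and not part of the repository, and it assumes the extensions were built in place under the directory you pass in.

```python
import importlib.machinery
from pathlib import Path


def stale_extensions(root):
    """Return compiled extension files under `root` whose filename tag is
    not importable by the running interpreter (hypothetical helper)."""
    valid = set(importlib.machinery.EXTENSION_SUFFIXES)
    stale = []
    for path in Path(root).rglob("*"):
        # Only consider compiled extension modules (.so on Unix, .pyd on Windows).
        if not path.is_file() or path.suffix not in (".so", ".pyd"):
            continue
        # Everything after the module name, e.g. ".cpython-36m-x86_64-linux-gnu.so".
        _, dot, rest = path.name.partition(".")
        if dot + rest not in valid:
            stale.append(path)
    return sorted(stale)
```

Run it with the same interpreter you use for inference, e.g. `stale_extensions("upsnet")`. Any file it returns was probably built with a different Python version and needs to be rebuilt.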
Try a smaller learning rate. In my understanding, a learning rate that is too large is the most common reason for the L1 loss becoming NaN.
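As a rough illustration of backing off the learning rate, the sketch below checks whether a loss value is finite and shrinks the rate when it is not. The function name, the `factor` of 0.1, and the `min_lr` floor are all assumptions made for this example, not settings from the repository.

```python
import math


def next_learning_rate(loss, lr, factor=0.1, min_lr=1e-8):
    """Return a reduced learning rate if `loss` is NaN or infinite,
    otherwise return `lr` unchanged (illustrative helper only)."""
    if not math.isfinite(loss):
        # Shrink the rate, but never go below the floor.
        return max(lr * factor, min_lr)
    return lr
```

In practice you would restart training from the last good checkpoint with the reduced rate, since the weights are usually already corrupted once the loss is NaN.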
For Cityscapes, we warp the previous frames' features into the frame that has ground truth, obtain multiple mIoU numbers, and average them. For ImageNet VID, we just use the normal evaluation...
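The averaging step for Cityscapes can be sketched as follows. This is only an illustration under the assumption that you have one confusion matrix per warped source frame; the function names are hypothetical and the actual evaluation script may organize this differently.

```python
def mean_iou(confusion):
    """mIoU from a square confusion matrix, where confusion[i][j] counts
    pixels of ground-truth class i predicted as class j."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return sum(ious) / len(ious)


def average_miou(confusions):
    """Average the mIoU over several confusion matrices, one per warped
    source frame (an assumption of this sketch)."""
    scores = [mean_iou(c) for c in confusions]
    return sum(scores) / len(scores)
```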
That means you encountered gradient explosion (BTW, VGG might not work with FPN)
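A common mitigation for gradient explosion, not one prescribed in the comment above, is to clip gradients by their global norm. The sketch below shows the idea on a flat list of floats; the function name is hypothetical. In PyTorch, `torch.nn.utils.clip_grad_norm_` does the same for model parameters.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale `grads` so their L2 norm is at most `max_norm`.
    Returns the (possibly rescaled) gradients and the original norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        # Already within bounds (this also covers the all-zero case).
        return list(grads), total
    scale = max_norm / total
    return [g * scale for g in grads], total
```

Logging the returned norm each iteration is also a cheap way to see an explosion coming before the loss turns into NaN.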
Our code requires a GPU to run. Please install CUDA; I doubt that a VM can do this.