kagecom

Results: 10 comments by kagecom

I also encountered this problem. How can I fix it?

> But there is no CUDA 10.1 build of torch 1.4.0, i.e. torch=1.4.0+cu101

> I am still stuck in the training process after merging the last two commits, [e30f768](https://github.com/tianweiy/CenterPoint/commit/e30f768a36427029b1fa055563583aafd9b58db2) and [a32fb02](https://github.com/tianweiy/CenterPoint/commit/a32fb02723011c84e500e16991b7ede43c8b5097). My environment is torch 1.7.0+cu101, V100-SXM2 16G.

> Oh, interesting, do you get a timeout error? I also noticed a fairly long delay between epochs, but it does proceed after some time. > > Could you...

> When load_interval = 5, it gets stuck; when load_interval = 1000, it works. This confuses me.

I have tried several combinations, but training still gets stuck. The only thing that works for me is setting the environment variable NCCL_BLOCKING_WAIT=1 when starting the training process. However, it slows the...
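For anyone who wants to try the same workaround, here is a minimal sketch of launching a process with `NCCL_BLOCKING_WAIT=1` in its environment. The helper name `run_with_blocking_wait` and the stand-in command are hypothetical and not part of CenterPoint; the sketch assumes the variable only has to be present in the environment of the training processes before they initialize NCCL.

```python
import os
import subprocess
import sys


def run_with_blocking_wait(cmd):
    """Run cmd in a child process with NCCL_BLOCKING_WAIT=1 in its environment."""
    env = dict(os.environ)           # copy, so the parent environment is untouched
    env["NCCL_BLOCKING_WAIT"] = "1"  # the workaround described in the comment above
    # Output is captured here only so the demo can inspect it.
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Stand-in command (hypothetical): it only prints the variable the child sees.
# Replace it with the actual training launch command.
demo_cmd = [sys.executable, "-c", "import os; print(os.environ['NCCL_BLOCKING_WAIT'])"]
result = run_with_blocking_wait(demo_cmd)
print(result.stdout.strip())
```

For a real training run you would probably drop `capture_output=True` so the logs stream to the terminal. Setting the variable inline in the shell before the launch command should have the same effect.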

> > > > I tried changing the load_interval from 1 to 100 just now, and it no longer seems to get stuck. I have tried several ways, including changing the load_interval...

I have encountered the same problem. Please let me know if you can see the results.

+1. Also, is there a gt_val.bin that lets us compute the segmentation metrics on the val set locally, as with Waymo Open Dataset v1.2?

![image](https://user-images.githubusercontent.com/13905735/158549608-76469593-7dfc-4efb-83a2-7bff34f3ceb8.png) Not every frame has segmentation labels.