ToothGroupNetwork: error during training

Hello:

I'm using the main branch and followed this procedure:

1. Use the challenger dataset for preprocessing.
2. Use the processed data for training.
3. Train with the command: python start_train.py --model_name tgnet_fps --config_path train_configs/tgnet_fps.py --experiment_name tgnet --input_data_dir_path processDir --train_data_split_txt_path split/base_name_train_fold.txt --val_data_split_txt_path split/base_name_val_fold.txt

However, the following error occurs.

When I comment out the 'raise' statement, the following error occurs instead.

My running environment: RTX 3090 graphics card, Python 3.7.16, torch 1.13.1+cu117, CUDA 11.7, cuDNN 8500, Ubuntu 20.04.
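For anyone comparing setups in this thread, the versions listed above can be collected with a short script. This is a minimal sketch: the helper name collect_env is made up for illustration, and it only uses standard PyTorch attributes, falling back gracefully when PyTorch is not installed.

```python
import platform
import sys


def collect_env():
    """Return Python, OS, and (if installed) PyTorch/CUDA/cuDNN versions."""
    info = {
        "python": sys.version.split()[0],
        "os": platform.platform(),
    }
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this interpreter
        return info

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda         # CUDA version torch was built against
    info["cudnn"] = torch.backends.cudnn.version()  # e.g. 8500 means cuDNN 8.5.0
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Pasting this output into a report makes it easier to see whether two machines really share the same configuration.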

I don't know where the problem is. Can you help me?

Jul 10 '23 02:07 caogj-0521

Maybe this issue is related to https://github.com/limhoyeon/ToothGroupNetwork/issues/4. In some PC environments the CUDA functions do not execute properly. I plan to remove the CUDA functions and replace them with Python code. Please bear with me a little longer. Alternatively, you can try reinstalling the 'pointops' library from https://github.com/POSTECH-CVLab/point-transformer/tree/master/lib/pointops.

Jul 24 '23 02:07 limhoyeon

Hello, have you solved this problem? I ran into the same problem. My running environment: RTX 4090 graphics card, Python 3.6.13, PyTorch 1.10, CUDA 11.7, cuDNN, Ubuntu 20.04. Hoping for your reply. :)

Feb 02 '24 11:02 supgy

Did you try replacing pointops with the version from https://github.com/POSTECH-CVLab/point-transformer/tree/master/lib/pointops?

Apr 12 '24 05:04 limhoyeon

Hello, have you solved this problem? I ran into the same problem. My running environment: RTX 4060 graphics card, Python 3.8, PyTorch 2.1, CUDA 11.8, cuDNN, Windows.

Nov 23 '24 08:11 FengZhongLiuDong

In my case, the issue came from this line. I had not explicitly labeled the jaws as either upper or lower. As a result, if I mistakenly give an upper jaw the "lower" extension and pass it to this function, training fails with an error.
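A check like the following can catch such a mismatch before training. It is only a sketch under stated assumptions: that file names carry an "upper" or "lower" token separated by underscores (for example the hypothetical scan01_upper.obj), that per-vertex labels are FDI tooth numbers, and that 0 marks gingiva. The helper names are made up for illustration and are not part of the repository.

```python
from pathlib import Path

# FDI tooth numbering: quadrants 1-2 are the upper jaw, 3-4 the lower jaw.
UPPER_FDI = set(range(11, 19)) | set(range(21, 29))
LOWER_FDI = set(range(31, 39)) | set(range(41, 49))


def jaw_from_filename(path):
    """Guess the jaw from an 'upper'/'lower' token in the file name (assumed convention)."""
    tokens = Path(path).stem.lower().split("_")
    has_upper = "upper" in tokens
    has_lower = "lower" in tokens
    if has_upper == has_lower:  # neither token, or both: ambiguous
        raise ValueError(f"cannot tell the jaw from file name: {path}")
    return "upper" if has_upper else "lower"


def mislabeled_teeth(path, labels):
    """Return the tooth labels that do not belong to the jaw named in `path`.

    `labels` is an iterable of per-vertex FDI labels; 0 is treated as gingiva.
    """
    expected = UPPER_FDI if jaw_from_filename(path) == "upper" else LOWER_FDI
    found = {int(label) for label in labels if int(label) != 0}
    return sorted(found - expected)
```

An empty list means the labels agree with the jaw in the file name. A non-empty list, such as upper-jaw numbers in a file named "lower", points to the kind of naming mistake described above.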

Nov 25 '24 07:11 ptpam

Thank you very much for your reply. I must say that this is an excellent piece of work, but the issue still persists during training. First, I can rule out any environment-related problems, as I rented a server on a cloud computing platform with the same configuration as yours, but the same issue still occurs. Then, I tried recompiling PointOps (using the official code), but unfortunately, that didn't solve the problem either. I believe the best way to solve this issue is to reproduce the error. If you have some free time, I would appreciate it if you could run the code from the GitHub repository again. I am also in the process of locating and fixing the error step by step and would appreciate your assistance.

Nov 25 '24 12:11 FengZhongLiuDong

I made a very simple mistake. When the author reminded me to use the official PointOps library, I got lazy: I kept the data that had been processed with the previous version of PointOps and used it directly for training. When I debugged and pinpointed the label issue, I suspected that the data processing itself might be at fault. Reviewing the problematic module of the project again, I found issue #7 and confirmed my suspicion (https://github.com/limhoyeon/ToothGroupNetwork/issues/7). The data processed with the current version of the PointOps library contains only one label, which caused the neighborhood index extraction to fail during training. This is an outstanding piece of work, and I will study it carefully and thoroughly. Once again, I sincerely appreciate the author's help and am deeply grateful.
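For others who hit the same thing, a quick look at the label distribution of a processed sample would have exposed this before training. The sketch below is generic and assumes nothing about the repository's file layout. The function names are hypothetical.

```python
from collections import Counter


def label_counts(labels):
    """Count how many points carry each label value."""
    return Counter(int(label) for label in labels)


def looks_degenerate(labels, min_distinct=2):
    """True if fewer than `min_distinct` distinct labels are present."""
    return len(label_counts(labels)) < min_distinct
```

If the processed samples are NumPy arrays with the label in the last column (an assumption worth verifying against the preprocessing script), the check would be looks_degenerate(np.load(path)[:, -1]). A sample reporting a single label value should be reprocessed after PointOps has been reinstalled.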

Nov 25 '24 16:11 FengZhongLiuDong