If the dataset has a large average number of corners

Open zssjh opened this issue 2 years ago • 5 comments

Thank you very much for your open-source code. I built a floorplan model on Structured3D with very good results! Now I am running the algorithm on my own, more complex dataset, whose average/maximum number of corners is 300/800 (for Structured3D it is 22/52), so the algorithm uses a huge amount of GPU memory and fails to run. Maybe this is because the initial number of edges is set to the square of the number of corners (O(N^2))? Do you have any good solutions? Looking forward to your reply!

zssjh avatar Jun 22 '22 02:06 zssjh

In the current implementation, the number of edges after edge filtering is set to (3 * N) instead of O(N^2), where N is the number of corners (check here and the corresponding descriptions in the paper's Sec. 4.2). We don't use O(N^2) because 1) most edge candidates are easy negatives and can be eliminated with independent processing (i.e., the edge filtering part), so feeding all edges into the transformer decoder is a waste; 2) keeping all the edge candidates makes the computational cost of the transformer decoders unaffordable.
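The reduction from O(N^2) candidates to 3*N can be sketched as a top-k selection on the filter's confidence scores. This is an illustrative sketch, not HEAT's actual code; the function and argument names are made up for the example:

```python
import numpy as np

def filter_edges(corner_ids, scores, keep_per_corner=3):
    """Keep only the top (keep_per_corner * N) candidate edges by score.

    corner_ids : array of shape (M, 2), candidate corner-index pairs
    scores     : array of shape (M,), confidence from the edge filter

    Mirrors the idea of shrinking O(N^2) candidates to 3*N before the
    edge transformer decoder (names are illustrative, not HEAT's API).
    """
    n_corners = int(corner_ids.max()) + 1
    k = min(keep_per_corner * n_corners, len(scores))
    keep = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return corner_ids[keep], scores[keep]

# toy example: 4 corners -> 6 undirected candidate edges
pairs = np.array([(i, j) for i in range(4) for j in range(i + 1, 4)])
conf = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.05])
kept_pairs, kept_conf = filter_edges(pairs, conf, keep_per_corner=1)
# with 4 corners and keep_per_corner=1, the 4 highest-scoring edges survive
```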

In your case, I guess the GPU memory is used up even before the edge filtering part, since you have so many corners. A potential solution would be: 1) run the edge filtering on all O(N^2) edge candidates in an iterative manner and eliminate all easy negatives; 2) run the edge transformer decoder on the remaining edge candidates. But running a transformer decoder with over 2,000 input nodes is still computationally expensive, so you would still need a lot of GPU resources.
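The iterative filtering in step 1) just means scoring the O(N^2) candidates in fixed-size chunks so only one chunk is resident in memory at a time. A minimal sketch (the `scores_fn` callback stands in for a hypothetical per-chunk forward pass of the filtering network):

```python
import numpy as np

def chunked_filter(scores_fn, num_candidates, chunk_size=1024, thresh=0.5):
    """Score edge candidates in fixed-size chunks so peak memory stays
    bounded, keeping only candidates whose score exceeds `thresh`.

    `scores_fn(idx)` is a stand-in for running the edge-filtering
    network on the candidates indexed by `idx` (hypothetical API).
    """
    kept = []
    for start in range(0, num_candidates, chunk_size):
        idx = np.arange(start, min(start + chunk_size, num_candidates))
        s = scores_fn(idx)  # only one chunk of scores is alive at a time
        kept.append(idx[s > thresh])
    return np.concatenate(kept)

# toy stand-in scorer: candidates whose index ends in 6-9 are "positives"
scores_fn = lambda idx: (idx % 10) / 10.0
kept = chunked_filter(scores_fn, 100, chunk_size=16, thresh=0.5)
```

The same pattern applies on GPU: run each chunk through the filter under `torch.no_grad()`, move the surviving indices to CPU, and free the chunk before scoring the next one.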

Another workaround could be splitting each big scene into multiple subdivisions and running the algorithm on each part separately. As your scenes are so large, the relations between distant areas might be weak, so this division might not hurt the performance significantly.
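The subdivision workaround amounts to binning corners into spatial tiles and running the model on each tile independently. A rough sketch, assuming 2D corner coordinates (a real pipeline would add overlap margins to recover edges cut at tile borders):

```python
import numpy as np

def split_scene(corners, tile_size=256.0):
    """Group corner indices into non-overlapping square tiles keyed by
    grid coordinates; each tile is then processed independently.

    A sketch of the scene-subdivision idea only -- real use would add
    overlapping margins and merge duplicate detections at the seams.
    """
    tiles = {}
    for i, (x, y) in enumerate(corners):
        key = (int(x // tile_size), int(y // tile_size))
        tiles.setdefault(key, []).append(i)
    return tiles

# toy scene: 4 corners spread over a 512x512 area
pts = np.array([[10., 20.], [300., 40.], [260., 270.], [30., 35.]])
tiles = split_scene(pts, tile_size=256.0)
# corners 0 and 3 share tile (0, 0); corners 1 and 2 fall in other tiles
```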

Hope this will help :)

woodfrog avatar Jun 22 '22 04:06 woodfrog

Thank you, it's very helpful. I'll try it!

zssjh avatar Jun 22 '22 06:06 zssjh

Hello @woodfrog, still a problem with this dataset: it contains about 120 images, and I augmented it 10x to get about 1000 inputs, but training still overfits at around epoch 50, so the network learns nothing beyond that point. For such a small dataset, which parts of the network could I remove or simplify with relatively little impact on accuracy? Or do you have other suggestions? Thank you very much!

zssjh avatar Jun 27 '22 11:06 zssjh

Hi @zssjh, according to your previous description, your dataset seems to contain quite large-scale scenes, so I don't think 120 such scenes would lead to very serious overfitting. Could you elaborate on what you observed for "the network learns nothing now"? If you run a test on the training images, are the results perfect? If so, data augmentation should be the right way to go. What is your current augmentation strategy for producing the 10 augmented copies?
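One cheap, geometry-preserving augmentation for floorplans is the 8 dihedral variants (four 90-degree rotations, each with a horizontal flip), applied consistently to the image and the corner coordinates. A sketch for the coordinate side only, assuming a square canvas; this is an illustration, not HEAT's own augmentation pipeline:

```python
import numpy as np

def dihedral_variants(corners, size=256):
    """Return the 8 rotation/flip variants of a set of corner
    coordinates on a (size x size) canvas.

    Illustrative only: the matching image arrays would be transformed
    with np.rot90 / np.fliplr in lockstep with the coordinates.
    """
    outs = []
    pts = np.asarray(corners, dtype=float)
    for k in range(4):
        rot = pts.copy()
        for _ in range(k):  # rotate 90 degrees about the canvas
            rot = np.stack([rot[:, 1], size - 1 - rot[:, 0]], axis=1)
        outs.append(rot)
        # horizontal flip of the rotated copy
        outs.append(np.stack([size - 1 - rot[:, 0], rot[:, 1]], axis=1))
    return outs

variants = dihedral_variants(np.array([[0., 0.], [10., 20.]]))
# 8 variants; the first one is the untouched input
```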

woodfrog avatar Jun 29 '22 21:06 woodfrog

Hi @woodfrog, thank you for your reply! I trained for about 300 epochs. At around epoch 20, the validation loss began to rise and kept rising until the end, including the corner loss, the s1 edge loss, and the image decoder loss. Only the geometry loss did not rise, but it remained unchanged from about epoch 150, so I judged this as overfitting. Following your suggestion, I tested the best checkpoint (from epoch 144) on the test set, and I found that the network seems to have learned some rules: about 40% of the edges and corners are detected correctly, but the overfitting seems to prevent the network from learning further.

zssjh avatar Jul 05 '22 06:07 zssjh