Johan Edstedt
This seems to happen during the benchmark. You should be able to see the names of the images being sent in. If so, I can check whether I'm able to...
If it's the MegaDepth training set, we don't use seeds, so it might be different image pairs, etc. You might reduce the risk of this happening by increasing the diagonal...
This works for me on CUDA 11.7 with PyTorch 2.0.1 as well.
It's fine for me that it's a bit inefficient, but the cluster I'm running on automatically kills jobs that go below 25% power, which is quite frustrating, so I'd like...
I got things to run faster by upping THREADS_PER_BLOCK from 32 to 96 (64 also works fine) on my fork. This gave about a 2x speedup for me. However, it's still...
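To make concrete how the constant enters the launch configuration, here's a minimal sketch of a per-row sweep. The constant's name mirrors patch_match_cuda.cu, but the kernel body and launch math are illustrative stand-ins, not COLMAP's actual code:

```cuda
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 32  // raising this to 64 or 96 gave ~2x for me

__global__ void SweepRows(float* cost, int width, int height) {
  // One thread handles one image row, sweeping left to right.
  const int row = blockIdx.x * blockDim.x + threadIdx.x;
  if (row >= height) return;
  for (int col = 1; col < width; ++col) {
    // Stand-in for the sequential per-pixel propagation step.
    cost[row * width + col] += cost[row * width + col - 1] * 0.5f;
  }
}

int main() {
  const int width = 2000, height = 1500;
  float* cost;
  cudaMalloc((void**)&cost, sizeof(float) * width * height);
  cudaMemset(cost, 0, sizeof(float) * width * height);

  // With 32 threads per block this launches ceil(1500/32) = 47 blocks;
  // with 96 it launches 16 larger blocks, which changes how warps get
  // scheduled and may account for the speedup.
  const int blocks = (height + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
  SweepRows<<<blocks, THREADS_PER_BLOCK>>>(cost, width, height);
  cudaDeviceSynchronize();
  cudaFree(cost);
  return 0;
}
```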
@ahojnnes would this be similar to what I said regarding multiple processes in parallel on the same GPU? Because I had almost no success in speeding things up with that approach...
Since SweepFromTopToBottom takes up basically the entire computation time, it's difficult to tell, haha. Perhaps I can split up the kernel for debugging purposes? Actually: I used the...
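One way to confirm where the time goes without restructuring anything is to bracket each launch with CUDA events. A minimal sketch (DummySweep here is just a placeholder for SweepFromTopToBottom):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void DummySweep(float* data, int n) {
  // Placeholder for a sweep kernel such as SweepFromTopToBottom.
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= 2.0f;
}

int main() {
  const int n = 1 << 20;
  float* data;
  cudaMalloc((void**)&data, sizeof(float) * n);

  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  // Events are recorded on the GPU timeline, so the elapsed time
  // isolates this kernel from host-side launch overhead.
  cudaEventRecord(start);
  DummySweep<<<(n + 255) / 256, 256>>>(data, n);
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);

  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, stop);
  printf("kernel time: %.3f ms\n", ms);

  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  cudaFree(data);
  return 0;
}
```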
> The algorithm imposes that each row or column of an image uses one CUDA thread. Depending on the size of your images, there will be many more cores than...
@ahojnnes I'm not sure I understand why THREADS_PER_BLOCK (https://github.com/colmap/colmap/blob/main/src/colmap/mvs/patch_match_cuda.cu#L45) has to be exactly 32. It seems to be used in a lot of different places, but there does not seem to...
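For context, the usual reason a block size gets pinned to exactly 32 is warp-synchronous code: full-warp shuffles or ballots silently assume blockDim.x == warpSize. A hypothetical illustration of that kind of hidden dependency (not COLMAP's actual kernel):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 32  // load-bearing if the kernel does warp-wide ops

__global__ void WarpMin(const float* costs, float* result) {
  // Warp-wide minimum via shuffles. The 0xffffffff mask and offsets
  // 16,8,4,2,1 assume all 32 lanes of a single full warp participate.
  // With blockDim.x == 96 the block spans three warps, lanes 32..95
  // would reduce separately, and lane 0 would report only warp 0's min.
  float v = costs[threadIdx.x];
  for (int offset = 16; offset > 0; offset /= 2)
    v = fminf(v, __shfl_down_sync(0xffffffff, v, offset));
  if (threadIdx.x == 0) *result = v;
}

int main() {
  float h_costs[THREADS_PER_BLOCK];
  for (int i = 0; i < THREADS_PER_BLOCK; ++i) h_costs[i] = 100.0f - i;

  float *d_costs, *d_result;
  cudaMalloc((void**)&d_costs, sizeof(h_costs));
  cudaMalloc((void**)&d_result, sizeof(float));
  cudaMemcpy(d_costs, h_costs, sizeof(h_costs), cudaMemcpyHostToDevice);

  WarpMin<<<1, THREADS_PER_BLOCK>>>(d_costs, d_result);

  float h_result;
  cudaMemcpy(&h_result, d_result, sizeof(float), cudaMemcpyDeviceToHost);
  printf("min = %.1f\n", h_result);  // expect 69.0

  cudaFree(d_costs);
  cudaFree(d_result);
  return 0;
}
```

The fact that 64 and 96 work in practice suggests the kernel has no such warp-level assumption and the constant is mostly a launch-configuration tuning knob, but that's my reading, not confirmed.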
@ahojnnes thanks. I'll report back if I'm able to figure something out.