OpenSfM Add SIFT GPU support

This PR adds SIFT GPU feature detection and matching to improve performance drastically with GPU.

Jul 27 '20 10:07 cojosef96

Hi @cojosef96 !

Thank you for your PR, it's great to try to speed-up the OpenSfM pipeline by introducing GPU image processing. This has been requested quite sometimes by people over the years.

As it is now, the PR would add a bit of code duplication that seems not needed. It would be better if :

The GPU extractor would be just another feature type, as you already did.
Same for the matching, it would just another another matching type (that one is to add). So other feature types would benefits the GPU matching
From the above, then we wouldn't need the duplicated matching code.
There is also some debug stuff remaining (config override, ptyhon2.7 enforce, depth command override).

Let me know if you can rework this, or when we'll find some time, we can also do the changes on our side.

Yann

Jul 31 '20 18:07 YanNoun

While I appreciate GPU support for OpenSfM, I am not sure that this PR is structurally suited to be the way to go. It adds yet another dependency to an already over boarding long list of OpenSfM's dependencies. If anything the SIFT GPU extractor should be a weak dependency (hence plug-in like).

Aug 01 '20 04:08 JakeSmarter

Hey, @YanNoun Thank you for reviewing my PR. I will gladly help to change the PR to be more suitable for the OpenSfM Master Branch. The Matching functions were copied and changed because the GPU matching does not work with parallel computing. There are two ugly ways and one clever way to solve this as I see it.

Change the number of processes to 1 when GPU matching is used.
Initialize the gpu_sift_matching for every process (this takes a long time so I don't recommend it)
A more complicated but clever way is to create N sift_gpu_matching functions (N is the number of processes) and assign sift_gpu_matching function 1 to process 1 and the same with the others. Another Problem with the matching is the structure of the keywords is NumPy record array. In order to use the GPU matching on None sift_GPU descriptors, I need to create a function that transfers pos, features, color to keypoints NumPy record array. As for now, I will change the number of processes to 1 when gpu_matching is on. And I will create a function to change the p, f, c to np.recarray.
After these changes I will try to fix some debug stuff, please explain specifically what debugging stuff is needed. If there are some tasks or changes you guys want to do yourselves feel free to do so. Best Regards, Yossi

Aug 03 '20 13:08 cojosef96

Hey, @GITNE In order to run a script on GPU, we need low-level GPU programming language. Silx uses pyopencl which is a bridge between python to the OpenCL GPU programming language. If GPU usage is needed than we have to add some dependencies that can use GPU to the project. If you have a suggestion for good dependency. Please tell me and I will consider using it.

Aug 03 '20 14:08 cojosef96

In order to run a script on GPU, we need low-level GPU programming language. Silx uses pyopencl which is a bridge between python to the OpenCL GPU programming language.

I know, I am aware of that. And, I am also all for GPU support. So, this is not the point.

The point with your current implementation is that one cannot run OpenSfM unless one also has the Silx dependency, even if one configures OpenSfM to use a pure CPU or pure Python features extractor. Python's import system requires one to satisfy the silx dependency, even or especially if you do not need it.

So what I am saying is that silx should be a weak/late binding/lazy loading/lazy binding—what ever you want to call it, Python calls it programmatic importing—dependency. See Python's importlib module. Structurally, there are basically two ways to do this in Python, on which I am not going in to detail here because good examples are available on the web. One big issue with native code is that sometimes it is not readily available for all platforms. So, users on those platforms are stuck on a version of OpenSfM until these particular native code dependencies get ported to a platform the users run.

Aug 04 '20 02:08 JakeSmarter

https://github.com/mapillary/OpenSfM/pull/623/files#diff-583f2c72f32f343feda6467e39830501 This does not look right either.

Aug 04 '20 03:08 JakeSmarter

Hi guys, not sure about your timings here but as part of a GSoC project, by the end of the summer OpenCV is going to introduce back sift to the main repo after its patent expiration.

Aug 04 '20 06:08 edgarriba

@edgarriba Do you know if OpenCV's implementation is also going to have GPU support?

Aug 04 '20 09:08 JakeSmarter

Hey, I fixed the issues discussed above, please check them and see if there are more issues to fix, feel free to let me know and I will do my best to fix them.

Aug 06 '20 11:08 cojosef96

So far, looks great to me.

66168c35966a76b2d0749856610ee018b90525cb Looks like still some debug remnants. Please also squash/rebase off any debug commits because otherwise they will end up in the git index, especially the csfm.so binary, which is quite big and thus will bloat the index.

Aug 06 '20 18:08 JakeSmarter

Hey :wave: all, noticed this PR. I've also been researching ways to add GPU support to OpenSfM. So far I've only added the feature detection part, but the work I started on this repository https://github.com/uav4geo/pypopsift/ can be used to swap the SIFT feature extractor. https://github.com/pierotofy/OpenSfM/commit/00bb120a29abc27b9a94db220669afccd6b20eb0

It's a WIP and performance is not ideal though.

Aug 07 '20 01:08 pierotofy

@GITNE not for this coming release, but if there's special interest into this feature probably the dev team could take a look at it. /cc @vpisarev

Aug 07 '20 11:08 edgarriba

Hey, I fixed all the issues noted above. I hope that this PR will be helpful to you. Please let me know if there are more things needed to add/fix.

Aug 09 '20 09:08 cojosef96

Hi @cojosef96 ,

I reworked a bit your branch here : https://github.com/mapillary/OpenSfM/compare/master...feat-polish-sift-gpu More specifically, I refactored a few things :

Unified the Root-SIFT normalisation taking best-of-both-worlds (especially the zero-norm case), so we have one function
Removed the GPU-specific storage, so all features type are stored the same way : in the end, everything ends-up in a numpy array. Greatly simplify the code : no more need for the keypoin' arg/return.
Following the above, create_gpu_keypoints_from_features does proper normalisation of the descriptor
I added Lowes ratio, peak/edge threshold to be passed to Silx (and added in config.yaml), so we have consistent results between all matching types. I tested it a bit and matching results were slightly better with GPU_MATCHING due to the fact it is exact NN (and not approximate as for FLANN). Peak/edge threshold is a bit capricious in Slix, # feature can vary hugely, I might add some bucketing.
Removed the testing dataset : we want to keep the repo light.
Adding slix as requirements is still in discussion, we might not enforce it. Following that, we'll rework the documentation.

After benchmarking on a 16-inches Macbook (16 logical cores), feature extraction is 2x faster, but matching is much slower than the CPU-16-cores one.

Let me know what you think of the following changes.

Yann

Aug 10 '20 08:08 YanNoun

Hey @YanNoun , I am glad you used my PR for the OpenSFM project. The changes you made seem good to me.

I saved the keypoints format in order to save time in the loading. but one saving format is cleaner and more logical.
The berlin_gpu folder is for running the demo and not necessary.
The root_Sift normalization looks good to me.
The function create_gpu_keypoints_from_features doesn't look so efficient to me (but I didn't check ) so I used both the keypoints and the regular features form.

About the running on Mac:

The GPU implementation doesn't work efficiently when running using parallel computing. when parallel computing is done, the gpu_sift feature extractor and the gpu_sift_matching run the initializing process every time. Please check the code when using 1 process, I hope it will run faster.
As I wrote in the ReadMe, the matching process without initializing took 0.025 sec and the feature extraction without the initializing took 0.28 sec on Nvidia 1080-Ti.

This parallel problem can be fixed by demanding 1 process when using GPU functions.

I hope you will be able to merge the GPU implementation to the master repository.

Best Regards, Yossi

Aug 10 '20 14:08 cojosef96

Hi @cojosef96!

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

Aug 20 '20 13:08 facebook-github-bot

Hey, I sign the facebook contributor license agreement, there is the ReadMe.md conflict we should fix, my readme contains info about the GPU update which is not necessary for the master repository so you can use the basic readme. Best Regards, Yossi

Sep 01 '20 06:09 cojosef96

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

Sep 01 '20 07:09 facebook-github-bot

feature extraction is 2x faster, but matching is much slower than the CPU-16-cores one.

It's worth pointing out that feature extraction using the proposed GPU method does not account for feature_min_frames (and often times the CPU implementation does need to recompute SIFT multiple times by lowering the peak threshold to achieve the desired number of keypoints).

Feature extraction is nonetheless quite fast using the GPU, if feature_min_frames is not needed.

Feature matching speed can be improved a bit if one removes the code to load the image data (check_gpu_initialization(config, im1, data)), which is unnecessary. After these changes, the GPU implementation is slightly faster on a one-to-one process comparison (using only one core), but obviously doesn't scale, whereas the CPU implementation does.

Feb 11 '21 19:02 pierotofy

Is there going to be a merge in the near future? I think many people are waiting for this to be in an official release.

Aug 25 '21 12:08 theUpsider

@cojosef96 you might consider using kornia implementation (based on pytorch) or even more powerful features like HardNet /cc @ducha-aiki

Aug 25 '21 13:08 edgarriba

Temporarily you can also check out this fork/branch https://github.com/OpenDroneMap/OpenSfM/tree/261 that has GPU support merged (mostly adapted from this PR).

Aug 25 '21 14:08 pierotofy

@pierotofy would https://github.com/OpenDroneMap/OpenSfM/tree/261 make this PR obsolete? Would it make sense to close this PR and open yours?

Sep 14 '21 20:09 dlebauer

No, 261 has several features/changes that are probably outside of the scope/general usage of OpenSfM. You would have to cherry-pick the relevant changes that pertain to GPU support and create a new branch.

Sep 14 '21 20:09 pierotofy