Why not use neural networks to create depth maps?

Open Dok11 opened this issue 6 years ago • 43 comments

Something like this https://github.com/ialhashim/DenseDepth#results or this https://github.com/gautam678/Pix2Depth

Using this technology may help to avoid some bugs on flat/mirror/shiny areas. And using a neural network will be faster than the current approach. I wanted to see how the DepthMap node works at the moment, but I can't find the code, except for the node description in meshroom/nodes/aliceVision/

Dok11 avatar Jun 29 '19 11:06 Dok11

I think it would be good to collect alternative methods and be able to implement them in new nodes with compatible input and output. For example I starred https://github.com/AIBluefisher/EGSfM but I forgot if it came from here or alicevision. This might also be an alternative method.

Baasje85 avatar Jun 29 '19 12:06 Baasje85

The problem is not finding alternatives: https://github.com/timzhang642/3D-Machine-Learning https://github.com/natowi/3D-Reconstruction-with-Neural-Network

GeoDesc replace SIFT https://groups.google.com/forum/#!topic/alicevision/HQhqtJjGaQ0

The reconstruction system, named i23dMVS, ranks in the top 10 in tanks and temples dataset. Here the link to the project: https://github.com/AIBluefisher/GraphSfM Since GraphSfM is partially based on (an early) version of OpenMVG and licensed under BSD 3-Clause, I think it should be possible to include this approach in Meshroom to accelerate large scale reconstructions https://groups.google.com/forum/#!topic/alicevision/_5Eo6hqLBS8

If you are interested in implementing a Machine Learning approach, you are welcome to contribute to Alicevision.

Here is a similar project

@Dok11 DepthMap is used in src/software/pipeline/main_depthMapEstimation.cpp, src/software/pipeline/main_depthMapFiltering.cpp. The includes are <aliceVision/depthMap/RefineRc.hpp> and <aliceVision/depthMap/SemiGlobalMatchingRc.hpp> From https://github.com/alicevision/AliceVision/issues/439#issuecomment-481957038

natowi avatar Jun 29 '19 17:06 natowi

@natowi could we maybe add this knowledge to https://github.com/alicevision/meshroom/wiki/Good-first-contributions ?

Baasje85 avatar Jun 30 '19 08:06 Baasje85

https://github.com/timzhang642/3D-Machine-Learning

Wow, it's amazing. Nice topic!

If you are interested in implementing a Machine Learning approach, you are welcome to contribute to Alicevision.

It may be an interesting challenge, if I can figure out how to compile the program and the other steps needed to contribute :)

DepthMap is used in src/software/pipeline/main_depthMapEstimation.cpp

I didn't find this path in the repo. Maybe you can provide a link?

Dok11 avatar Jun 30 '19 10:06 Dok11

@Dok11 https://github.com/alicevision/AliceVision/blob/develop/src/software/pipeline/main_depthMapEstimation.cpp

Baasje85 avatar Jun 30 '19 11:06 Baasje85

@Dok11 also https://github.com/alicevision/meshroom/issues/520#issuecomment-504487628

natowi avatar Jun 30 '19 11:06 natowi

This is way over my skill level, but I am interested. I have a photo booth, 360 degrees, with cameras and lighting facing inward. Light flare/reflection is extremely difficult to manage... I'm wondering if this type of approach would make life easier for me.

hargrovecompany avatar Jul 03 '19 06:07 hargrovecompany

I'm wondering if this type of approach would make life easier for me

It will work and solve your problem, but I can't say when this workflow with neural networks will be available in mainstream software like Meshroom. As far as I know, currently only Pix4D uses neural networks in its software, and its results are not the best among the alternatives.

Dok11 avatar Jul 03 '19 09:07 Dok11

@Dok11 I found a script for Metashape (Photoscan) https://github.com/agisoft-llc/metashape-scripts/blob/master/src/model_style_transfer.py and tensorflow... But this is for model style transfer (texture) and not 3d reconstruction

natowi avatar Jul 03 '19 09:07 natowi

I'm still interested in this topic, but without Meshroom. For me it is easier to try this technology directly in Blender: it is the easiest way to write code and see the result without having to build a whole application. So now I'm studying how photogrammetry works across the whole pipeline, from detecting camera positions to meshing. And as far as I can tell, almost all of these features can be implemented (and better) by neural networks instead of classic algorithms.

Maybe I will stay motivated and make something real out of these words :)

@Dok11 I found a script for Metashape (Photoscan) https://github.com/agisoft-llc/metashape-scripts/blob/master/src/model_style_transfer.py and tensorflow... But it looks like it is only being used for texturing.

I see it uses style transfer (one type of neural network). Maybe it makes it possible to avoid glitches on textures? But that is just an assumption.

Dok11 avatar Jul 03 '19 09:07 Dok11

@natowi @Baasje85 I guess I need your advice.

I'm currently developing a network that can potentially predict camera positions. I studied papers from arXiv and, based on them, created a neural network that takes two images and returns 7 numbers, where the first four are the wxyz quaternion for the rotation of the first image's camera relative to the second image's camera. As far as I can see, this works pretty well.

But the last three numbers are currently the xyz position delta between the two images. This also scores well on the accuracy metrics, but I'm sure the result is incorrect and not stable. camera_deltas__pos-xyz

Let me describe it in more detail: to train the neural network I rendered a great many synthetic images from Blender demo scenes and took the dimensions from these scenes. This data is passed to the neural network as training data. image image image

But now imagine what happens if I scale the whole scene up 10 times. All dimensions will increase, BUT the content of the images will not change and stays as it was before.

So here is my question: can you suggest a method to describe camera movement in 3D space without absolute values in scene dimensions?
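Two images alone do not fix the length of the translation between the cameras, only its direction, so one common way to make the target independent of scene scale is to predict a unit direction instead of a raw xyz delta. The following is a minimal sketch under that assumption, not the network discussed in this thread; the helper name `scale_free_pose` and the sample numbers are hypothetical.

```python
import math


def normalize(vec):
    """Return vec scaled to unit length (raises on a zero vector)."""
    norm = math.sqrt(sum(c * c for c in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(c / norm for c in vec)


def scale_free_pose(quat_wxyz, delta_xyz):
    """Hypothetical helper: turn a 7-number relative pose into a scale-free target.

    The rotation stays a unit quaternion. The translation is reduced to a
    unit direction, so its length (the scene scale) is discarded.
    """
    return normalize(quat_wxyz), normalize(delta_xyz)


# The same camera pair in the original scene and in a scene scaled up 10 times.
quat = (0.9238795, 0.0, 0.3826834, 0.0)  # 45 degree turn about the y axis
delta = (0.5, 0.0, 2.0)                  # made-up translation in scene units
delta_scaled = tuple(10.0 * c for c in delta)

pose_a = scale_free_pose(quat, delta)
pose_b = scale_free_pose(quat, delta_scaled)

# Both scenes now produce the same translation target.
same = all(math.isclose(a, b) for a, b in zip(pose_a[1], pose_b[1]))
print(same)
```

The absolute scale then has to come from somewhere else, for example a known distance in the scene or a ground control point.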

Dok11 avatar Sep 06 '19 22:09 Dok11

Thanks @Baasje85 for pointing me here. @Dok11 have you considered using parametric constraints first? For example by angle, percentage, projection, scale. If all your project image files are resolved you could resolve the parametric constraints (maybe even including ground control point registration) towards absolute positions. Then use the current pipeline to do the other tricks with respect to alignment.

skinkie avatar Sep 07 '19 10:09 skinkie

@skinkie thanks for joining in! It seems you are right, but in my opinion any constraint makes the solution less universal. I was looking for a solution based on camera angles, but people here explained to me that it is impossible :) well okay https://math.stackexchange.com/questions/3350091/is-would-possible-to-describe-length-just-in-degrees image

While searching for a solution I got a couple of ideas for making the neural network more robust. Maybe next week I'll share my results and the details of the solution, if I succeed.

One more thing. To get a more accurate distance from picture to picture, I think a pipeline with two neural networks is needed. The first NN takes two images with known FOV and returns the rotation delta in degrees and the distance delta between the cameras in world units. The second neural network takes four images: the target, the one before it, and the first and second images that were given to the first network, together with the delta data from that network. It then returns its own result using all the data known to it. If you have any thoughts about this, I'll be glad to hear them.

Dok11 avatar Sep 10 '19 08:09 Dok11

Bees track their position over time by the angle towards the moving sun, a distance metric based on the rocking of their wings and a local heuristic based on smell. I would completely agree that using a single constraint doesn't work, similar to using a single x or a single y. But this is not the case with parametric constraints: you can apply multiple (even conflicting) constraints based on angle and distance (offset), all being relative towards each other. Your approach using multiple steps is advisable for another reason: even with an unknown FOV you could infer a FOV in relative units, which is a new metric. I do wonder if it wouldn't be better to do so with the existing approaches opposed to neural networks. As I wrote in another ticket even information such as time and order of the photo might give you significant clues regarding the search space.

skinkie avatar Sep 10 '19 09:09 skinkie

Bees track their position over time by the angle towards the moving sun, a distance metric based on the rocking of their wings and a local heuristic based on smell.

And they (bees or other insects) can be fooled by a light bulb. I want to say that this constraint is not universal enough.

even with an unknown FOV

I think that is a rare case. Usually we can extract the FOV from the photo metadata. Or we can create a new neural network for it. That may be pretty simple.
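For reference, getting the FOV from metadata comes down to the pinhole relation between focal length and sensor size. This is a minimal sketch that assumes the focal length has been read from the photo metadata and the sensor width is known (it often has to come from a camera database rather than the photo itself); the function name and the sample values are hypothetical.

```python
import math


def horizontal_fov_deg(focal_length_mm, sensor_width_mm):
    """Horizontal field of view of an ideal pinhole camera, in degrees.

    Uses fov = 2 * atan(sensor_width / (2 * focal_length)) and ignores
    lens distortion.
    """
    if focal_length_mm <= 0.0 or sensor_width_mm <= 0.0:
        raise ValueError("focal length and sensor width must be positive")
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))


# Made-up example: a 24 mm lens on a sensor that is 36 mm wide.
fov = horizontal_fov_deg(focal_length_mm=24.0, sensor_width_mm=36.0)
print(round(fov, 2))
```

If the metadata is missing or the sensor width is unknown, this relation cannot be applied, which is the case where estimating the FOV from the image content would be needed.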

I do wonder if it wouldn't be better to do so with the existing approaches opposed to neural networks. As I wrote in another ticket even information such as time and order of the photo might give you significant clues regarding the search space.

While training, a neural network can potentially learn to extract the right features from images and produce more accurate camera positions than classic algorithms. You can go to the forums or issue trackers to see how often people's 3D reconstructions fail because of noise or motion blur, photos with too large an angle between camera positions, or even images with mirrors or shiny surfaces. Not long ago I did photogrammetry for a large indoor scene. The scene did not have enough light, so the images were pretty noisy, but any human could tell where each photo was taken relative to the others. Not so for Meshroom or other photogrammetry software based on classic algorithms. So for my scene I lost around 500-700 images out of 1300. It's a shame. But I'm sure that a neural network can do better. And I'll try to do it. My first goal is to make a node for Meshroom that can replace this node group: image

It may take many months. And it may be impossible for me =) But currently I have pretty positive results.

Dok11 avatar Sep 16 '19 19:09 Dok11

And they (bees or other insects) can be fooled by a light bulb. I want to say that this constraint is not universal enough.

Heat and polarized light (or in the case of bees: events of green) can confuse the orientation. But this goes for any parameterized situation: create ambiguity and it will try to find another optimum.

I do wonder if it wouldn't be better to do so with the existing approaches opposed to neural networks. As I wrote in another ticket even information such as time and order of the photo might give you significant clues regarding the search space.

While training, a neural network can potentially learn to extract the right features from images and produce more accurate camera positions than classic algorithms.

I agree on this fully. But I actually mean using the classic approach for camera calibration.

Not long ago I did photogrammetry for a large indoor scene. The scene did not have enough light, so the images were pretty noisy, but any human could tell where each photo was taken relative to the others. Not so for Meshroom or other photogrammetry software based on classic algorithms. So for my scene I lost around 500-700 images out of 1300. It's a shame. But I'm sure that a neural network can do better. And I'll try to do it. My first goal is to make a node for Meshroom that can replace this node group: image

You want to replace the entire node group with one black box? Why not start with providing an alternative FeatureExtraction - FeatureMatching approach? Returning camera orientation is something that is by itself already valuable. And you might be able to use it as an ensemble.

It may take many months. And it may be impossible for me =) But currently I have pretty positive results.

Good luck. I'll obviously support it. For me, I want to go in the completely opposite direction of supervised learning. Allow the user to assist meshroom in getting better results.

skinkie avatar Sep 16 '19 20:09 skinkie

You want to replace the entire node group with one black box?

No. I plan to make several neural networks, including:

  1. NN to extract the camera FOV value (I hope it will not be needed). I have not worked on it at all, but most likely it is not too hard.

  2. NN to find the nearest images in space or estimate the capture sequence. I have some interesting ideas and found good papers on arxiv.org for this purpose. Maybe I will try it soon. The starter dataset for this purpose is almost ready.

  3. NN to estimate camera positions (in fact it will be two neural networks). This is the most effective NN I have, and the one I have spent the most time on.

  4. NN to generate depth maps or a point cloud from any camera. The dataset is almost ready, but the NN is only at demo stage. It needs heavy training and a powerful GPU.

Why not start with providing an alternative FeatureExtraction - FeatureMatching approach? Returning camera orientation is something that is by itself already valuable. And you might be able to use it as an ensemble.

I thought camera position estimation does not overlap with the FeatureExtraction tasks. I thought these nodes generate a low-res point cloud for the next nodes and that these nodes are tightly coupled. But if my NN for estimating camera positions in the scene can replace these nodes, it will be perfect! Because currently in Meshroom they take a lot of time, but a NN can do it very fast on a GPU (maybe around several seconds per 100 images).

Dok11 avatar Sep 16 '19 20:09 Dok11

  1. NN to find the nearest images in space or estimate the capture sequence. I have some interesting ideas and found good papers on arxiv.org for this purpose. Maybe I will try it soon. The starter dataset for this purpose is almost ready.

@Baasje85 and I have a huge dataset available, created with multiple cameras, some even geotagged. If you are interested in using it, we can obviously provide it.

Why not start with providing an alternative FeatureExtraction - FeatureMatching approach? Returning camera orientation is something that is by itself already valuable. And you might be able to use it as an ensemble.

I thought camera position estimation does not overlap with the FeatureExtraction tasks. I thought these nodes generate a low-res point cloud for the next nodes and that these nodes are tightly coupled.

This may be true for meshroom (and its depthmap), but not for other approaches like openMVG/openMVS in which a high density point cloud seems to be the thing used.

But if my NN for estimating camera positions in the scene can replace these nodes, it will be perfect! Because currently in Meshroom they take a lot of time, but a NN can do it very fast on a GPU (maybe around several seconds per 100 images).

Would obviously be cool :-)

skinkie avatar Sep 16 '19 21:09 skinkie

@Baasje85 and I have a huge dataset available, created with multiple cameras, some even geotagged. If you are interested in using it, we can obviously provide it.

Sounds good! Can you show a few examples of it? That would be helpful. But how is this dataset labeled? For example, two photos with the same coordinates may not overlap because they were taken in different directions, top/bottom or left/right. And one more point: training the neural network for this task can be done without a "teacher" (unsupervised). The NN learns to extract key features from images and compares how many of them match. As far as I know, Face ID and other image comparison approaches use the same technique.
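The comparison step of such an approach is often done on feature vectors (embeddings) rather than on the images themselves. As a minimal sketch, assuming a network has already mapped each image to a fixed-length vector, cosine similarity gives a score for how alike two images are. The embedding values below are made up for illustration and do not come from any real model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: a and b look at the same area, c looks elsewhere.
emb_a = (0.9, 0.1, 0.3)
emb_b = (0.8, 0.2, 0.4)
emb_c = (-0.1, 0.9, -0.4)

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
print(sim_ab > sim_ac)
```

How the embeddings are learned (with or without labels) is a separate question that this sketch does not cover.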

image

image

This may be true for meshroom (and its depthmap), but not for other approaches like openMVG/openMVS in which a high density point cloud seems to be the thing used.

I had not come across this. Maybe doing it for Meshroom will be a more manageable task. In fact I don't understand very well how the photogrammetry pipeline works in every detail :)

Dok11 avatar Sep 16 '19 22:09 Dok11

@natowi @Baasje85 maybe you can tell me: would a neural network that can repair camera positions in the scene be useful for Meshroom? Or does SfM also retrieve a sparse point cloud while it works?

I'm thinking about the minimal valuable node to start integrating neural networks into Meshroom.

Dok11 avatar Oct 07 '19 21:10 Dok11

@Dok11 by repair, do you mean something analogous to LocalBundle adjustment?

skinkie avatar Oct 07 '19 22:10 skinkie

I hadn't seen this term before, but it seems you are right. As I wrote before, my current NN can determine camera positions in the scene from images (translation and rotation). I don't know how much better (or worse) the NN works compared with classic SfM in its current state. So I want to test it on real tasks as soon as possible, while I still have motivation for this task :D

In short, I want to know the minimal set of NN capabilities that would be helpful to Meshroom.

Dok11 avatar Oct 07 '19 22:10 Dok11

If you can return the intrinsics for the cameras I am sure that would significantly reduce the computing effort to find them.

skinkie avatar Oct 07 '19 22:10 skinkie

If you can return the intrinsics for the cameras I am sure that would significantly reduce the computing effort to find them.

Well, I have just started by preparing the dataset =) image

Coming soon! No, it's not as fast as I want :(

Dok11 avatar Nov 18 '19 11:11 Dok11

@skinkie @natowi Hello =)

I am at the final step of delivering an MVP of a NN that can predict the overlap of two photos. We can give the NN two images and get a number saying how much of the surface in the second image is shared with the first. image

As I understand it, this feature can save time in the task of finding common features between images. Is this true?
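To illustrate where the saving would come from: without any prior, feature matching has to consider every pair of images, which grows as n(n-1)/2. A predicted overlap score can be used to skip pairs that are unlikely to share anything. This is a minimal sketch with made-up scores; `overlap` stands in for the network, the threshold is arbitrary, and the score is treated as symmetric for simplicity even though the number described above is directional.

```python
from itertools import combinations


def select_pairs(image_names, overlap, threshold=0.2):
    """Keep only the image pairs whose predicted overlap reaches the threshold.

    `overlap` is a hypothetical callable (name_a, name_b) -> float in [0, 1]
    standing in for the neural network.
    """
    return [
        (a, b)
        for a, b in combinations(image_names, 2)
        if overlap(a, b) >= threshold
    ]


# Made-up overlap scores for three photos.
scores = {
    ("a.jpg", "b.jpg"): 0.7,
    ("a.jpg", "c.jpg"): 0.05,
    ("b.jpg", "c.jpg"): 0.4,
}


def lookup(a, b):
    """Return the stored score for a pair, or 0.0 if the pair is unknown."""
    return scores.get((a, b), 0.0)


images = ["a.jpg", "b.jpg", "c.jpg"]
pairs = select_pairs(images, lookup)
print(len(pairs), "of", len(images) * (len(images) - 1) // 2, "pairs kept")
```

Only the kept pairs would then be passed on to feature matching. Whether this beats the pair selection Meshroom already does before FeatureMatching would have to be measured.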

Dok11 avatar Dec 26 '19 08:12 Dok11

@Dok11 https://arxiv.org/abs/1506.06825 Seen that one?

skinkie avatar Jan 19 '20 12:01 skinkie

@skinkie not this one, but every year several works on this theme appear on arXiv. For example, a newer article: https://arxiv.org/abs/2001.05036 (January 2020). I have seen it, but in general it is about either creating only depth maps or generating new points of view. It may be useful in the future. Currently I'm focused on estimating camera positions, as the first steps of the photogrammetry process. And estimating depth maps depends heavily on hardware performance, so I think I need to improve my skills in optimizing networks on simpler tasks first.

And I was mistaken when I thought that including a neural network in the process would be easy. Not rocket science of course, but still.

The neural network I wrote about above is not ready, but it will be useful for the network that will do camera pose estimation. Release day postponed xD Currently I'm writing an article about this, and I made a video visualizing the dataset generator, if you are interested =) https://www.youtube.com/watch?v=6bec2NmpFOc

Dok11 avatar Jan 19 '20 13:01 Dok11

"DeepV2D: Video to Depth with Differentiable Structure from Motion" https://arxiv.org/abs/1812.04605 https://github.com/princeton-vl/DeepV2D

natowi avatar Jan 26 '20 15:01 natowi

Hello everyone, as suggested at the top of this issue, I have started to implement DenseDepth as a Meshroom node (codebase). The idea is to change the DenseDepth block, and I have the code to obtain a dense map from a neural network, but I'm stuck because I must write a .exr file. I found this, but how should I write this file?

(I'm new to this project, so I made a plugin class that calls the Python code via a command. I think it could be included in the standard Meshroom classes, but I do not know enough. The dependencies are torch and numpy for now. If someone has advice on this point too, please help.)

Thank you in advance!

nicolalandro avatar Jul 03 '21 08:07 nicolalandro

@nicolalandro that's great!

I found a few OpenEXR examples that may be useful for you: https://github.com/mlagunas/pytorch-nptransforms/blob/master/exr_data.py https://github.com/tensorflow/graphics/blob/master/tensorflow_graphics/io/exr.py PIL2exr https://stackoverflow.com/questions/65605761/write-pil-image-to-exr-using-openexr-in-python

There is aliceVision_utils_imageProcessing (called by https://github.com/alicevision/meshroom/blob/develop/meshroom/nodes/aliceVision/ImageProcessing.py) which can convert images to exr, but I don't know how useful this is in your case. Meshroom does use OIIO for image processing and even has its own OIIO plugin for Qt https://github.com/alicevision/QtOIIO/.

natowi avatar Jul 03 '21 13:07 natowi