
Multiprocessing the local tracking?

Open samuelstjean opened this issue 9 years ago • 21 comments

That would give it a much needed speed boost. If I recall correctly, Gab told me his version multiprocesses along seeds, so adapting the _generate_streamlines function in https://github.com/nipy/dipy/blob/master/dipy/tracking/local/localtracking.py to work in parallel could be a way to do that.
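[Editor's note: a minimal sketch of the "multiprocess along seeds" idea, not dipy's actual API. `track_chunk` and `parallel_track` are hypothetical names; the real work per seed would be the streamline generator, replaced here by a dummy computation.]

```python
from multiprocessing import Pool

import numpy as np

def track_chunk(seeds):
    # Stand-in for running the streamline generator on one chunk of
    # seeds; here each "streamline" is just the seed scaled by 2.
    return [np.asarray(seed, dtype=float) * 2.0 for seed in seeds]

def parallel_track(all_seeds, nb_processes=4):
    # Split the seed array into one chunk per process.
    chunks = np.array_split(np.asarray(all_seeds, dtype=float), nb_processes)
    with Pool(nb_processes) as pool:
        per_chunk = pool.map(track_chunk, chunks)
    # Flatten the per-chunk results back into a single streamline list.
    return [s for chunk in per_chunk for s in chunk]
```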

samuelstjean avatar Jan 14 '16 11:01 samuelstjean

It would be good to go multi, but not right now. We need to stabilize the tracking API first. There is still work to do before we start talking about how to parallelize this part of the code.

Garyfallidis avatar Jan 18 '16 19:01 Garyfallidis

Still good to keep in mind. The day you add PFT, it becomes a requirement, since PFT takes a good 10-12 hours to run even when multiprocessed. Still, one step at a time I guess.

samuelstjean avatar Jan 19 '16 10:01 samuelstjean

+1

fmorency avatar Oct 25 '16 21:10 fmorency

The tracking speed is starting to be a problem for us. I don't have any code or date in mind (as I'm leaving for holiday soon), but I'm starting to work on this task.

nilgoyette avatar Jul 25 '17 14:07 nilgoyette

Let's coordinate efforts. Are you talking about probabilistic tracking?

Garyfallidis avatar Jul 25 '17 14:07 Garyfallidis

I don't know your codebase much right now, but if by "probabilistic tracking" you mean LocalTracking with a ProbabilisticDirectionGetter, then yes. In fact, I was asked to parallelize the local tracking for the 3 DirectionGetters we use.

If you mean something else, unrelated to LocalTracking, then forget what I just wrote :)

nilgoyette avatar Jul 25 '17 15:07 nilgoyette

Yes, this is what I mean. However, parallelization for this problem will not help much. The issue is that for every tracking point we have to project the spherical harmonics to ODFs. And we do that projection millions of times as we track. When I say projection this is just a matrix multiplication but the matrix is quite large. I would start from there before parallelizing the existing code. When you come back from vacation let's chat about this. Thanks again for getting involved with this.
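[Editor's note: an illustrative sketch of the cost being described, with made-up but representative sizes. The variable names and shapes are assumptions, not dipy code; SH order 8 has (8+1)(8+2)/2 = 45 real symmetric coefficients.]

```python
import numpy as np

# A dense sampling sphere (e.g. ~724 vertices) and SH order 8.
n_vertices, n_coeffs = 724, 45
rng = np.random.default_rng(0)
B = rng.standard_normal((n_vertices, n_coeffs))  # SH basis sampled on the sphere
sh_coeff = rng.standard_normal(n_coeffs)         # SH fit at the current voxel

# This matrix-vector product is the projection repeated at every
# tracking step, millions of times over a whole tractogram.
odf = B @ sh_coeff
```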

Garyfallidis avatar Jul 25 '17 15:07 Garyfallidis

To be honest, I don't know or understand the math behind this code. If you need it, I will help you as best as I can if you explain the problem in programmer's terms :) Btw, I'm only leaving in 1.5 weeks.

In the meantime, I'm pretty sure there's a simple optimization I can try. I'll keep you updated.

nilgoyette avatar Jul 25 '17 17:07 nilgoyette

@Garyfallidis Have you thought about this? I tested some simple threading (pool, process, ±cython) and I got some interesting gains, but I'm curious about what you had in mind. I can throw as many cores as I want at this task, but there's an upper limit! A mathematical optimization would indeed be interesting.

nilgoyette avatar Sep 01 '17 19:09 nilgoyette

What I had in mind is to see if we can improve the tracking speed in probabilistic tracking by looking at the number of times the odf is calculated. There is a dot product between sph. harmonics and odfs which is repeated at each step of the tracking. That takes a lot of time. Det. tracking does not have this problem.

You can start describing here your current experience and what you have tested. @skoudoro will send a doodle to also meet in a hangout and chat.

Garyfallidis avatar Sep 03 '17 15:09 Garyfallidis

Disclaimer: I installed numpy with --no-binaries because it's impossible to do any multiprocessing with the standard version. I understand that this can be a problem, but I don't know what I can do about it. JC told me that you are still unsure about what kind of multiprocessing you want in dipy, so it might be interesting to discuss it before I put too much time into this.

In all my tests, I modified LocalTracking to contain a multiprocessing.Queue. The workers simply fill this queue, and LocalTracking returns the results as they become available in next. This is important because we don't want all the streamlines in memory. I had some difficult choices to make concerning --nb_threads because one of the cores is the "main" core, so it mostly does nothing but wait.

  • == 1 Unchanged code; I still use yield.
  • == 2 A bad choice because there's only one core working; the main core is waiting most of the time. It's OK only if the main thread is doing a lot of work with the results.
  • > 2 Probably what you want in most cases.
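[Editor's note: a rough sketch of the queue design described above, under stated assumptions. `track_one`, `worker`, and `streamlines` are hypothetical names, not dipy API; worker processes push finished streamlines into a shared queue and the main process yields them lazily, so the full result set never sits in memory at once.]

```python
from multiprocessing import Process, Queue

def track_one(seed):
    # Stand-in for tracking one seed; returns a fake "streamline".
    return [seed, seed + 0.5]

def worker(seeds, queue):
    for seed in seeds:
        queue.put(track_one(seed))
    queue.put(None)  # sentinel: this worker is done

def streamlines(seeds, nb_workers=2):
    queue = Queue()
    # Interleaved assignment of seeds to workers.
    chunks = [seeds[i::nb_workers] for i in range(nb_workers)]
    procs = [Process(target=worker, args=(c, queue)) for c in chunks]
    for p in procs:
        p.start()
    done = 0
    while done < nb_workers:
        item = queue.get()
        if item is None:
            done += 1
        else:
            yield item  # hand results back as they arrive
    for p in procs:
        p.join()
```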

What I tested:

  1. No change to pyx file. I start N processes.
  2. No change to pyx file. Use thread pool instead.
  3. I start N processes and I cythonized as much as possible of _generate_streamlines.

1 and 3 have the problem that the cores don't finish at the same time. They split the seeds evenly, but some seeds require less time to process. On a 1 h job, this can mean several minutes where fewer than N cores are working. 2 doesn't have this problem, they all finish together, but I think a thread pool is more complex for Python (with a big list on a "fast" process), so it's not actually faster.

Because of this "waiting" thread, the results are less impressive. At --nb_threads 4, I get around a 3x speedup. I tested only deterministic tracking and only with small masks, so I don't currently see much difference between these versions in terms of speed. I still need to create a script to test the 3 versions with many datasets.
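[Editor's note: one common way around the "cores don't finish at the same time" problem mentioned above is dynamic load balancing: instead of splitting seeds statically, workers pull seeds from a shared queue so a core never idles while work remains. This is a generic sketch with hypothetical names (`pull_worker`, `run`), not dipy code.]

```python
from multiprocessing import Process, Queue

def pull_worker(seed_queue, result_queue):
    while True:
        seed = seed_queue.get()
        if seed is None:              # sentinel: no more work
            result_queue.put(None)
            return
        result_queue.put(seed * 10)   # stand-in for tracking one seed

def run(seeds, nb_workers=2):
    seed_q, result_q = Queue(), Queue()
    for s in seeds:
        seed_q.put(s)
    for _ in range(nb_workers):       # one stop sentinel per worker
        seed_q.put(None)
    procs = [Process(target=pull_worker, args=(seed_q, result_q))
             for _ in range(nb_workers)]
    for p in procs:
        p.start()
    results, done = [], 0
    while done < nb_workers:
        item = result_q.get()
        if item is None:
            done += 1
        else:
            results.append(item)
    for p in procs:
        p.join()
    return results
```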

nilgoyette avatar Sep 05 '17 14:09 nilgoyette

Hey, would you mind if I cythonize probabilistic_direction_getter.py? Nothing crazy, just putting it in a pyx file and ensuring everything can be called from Python and Cython.

I see that you started cythonizing it (DirectionGetter, PeaksAndMetricsDirectionGetter) but it's not finished. It's the only "slow" part in my work (and it will affect the PFT tracking PR, https://github.com/gabknight/dipy/pull/3, too) and I think it will help.

nilgoyette avatar Sep 07 '17 17:09 nilgoyette

Sure, give it a go. This file mostly contains interfaces. You may want to cythonize just the functions that take time and not everything. How is Wednesday 11am for you? It would be great if the PFT gets integrated.

Also, if anyone else wants to be at that tracking meeting, send me an e-mail.


Garyfallidis avatar Sep 07 '17 18:09 Garyfallidis

Ok, I'll see what I can do.

I'm working with Gabriel on PFT. He asked me to check his Cython. I'm working almost full time on parallel tracking and PFT, so I should have something soon.

I'll be in the meeting (I answered on the Doodle). Just remember that I'm not a tracking expert; I'm a programmer who happens to be working on this!

nilgoyette avatar Sep 07 '17 18:09 nilgoyette

I'd be interested to attend, but Wednesday is during Miccai so I won't be able to free myself at that time.

jchoude avatar Sep 07 '17 19:09 jchoude

From @nilgoyette's Doodle, it seems that Ariel is also unable to attend next week. I know that @gabknight is also attending Miccai, so if we want him to attend he will be similarly constrained. Would it be possible to move the meeting to the following week?

jchoude avatar Sep 07 '17 19:09 jchoude

👍 that would work better for me!

arokem avatar Sep 07 '17 20:09 arokem

OK for me too. I just updated the Doodle and I will forward you the link @jchoude.

If anyone else wants to be at that tracking meeting, send me an e-mail.

PS: we can still have a quick meeting on multiprocessing this Wednesday @nilgoyette. What do you think @Garyfallidis?

skoudoro avatar Sep 07 '17 20:09 skoudoro

Changing probabilistic_direction_getter.py was more complex, and maybe less useful, than I thought at first. There are static members and class methods, we can't cdef the sphere, and it's mostly numpy calls.

I'm gonna leave it the way it is because doing it well would require more time than I can give. I do believe it's worth it though; these functions are called really frequently in a deep loop and they are in pure python.

nilgoyette avatar Sep 08 '17 15:09 nilgoyette

I wasn't able to obtain an interesting speedup on @gabknight's PFT code with the standard means (adding types, fewer calls, pointers?, etc.). I was about to give up, but I tested my theory more...

I just wanted to do a quick test, so I copied SHCoeffPmfGen::get_pmf and ProbabilisticDirectionGetter::get_direction into the pyx file and cythonized them. I now have a ~2x speedup. So there's indeed something to be done.

I know I said that I was giving up in my last message but I'm more motivated now ;) Before doing anything on parallelism or pft, I'll try to submit a patch with a faster version of the DirectionGetters. I'll probably write only get_pmf and get_direction for the time being and let you cythonize what you need.

nilgoyette avatar Sep 21 '17 13:09 nilgoyette

Tagging PR #3089, which fixes this issue.

We use Cython, OpenMP, and a few seeds per thread. We are simplifying the tracking framework.

We already see a huge performance improvement. @gabknight will share some performance graphs later.

Feel free to try it and give feedback.

We plan to merge this PR for the release in June.

skoudoro avatar Mar 06 '24 19:03 skoudoro