Multiprocessing the local tracking?
That would give it a much needed speed boost. If I recall correctly, Gab told me his version multiprocesses across seeds, so pimping up the `_generate_streamlines` function in https://github.com/nipy/dipy/blob/master/dipy/tracking/local/localtracking.py to work in parallel could be a way to do that.
It would be good to go multi, but not right now. We need to stabilize the tracking API first. There is still work to do before we start talking about how to parallelize this part of the code.
Still good to keep in mind: the day you add PFT, that's a requirement, as it takes a good 10-12h to run when multiprocessed. Still, one step at a time I guess.
+1
The tracking speed is starting to be a problem for us. I don't have any code or date in mind (as I'm leaving for holiday soon) but I'm starting to work on this task.
Let's coordinate efforts. Are you talking about probabilistic tracking?
I don't know your codebase much right now, but if by "probabilistic tracking" you mean `LocalTracking` with a `ProbabilisticDirectionGetter`, then yes. In fact, I was asked to parallelize the local tracking for the 3 `DirectionGetter`s we use. If you mean something else, unrelated to `LocalTracking`, then forget what I just wrote :)
Yes, this is what I mean. However, parallelization for this problem will not help much. The issue is that for every tracking point we have to project the spherical harmonics to ODFs. And we do that projection millions of times as we track. When I say projection this is just a matrix multiplication but the matrix is quite large. I would start from there before parallelizing the existing code. When you come back from vacation let's chat about this. Thanks again for getting involved with this.
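To make the cost concrete, here's a minimal NumPy sketch of why this projection dominates. The sizes and the random matrix are made up for illustration (the real matrix comes from dipy's SH basis and the tracking sphere); the point is that it's one large matrix-vector product per tracking step, repeated millions of times, and that the matrix itself never changes.

```python
import numpy as np

# Hypothetical sizes: 45 SH coefficients (order 8) and a 724-vertex sphere.
n_coeffs, n_vertices = 45, 724
rng = np.random.default_rng(0)

# B maps SH coefficients to ODF values on the sphere. It depends only on
# the sphere and the basis, so it can be computed once and reused for
# every tracking step.
B = rng.standard_normal((n_vertices, n_coeffs))

def odf_at_point(sh_coeffs):
    # One projection per tracking step: a (n_vertices, n_coeffs) @ (n_coeffs,)
    # matrix-vector product. Millions of these dominate the runtime.
    return B @ sh_coeffs

# Batching many points into a single matrix-matrix product lets BLAS do
# the heavy lifting instead of paying Python overhead per point:
points = rng.standard_normal((n_coeffs, 1000))   # 1000 tracking points
odfs_batched = B @ points                        # shape (n_vertices, 1000)
odfs_looped = np.stack(
    [odf_at_point(points[:, i]) for i in range(1000)], axis=1)
assert np.allclose(odfs_batched, odfs_looped)
```

Whether batching fits the tracker's step-by-step control flow is a separate question, but it shows where the arithmetic intensity is.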
To be honest, I don't know or understand the math behind this code. If you need it, I will help you as best as I can if you explain the problem in programmer's terms :) Btw, I'm only leaving in 1.5 weeks.
In the meantime, I'm pretty sure there's a simple optimization I can try. I'll keep you updated.
@Garyfallidis Have you thought about this? I tested some simple threading (pool, process, ±cython) and I got some interesting gains, but I'm curious about what you had in mind. I can throw as many cores as I want at this task, but there's an upper limit! A mathematical optimization would indeed be interesting.
What I had in mind is to see if we can improve the tracking speed in probabilistic tracking by looking at the number of times the odf is calculated. There is a dot product between sph. harmonics and odfs which is repeated at each step of the tracking. That takes a lot of time. Det. tracking does not have this problem.
You can start by describing here your current experience and what you have tested. @skoudoro will send a doodle so we can also meet in a hangout and chat.
Disclaimer: I installed numpy with `--no-binaries` because it's impossible to do any multiprocessing with the standard version. I understand that this can be a problem, but I don't know what I can do about it. JC told me that you are still unsure about what kind of multiprocessing you want in dipy, so it might be interesting to discuss it before I put too much time into this.
In all my tests, I modified `LocalTracking` to contain a `multiprocessing.Queue`. The cores simply fill this queue and `LocalTracking` returns the results as they become available in `next`. This is important because we don't want all the streamlines in memory. I had some difficult choices to make concerning `--nb_threads`, because one of the cores is the "main" core, so it does mostly nothing but waiting.
- `== 1`: Unchanged code. I still use `yield`.
- `== 2`: A bad choice because there's only one core working; the main core is waiting most of the time. It's OK only if the main thread is doing a lot of work with the results.
- `> 2`: Probably what you want in most cases.
What I tested:
1. No change to the pyx file. I start N processes.
2. No change to the pyx file. I use a thread pool instead.
3. I start N processes and I cythonize as much as possible of what was `_generate_streamlines`.
1 and 3 have the problem that the cores don't finish at the same time. They share the seeds evenly, but some seeds require less time to process. On a 1h job, this can mean several minutes where fewer than N cores are working. 2 doesn't have this problem, they all finish together, but I think a thread pool adds more overhead in Python (with a big list on a "fast" process), so it's not actually faster.
Because of this "waiting" thread, the results are less impressive. At `--nb_threads 4`, I get around a 3x speedup. I tested only deterministic tracking and only with small masks, so I don't currently see much difference between these versions in terms of speed. I still need to create a script to test the 3 versions with many datasets.
Hey, would you mind if I cythonize `probabilistic_direction_getter.py`? Nothing crazy, just putting it in a pyx file and ensuring everything can be called from Python and Cython.
I see that you started cythonizing it (`DirectionGetter`, `PeaksAndMetricsDirectionGetter`) but it's not finished. It's the only "slow" part in my work (and it will affect the PFT tracking PR too) and I think it will help.
Sure, give it a go. This file mostly contains interfaces. You may want to cythonize just the functions that take time and not everything. How is Wednesday 11am for you? It would be great if the PFT gets integrated.
Also, if anyone else wants to be at that tracking meeting, send me an e-mail.
Ok, I'll see what I can do.
I'm working with Gabriel on PFT. He asked me to check his Cython. I'm working almost full time on parallel tracking and PFT, so I should have something soon.
I'll be in the meeting (I answered on the doodle). Just remember that I'm not a tracking expert, I'm a programmer who happens to be working on this!
I'd be interested to attend, but Wednesday is during Miccai so I won't be able to free myself at that time.
From @nilgoyette's doodle, it seems that Ariel is also unable to attend next week. I know that @gabknight is also attending Miccai, so if we want him to attend he will be similarly constrained. Would it be possible to move the meeting to the following week?
👍 that would work better for me!
ok for me too. I just updated the doodle and I will forward you the link @jchoude
if anyone else wants to be at that tracking meeting. Send me an e-mail.
PS: we can still have a quick meeting on multiprocessing this Wednesday @nilgoyette. What do you think @Garyfallidis ?
Changing `probabilistic_direction_getter.py` was more complex, and maybe less useful, than I thought at first. There are static members and class methods, we can't `cdef` the sphere, and it's mostly numpy calls.
I'm going to leave it the way it is because doing it well would require more time than I can give. I do believe it's worth it though; these functions are called really frequently in a deep loop and they are in pure Python.
I wasn't able to obtain an interesting speedup on @gabknight's PFT code with the standard means (adding types, fewer calls, pointers, etc.). I was about to give up, but I tested my theory some more...
I just wanted to do a quick test, so I copied `SHCoeffPmfGen::get_pmf` and `ProbabilisticDirectionGetter::get_direction` into the pyx file and cythonized them. I now have a ~2x speedup. So, there's indeed something to be done.
I know I said that I was giving up in my last message, but I'm more motivated now ;) Before doing anything on parallelism or PFT, I'll try to submit a patch with a faster version of the `DirectionGetter`s. I'll probably write only `get_pmf` and `get_direction` for the time being and let you cythonize what you need.
Tagging PR #3089, which fixes this issue.
We use Cython, OpenMP, and a few seeds per thread. We are simplifying the tracking framework.
We already see a huge improvement in performance. @gabknight will share some performance graphs later.
Feel free to try it and give feedback.
We plan to merge this PR for the June release.