presto
`rfifind` and OpenMP
Hi @scottransom
It seems to me like `rfifind` does not utilize any parallelism. I tried the command line with and without `-ncpus 6` and it didn't make a difference in runtime. I still get `Using 6 threads with OpenMP` when I pass `-ncpus 6`.
In the source code, I see this: https://github.com/scottransom/presto/blob/master/src/rfifind.c#L121
But I don't see OpenMP calls or `#pragma` directives anywhere else.
Am I missing something?
Thanks!
Hey Wael,
You are completely correct (unfortunately).
I added the command line options and included the appropriate headers for OpenMP many years ago in anticipation of a big parallel push. But my initial experiments with speed-ups were terrible. I think it is because I need to restructure how the code collects and merges the "birdies" after running `search_fft`. The current method is hugely serial and a major anti-parallel bottleneck. In general, I would think we should be able to get `rfifind` to parallelize quite nicely.
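For anyone picking this up, here is a minimal sketch of the general restructuring idea: each worker collects its own private list of birdies, and a single merge happens after all workers finish. It is written in Python purely to show the shape of the pattern and is not PRESTO's code. `find_birdies`, `collect_birdies`, and the toy data are hypothetical stand-ins, and Python threads will not give a real speed-up for pure-Python work; in C the same structure would map onto per-thread buffers merged after the parallel loop.

```python
from concurrent.futures import ThreadPoolExecutor


def find_birdies(block, threshold):
    """Hypothetical stand-in for a per-block search.

    Returns (index, value) pairs for values above the threshold.
    """
    return [(i, v) for i, v in enumerate(block) if v > threshold]


def collect_birdies(blocks, threshold, ncpus=2):
    """Search every block independently, then merge the results once."""
    # Each task builds its own private list, so no shared state is
    # modified while the workers are running.
    with ThreadPoolExecutor(max_workers=ncpus) as pool:
        per_block = list(pool.map(lambda b: find_birdies(b, threshold), blocks))

    # Single merge step after all workers are done. pool.map preserves
    # input order, so the merged list is deterministic.
    merged = []
    for block_id, local in enumerate(per_block):
        merged.extend((block_id, i, v) for i, v in local)
    return merged


# Toy data: three blocks of made-up power values.
blocks = [[0.1, 5.0, 0.2], [0.3, 0.1, 0.2], [7.0, 0.1, 9.0]]
birdies = collect_birdies(blocks, threshold=4.0)
print(birdies)
```

The point of the design is that the only serial work left is the final merge, which touches each birdie once.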
Unfortunately I haven't had the time to prioritize this work, though. So if you or a student wanted to tackle it, I think it would make a really nice computational project. A related addition I've been thinking about is adding generalized spectral kurtosis estimates...
Let me know if you have any thoughts as I'd love to move forward on this.
Scott
Hi Scott,
Thanks for the response!
I guess it would be great to have `rfifind` parallelized. I probably won't have the time for it now, but maybe I can find someone to take a look at this. I will let you know.
In the meantime, `rfifind` runtime is quite large. I'm running it on an 8-bit file with 64 µs / 0.5 MHz resolution, 30 minutes long with 1344 channels, with the arguments:
`rfifind -time 30.0 -timesig 10.0 -freqsig 4.0 -chanfrac 0.5 -intfrac 0.3`
Is there any command line argument I can add that removes some of the processing to make it run faster? I am searching for FRBs, so probably won't need any FFT-based masking.
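To put the data volume in numbers, here is a rough back-of-the-envelope calculation from the parameters quoted above. It assumes `-time` is given in seconds and ignores any file header or padding, so the real file size and the interval length `rfifind` actually uses may differ slightly.

```python
# Observation parameters quoted in the post above.
SAMPLE_TIME_US = 64        # sampling time in microseconds
DURATION_S = 30 * 60       # 30 minutes, in seconds
N_CHANNELS = 1344
CHANNEL_WIDTH_MHZ = 0.5
BITS_PER_SAMPLE = 8
INTERVAL_S = 30            # value passed to -time (assumed to be seconds)

# Integer arithmetic in microseconds avoids floating-point rounding.
n_samples = DURATION_S * 1_000_000 // SAMPLE_TIME_US
n_bytes = n_samples * N_CHANNELS * BITS_PER_SAMPLE // 8
n_intervals = DURATION_S // INTERVAL_S
samples_per_interval = INTERVAL_S * 1_000_000 // SAMPLE_TIME_US
bandwidth_mhz = N_CHANNELS * CHANNEL_WIDTH_MHZ

print(f"time samples:         {n_samples:,}")
print(f"raw data size:        {n_bytes / 1e9:.1f} GB")
print(f"intervals:            {n_intervals}")
print(f"samples per interval: {samples_per_interval:,}")
print(f"total bandwidth:      {bandwidth_mhz} MHz")
```

That is roughly 28 million time samples and close to 38 GB of raw data, split into 60 intervals across 1344 channels.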
-- Wael
What I usually do in situations like this is only worry about the bad channels and forget about the time-variable stuff. To do that, I would use just a small chunk of the data (maybe 5-10%?) and run `rfifind` on it. Then generate an ignorechan list from the `rfifind` results and pass that to the various `prep*` commands. If your data is in one long single file, you will have to chop it somehow, but I suspect you can handle that.
The ignorechan stuff is mentioned in the tutorial and also here: https://github.com/scottransom/presto/blob/master/FAQ.md#what-is-the-difference-between-using--ignorechan-and-explicitly-including-channels-that-you-want-to-zap-in-an-rfifind-mask-using--zapchan-is-one-preferred-over-the-other
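As a small illustration of the last step, here is a sketch that turns a list of bad channel numbers into a compact channel string. It assumes the comma-separated form with `start:end` ranges and no spaces; please check the exact syntax against the FAQ entry above. The helper `to_ignorechan` and the example channel numbers are hypothetical, and how you get the bad-channel list out of the `rfifind` results is up to you.

```python
def to_ignorechan(bad_channels):
    """Compress channel indices into a comma-separated string.

    Consecutive channels are collapsed into start:end ranges
    (assumed format, e.g. "0:3,34,100:102").
    """
    chans = sorted(set(bad_channels))
    if not chans:
        return ""

    parts = []
    start = prev = chans[0]
    for c in chans[1:]:
        if c == prev + 1:
            # Still inside a consecutive run; extend it.
            prev = c
            continue
        # Run ended: emit a single channel or a start:end range.
        parts.append(str(start) if start == prev else f"{start}:{prev}")
        start = prev = c
    # Emit the final run.
    parts.append(str(start) if start == prev else f"{start}:{prev}")
    return ",".join(parts)


# Made-up bad channels for a 1344-channel file.
bad = [0, 1, 2, 3, 34, 100, 101, 102, 1343]
ignorechan = to_ignorechan(bad)
print(f"-ignorechan {ignorechan}")
```

The resulting string can then be passed along with the rest of your usual arguments to the `prep*` commands.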
Just for reference, I should have put a link in for the FAQ entry about the lack of multi-CPU speed-up, in case others come here looking for answers: https://github.com/scottransom/presto/blob/master/FAQ.md#many-of-these-routines-are-really-slow-and-they-dont-seem-to-get-faster-using-the--ncpus-option--why-is-that