Eureka
improve the speed of the column-by-column background subtraction
Leonardo dos Santos to start work on this
I will take a look at this and see how to optimize it.
So, I did a brief test here, and I saw that a for-loop to fit ~1500 polynomials should take only a fraction of a second, so that is not the bottleneck. I think the bottleneck happens because we have a while-loop nested inside the for-loop, and a bunch of if-/try-statements to check for bad pixels. Python is sluggish with nested loops.
What I suggest is, instead of using this nested while-loop, we could simply assign weights of zero to the bad pixels when fitting the polynomials (it's the parameter w when calling numpy.polyfit). This should do the trick. I will work on implementing this.
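For illustration, a minimal sketch of the zero-weight idea, assuming a 1D column and a boolean bad-pixel mask (the helper name below is hypothetical, not an actual Eureka function):

```python
import numpy as np

# Hypothetical helper, not part of Eureka
def fit_column_background(col_data, bad_pixel_mask, deg=3):
    """Fit a polynomial along one detector column, ignoring flagged pixels."""
    y = np.arange(col_data.size)
    # Zero weight removes a pixel's influence on the least-squares fit
    weights = np.where(bad_pixel_mask, 0.0, 1.0)
    # Replace flagged values with something finite so NaNs don't propagate
    col = np.where(bad_pixel_mask, 0.0, col_data)
    coeffs = np.polyfit(y, col, deg, w=weights)
    return np.polyval(coeffs, y)
```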
This sounds great! Instead of editing the existing routine, I suggest copying the code to a new routine and giving users the option to use either one. This will also allow us to compare the effectiveness of each method.
So, I have a bit of unsatisfying news: I realized that the while-loop is necessary for sigma-clipping, so it won't be trivial to remove the nested looping. I tried using Astropy's fitting modules with their implementation of sigma-clipping, and that was actually slower than the current implementation by a factor of 2-3.
I'm afraid the code may already be at its limit for optimization, unless this background removal is implemented in a more optimized language, like C or Fortran. Or there could be a clever way of fitting all integrations at the same time, perhaps by scaling and concatenating all of the arrays together... But that would require "open-heart surgery" on the code, particularly in how util.BGsubtraction works.
I can push the Astropy implementation if you want, by the way. But it does not solve the issue 😬
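For reference, a rough sketch of what the Astropy-based attempt might look like, using astropy.modeling's FittingWithOutlierRemoval (the function name and array layout are illustrative, not the actual Eureka code; in recent Astropy versions the call returns the fitted model and an outlier mask):

```python
import numpy as np
from astropy.modeling import models, fitting
from astropy.stats import sigma_clip

# Illustrative sketch, not the actual Eureka implementation
def fit_column_astropy(col_data, deg=3, sigma=3.0, niter=3):
    """Sigma-clipped polynomial fit of one column using Astropy's fitting framework."""
    y = np.arange(col_data.size)
    fitter = fitting.LinearLSQFitter()
    clipped_fitter = fitting.FittingWithOutlierRemoval(fitter, sigma_clip,
                                                       niter=niter, sigma=sigma)
    # Recent Astropy versions return (fitted_model, outlier_mask)
    fitted_model, mask = clipped_fitter(models.Polynomial1D(degree=deg), y, col_data)
    return fitted_model(y)
```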
Thanks for looking into this, Leo, and too bad it's proving to be a bit tricky! One naive question: have you looked at using np.apply_along_axis()? I think this function will apply a function of your choice along a whole axis (e.g. all the columns), and that it's coded up efficiently so that it's a lot faster than a normal for loop. That might be able to remove at least one of the loops.
If that doesn't work here, I'm happy to give some pointers on how to implement a C extension.
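A quick sketch of how np.apply_along_axis() could be used for the per-column fits (purely illustrative names; this leaves out bad-pixel handling and sigma-clipping):

```python
import numpy as np

# Illustrative sketch only
def subtract_background(image, deg=3):
    """Fit and subtract a polynomial background from every column of a 2D image."""
    rows = np.arange(image.shape[0])

    def fit_one_column(col):
        coeffs = np.polyfit(rows, col, deg)
        return np.polyval(coeffs, rows)

    # Applies fit_one_column to each 1D column slice (axis=0) of the image
    background = np.apply_along_axis(fit_one_column, 0, image)
    return image - background
```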
Oh, I didn't know about numpy.apply_along_axis(). I will give it a try and see what happens.
Honestly, I haven't personally found the time spent on background subtraction to be prohibitive at this point. Do we still want to look into improving the runtime, or can we close this issue?
It's the slowest step within S3, and I have thoughts on how to speed it up for quick-and-dirty analyses, but it's also low on the priority list until we get real data.
One thing I don't think we've tried for this is using numba's jit decorator to allow for just-in-time compiled functions. I can't remember what all we use inside the relevant functions, but it's entirely possible that we could see some significant speed improvements by just adding @jit before some key lower-level functions.
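A minimal sketch of that idea, assuming a simple sigma-clipping helper (the function below is a generic stand-in, not an actual Eureka routine):

```python
import numpy as np
from numba import jit

# Generic stand-in for a lower-level per-column routine
@jit(nopython=True)
def clip_and_mean(col, sigma=3.0, niter=5):
    """Iteratively sigma-clip a column and return the mean of the surviving pixels."""
    good = np.ones(col.size, dtype=np.bool_)
    for _ in range(niter):
        vals = col[good]
        mu = vals.mean()
        sd = vals.std()
        good = np.abs(col - mu) <= sigma * sd
    return col[good].mean()
```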
Good idea! Let's also add it to our C2 proposal.