SoFiA icon indicating copy to clipboard operation
SoFiA copied to clipboard

Busy Function fitting broken

Open SoFiA-Admin opened this issue 4 years ago • 3 comments

The current implementation of Busy Function fitting does not appear to work correctly in all cases, causing some additional problems down the line, such as catalogue formatting issues. The reason for the problem is the incomplete transition from the old GSL-based algorithm developed by @SoFiA-Admin to the new stand-alone algorithm written by @RussellJurek (see bb515b84c1cb288e9bbcd4b8d99d0a4c27c7d7a2, 8295053a65eb152dc2305a1cdb9ee7a0dcdcd8e5 and 4ea324cddb61524c6388ab31409cdbbb4a36531a).

One problem is that the flux-density-weighted line centroid is not currently being calculated, as this functionality is not provided by the underlying code. Another problem is that Busy Function fitting appears to fail frequently, even in cases of fairly simple spectra, thus producing arbitrary output that destroys the output catalogue formatting.

The easiest way forward would be to check if there is a simple fix to those problems, in which case the Busy Function fitting feature could remain. If no easy fix is available, we may need to disable Busy Function fitting altogether, at least for the upcoming release of SoFiA 1.4.

SoFiA-Admin avatar Dec 03 '19 08:12 SoFiA-Admin

I've had a look at the code. We accidentally re-used the iteration limit from the search phase in the refinement stage. That means we were only using a maximum of 30 iterations to refine our best starting point. I've updated this to 5,000 iterations. I also increased the number of random search locations from 1,000 to 1,500, while dialling down the number of iterations from 30 to 20. That'll keep the number of operations the same, while exploring 50% more start locations.

Adding the flux-density-weighted line centroid is very straight forward.

RussellJurek avatar Dec 09 '19 06:12 RussellJurek

While I'm at it, I'll provide an option to use the CalcObsParams function from my library. It uses the covariance matrix from the fitting, to generate random BF fit profiles. It then re-calculates the observational parameters of each to create a distribution. From this, it can pull the range of possible values.

RussellJurek avatar Dec 09 '19 06:12 RussellJurek

Another feature we should consider using is the ApproxObsCovar function. It uses a finite difference approximation to generate an approximation of the covariance matrix for the observational parameters. It won't be hard to add in the flux-density-weighted line centroid.

A feature I've considered adding to the BFfit library is a 'peak finder' within the observational parameters calculator. I'm already generating a 1D distribution for each observational parameter. In principle, I just need to implement a 1-pass filter looking for local maximum. Then I can store the results for each of these. At the very least I can record how many local peaks exist for each parameter. That'd be a good metric for the reliability of the observational parameter. A good fit should only produce a single maximum. A value of 0 would also let you know that the value varies monotonically.

Alternatively, I could calculate the first and second derivatives about the mean or median value. I think the number of peaks is a cleaner, more robust and simpler metric though. If it isn't 1, then you need to dig in and see what's happening.

A bigger question is if we want to return the 1D parameter distribution for 5 observables. That's 20 values each for a reasonably crude histogram. That'd cost: 5 params * 20 values * 4 Bytes = 400B / detection = 4kB / 10 detections. A datacube with 10,000 detections would require ~4MB. It's not that prohibitive. It's probably useless in most situations though. It'd be very useful as an advanced tool though. One within SoFiA that sits outside of the regular source finding. You'd use it to go back and examine BF fits in detail.

RussellJurek avatar Dec 09 '19 07:12 RussellJurek