Drizzlepac/HAP: Improve algorithm for switching to crowded field kernel for HAP segment catalogs

Open stscijgbot-hstdp opened this issue 10 months ago • 6 comments

Issue HLA-1427 was created on JIRA by Rick White:

Recently I have discovered some images where the HAP software used the Gaussian kernel, but the RW2D kernel would have been a better choice. This innerspace page shows examples of the issues with the current scheme and discusses the thresholds that are used to switch to the crowded-field kernel.

These issues can be fixed with a new threshold test that should improve the decision of whether to switch to the RW2D kernel. The test is very similar to the current test that sets a threshold on the percentage of the image covered by the biggest segment region. But instead of the percentage size of the biggest island, it uses the size of that island in pixels. That is already calculated (and printed) in the code. A single threshold value of 25000 pixels will work for all instruments and detectors. In order to minimize the changes, I propose adding this new test alongside the existing algorithms rather than modifying the existing algorithms or parameters.
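A minimal sketch of the proposed test, assuming the segmentation map is available as an integer label array with 0 for background. The function names, the `existing_tests_failed` flag, and the strict `>` comparison are hypothetical and are not taken from the Drizzlepac code; only the 25000-pixel threshold comes from the proposal above.

```python
import numpy as np

# Threshold proposed in the issue: size of the biggest segment, in pixels.
MAX_BIGGEST_SEGMENT_PIXELS = 25000


def biggest_segment_size(segm_labels):
    """Return the pixel count of the largest labelled segment (0 = background)."""
    labels = np.asarray(segm_labels).ravel()
    counts = np.bincount(labels)[1:]  # drop the background count at label 0
    return int(counts.max()) if counts.size else 0


def use_crowded_kernel(segm_labels, existing_tests_failed=False,
                       max_pixels=MAX_BIGGEST_SEGMENT_PIXELS):
    """Hypothetical helper: switch kernels if any existing test already
    failed OR the biggest segment exceeds the pixel threshold.

    The strict '>' comparison is an assumption, not taken from the source.
    """
    too_big = biggest_segment_size(segm_labels) > max_pixels
    return bool(existing_tests_failed or too_big)
```

Because the new check is OR-ed with the outcome of the existing tests, it can only add cases that switch to the crowded-field kernel, which matches the intent of leaving the existing algorithms and parameters untouched.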

This is a relatively minor code change but should improve the HAP catalog quality in approximately 15% of the fields.

I think this can be integrated with the other recent HAP catalog changes fairly easily. In particular, the testing for the new modifications can be combined with the testing for this addition.

stscijgbot-hstdp, Jan 31 '25 21:01

Comment by Michele De La Pena on JIRA:

Implemented the suggested changes and tested three datasets thus far: icw2 (UVIS, IR), j8e1 (WFC, HRC), and jcnm (WFC). The ds9 displays of the results for these three datasets (Segment on the left in green, Point on the right in red) are attached as the last four images in the attachment.

stscijgbot-hstdp, Feb 17 '25 22:02

Comment by Michele De La Pena on JIRA:

Additional test datasets are from the images on Rick's innerspace page so that a direct comparison can be done. Sorry, the images did not attach in the order I had wanted. Rick White I have put the total drz/drc images, as well as the total point/segment ecsv and reg files, on the Linux cluster in /home/mdelapena/ForRick so that you can better "see" the results. NOTE: the reg files have only two columns (X, Y), and the coordinates are 1-based. I am currently checking the log files to glean more information.

hst_10235_10_acs_wfc_total_j8xz10

hst_17506_08_wfc3_uvis_total_if8c08

hst_9401_66_acs_wfc_total_j8fs66

hst_9401_77_acs_wfc_total_j8fs77

hst_9771_68_acs_wfc_total_j8mq68

hst_9575_eu_acs_wfc_total_j8fneu

hst_10905_09_acs_wfc_total_j9ov09
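Since the reg files are described as holding two 1-based columns (X, Y), here is a minimal sketch of converting such coordinates to 0-based values for indexing a NumPy image array. The function name is hypothetical, and the whitespace-separated layout and the skipping of blank or `#` lines are assumptions about the file format, not something stated above.

```python
import numpy as np


def reg_to_zero_based(text):
    """Parse two-column (X, Y) lines with 1-based coordinates.

    Returns an (N, 2) float array of 0-based (x, y) positions.
    Assumes whitespace-separated columns; blank lines and lines
    starting with '#' are skipped.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        x, y = (float(v) for v in line.split()[:2])
        rows.append((x - 1.0, y - 1.0))
    return np.array(rows, dtype=float).reshape(-1, 2)
```

Note that a NumPy image is indexed as `image[y, x]`, so a 0-based position `(x, y)` returned here addresses row `y` and column `x`.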

stscijgbot-hstdp, Feb 20 '25 13:02

Comment by Rick White on JIRA:

Michele De La Pena Thanks! I have looked at all of these. I created a web page to make it easy to compare the old and new versions. It has comments on all the pairs.

There was only one case where I thought the new version was probably worse than the old version: hst_9575_eu_acs_wfc_total_j8fneu, which has a number of bright stars along with some galaxies; the galaxies got better but the stars got worse, which was as expected. There are a few cases where the new segment catalog is a bit shallower (that might or might not be related to this change). Otherwise the new catalog was better (sometimes much better). So it seems like this is working pretty much as we hoped.

I noticed that the point catalogs are systematically much deeper than before. I assume that this change to the kernel does not affect the point catalogs, and that those are the result of other changes in the code? If that's correct, we probably need to tune the point catalog thresholds.

A typical example of the changes in the point catalog is visit hst_13750_11_acs_wfc_total_jcnm11. The old catalog has 218 sources in the point catalog, while the new catalog has 8,656 sources and is clearly too deep.

stscijgbot-hstdp, Feb 20 '25 17:02

Comment by Rick White on JIRA:

Michele De La Pena

P.S. It would be helpful if you put the trailer files (*total*.txt) in the test directories. I would be interested to look at them to see how things have changed.

stscijgbot-hstdp, Feb 20 '25 18:02

Comment by Michele De La Pena on JIRA:

Rick White I have put the trailer files in the ForRick subdirectory. Despite my trying to get to it, I am just beginning to check the trailer files myself. Remember that the SVM files are from Drizzlepac 3.7.1.1, which went out as part of a build delivered in October 2024. I am using nearly the latest Drizzlepac code, Drizzlepac 3.9.1rc0, which has all the changes from Drizzlepac 3.9.0 plus more Photutils changes, as well as any other changes that have been checked in recently (note there was no 3.8.x). I associate the Photutils changes with alignment, but the starfinder algorithm that the Point catalog uses for find_point_sources() relies on the same modified routine under the covers. I suspect I should determine, as best I can, why the Point catalogs have so many more sources before any tuning is done?

stscijgbot-hstdp, Feb 20 '25 19:02

Comment by Michele De La Pena on JIRA:

Rick White Though I have closed this ticket, as the software has been updated, approved, and merged onto the main branch in Drizzlepac, I wanted to follow up on a question Rick posed in his comments on 20 Feb 2025 at 12:53. Specifically:

"A typical example of the changes in the point catalog is visit hst_13750_11_acs_wfc_total_jcnm11. The old catalog has 218 sources in the point catalog, while the new catalog has 8,656 sources and is clearly too deep."

I had suspicions about which code change caused this situation. I first compared the trailer files for hst_13750_11_acs_wfc_total_jcnm11 and could see that, while the "new" and "old" versions of the code both used the sigma_clipped_statistics method to define a background image, the "new" code computed a much smaller RMS than the "old" code, by roughly an order of magnitude (0.00455 vs 0.0367). Since the Point catalog uses this RMS to set the threshold (5 * RMS) above which sources may be detected, it makes sense that the lower threshold allows many more source detections. I then took the version of Drizzlepac I am using and reverted some code (Git PR#1908 / HLA-1362) to identify the specific code update that causes this change. I was able to confirm that this update is the reason for the now-deeper Point catalogs. As noted, the Point catalog thresholds will need to be tuned.
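As a rough illustration of why the smaller RMS deepens the catalog, the sketch below computes the 5 * RMS threshold for the two RMS values quoted above and counts how many values in a made-up set of peak fluxes clear each one. The exponential distribution of peaks is purely synthetic and is not drawn from the jcnm11 data; only the RMS values and the factor of 5 come from the comment above.

```python
import numpy as np

NSIGMA = 5.0  # the Point catalog threshold quoted above is 5 * RMS

rms_old = 0.0367   # RMS reported in the "old" trailer file
rms_new = 0.00455  # RMS reported in the "new" trailer file

threshold_old = NSIGMA * rms_old
threshold_new = NSIGMA * rms_new

# Synthetic illustration only: invented peak values with many faint
# sources and few bright ones. Not real data from this visit.
rng = np.random.default_rng(0)
peaks = rng.exponential(scale=0.05, size=10_000)

n_old = int(np.count_nonzero(peaks > threshold_old))
n_new = int(np.count_nonzero(peaks > threshold_new))

print(f"old threshold = {threshold_old:.5f}, detections = {n_old}")
print(f"new threshold = {threshold_new:.5f}, detections = {n_new}")
print(f"threshold ratio (old / new) = {threshold_old / threshold_new:.2f}")
```

The threshold drops by the same factor as the RMS (about 8), so any population with many faint peaks will yield far more detections under the new background estimate, which is consistent with the jump reported for this visit.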

stscijgbot-hstdp, Feb 21 '25 20:02