pycbc icon indicating copy to clipboard operation
pycbc copied to clipboard

Moving em_progenitors.py and ns_sequence data from tmpltbank to new neutron_stars package

Open SamuelH-97 opened this issue 2 years ago • 8 comments

The functions within tmpltbank/em_progenitors.py serve purposes beyond template banks, for instance, they are used quite integrally in pygrb where the remnant mass function is used to separate potentially EM-bright injections from EM-dim injections. Here, I have moved the remnant_mass function to conversions.py so it may be used more broadly, and moved the rest of the em_progenitors.py functions to the new ns_functions.py (name subject to change) within a new package under the name neutron_stars. This package also now houses the ns_sequence data once stored in tmpltbank since it is more relevant to the functions stored here. The aim of this change is to allow for these functions to be used more broadly under a more appropriately named package.

SamuelH-97 avatar Feb 15 '22 14:02 SamuelH-97

@a-r-williamson @SamuelH-97 @pannarale @jakeb245 @nathan.ormsby Hi Samuel (and possibly Andrew). Jacob Buchanan and Nathan Ormsby would like to help with the injection part of the PyGRB workflow if possible. We've taken a look through this PR and the linked PR (not all the code in detail), and would like to know if there is a specific issue or problem we could tackle. It looks like the major problem/issue is editing the code to make it mergeable with the upstream branch? If you can just give us one thing to do, we'll start diving into the code and take it on!

rpfisher avatar Jul 07 '22 19:07 rpfisher

Starting to look at this again. It seems codeclimate has picked up a few genuine issues, for example there is a reference to an undefined variable, and some docstrings have backslashes which are being interpreted as escape sequences. Could someone take a look at those things?

titodalcanton avatar Aug 29 '22 12:08 titodalcanton

Starting to look at this again. It seems codeclimate has picked up a few genuine issues, for example there is a reference to an undefined variable, and some docstrings have backslashes which are being interpreted as escape sequences. Could someone take a look at those things?

Thanks for the heads up. The undefined variable was an error on my part in a recent change and has bee resolved.

As for the backslashes, they seem to be a part of the equation formatting for automated documentation. I'm unsure if he has the time to check these but perhaps @a-r-williamson could provide some wisdom here. I'm somewhat unfamiliar with these changes to the docstrings and how they may affect both the everyday running of the code and the documentation

SamuelH-97 avatar Aug 30 '22 14:08 SamuelH-97

@SamuelH-97 For the docstrings I think you just need to add the "r" prefix: https://developer.lsst.io/python/numpydoc.html#docstrings-must-be-delimited-by-triple-double-quotes

which should not mess up the latex. In this case I think it's okay as there's nothing special in here, but try and use \tan and you'll probably spot the problem!!

spxiwh avatar Aug 30 '22 14:08 spxiwh

Or double the backslashes, \ -> \\

titodalcanton avatar Aug 30 '22 14:08 titodalcanton

I took another deep look through, and left some trivial comments and a more substantial one. I think those will be my last comments.

titodalcanton avatar Sep 12 '22 14:09 titodalcanton

@spxiwh recommends moving the large pickle file to the pycbc-config repository. I agree the file is a bit large to keep here. Will look into that.

titodalcanton avatar Sep 30 '22 11:09 titodalcanton

Some notes for posterity.

Francesco and I spent some time looking into why the ISSO interpolant is needed, since it adds quite a bit of complexity. We reproduced an earlier test from Andrew, which indeed indicates that generating the injections with the exact calculation takes ~10 min, versus just a few seconds using the interpolant. This, however, seems to contradict my own timing of the ISSO solver, which indicates that the exact calculation takes just about half a millisecond for a single point, and I struggle to believe the interpolant would be significantly faster than that.

Bottom line: I propose we modify this PR to only include the exact ISSO calculation, and leave adding the interpolant as a smaller PR later, if the extra time is found to be really unacceptable. Meanwhile, I will do some profiling to understand where the time is coming from, and see if we can find a solution that does not involve carrying around a ~1 Mb precalculated file.

titodalcanton avatar Oct 04 '22 11:10 titodalcanton

I have been poking at the ISSO code a bit more, so here are a few more observations.

First, I found that PG_ISSO_solver() is already quite fast, however the RectBivariateSpline is faster by orders of magnitude. Since pycbc_create_injections seems to invoke the constraint function millions and millions of times, using the spline does indeed reduce the execution time from several minutes (or even almost one hour with 100000 injections) to just a few seconds. So using the spline is indeed beneficial, though probably not immediately essential, as I am told that PyGRB creates the injections as a Condor job anyway.

Next, I spent some time looking for a way to make the spline pickle file significantly smaller by making the interpolation grid coarser. In doing so, I realized that PG_ISSO_solver() does not produce a sensible result when the spin magnitude is exactly 1: image This problem seems to be greatly reduced by removing the bracketing used in the third root finding in PG_ISSO_solver(), although the curve I obtain does still have a weird kink. I will not request that this issue is fixed in this PR, but we should fix it soon after this is merged.

Anyway, after making that change, I found that I can reduce the spline pickle file to ~15 Kb only, and still obtain an interpolation better than 1e-5 almost everywhere (note that the spline proposed in the PR does not satisfy the condition that the error is below 1e-5 everywhere, contrary to what the docstring of generate_isso_bivariate_interp() claims).

Conclusion of this second note: I still propose we modify this PR to only include the exact ISSO calculation (I will look into this myself next). In one or two followup PRs, we will then address the high-spin problem in PG_ISSO_solver() and bring in a smaller version of the interpolating spline.

titodalcanton avatar Oct 06 '22 11:10 titodalcanton

It seems I cannot open a PR against this PR, but the change I propose is on this branch: https://github.com/titodalcanton/pycbc/tree/embright_remove_isso_interp.

titodalcanton avatar Oct 06 '22 11:10 titodalcanton