List of computing projects

Open GarethCabournDavies opened this issue 2 years ago • 13 comments

This is a place to list projects which make the code much easier to contribute to/use, but don't necessarily add to the science:

  • Lint/black everything
  • ~Make the ranking statistics truly modular~ - done
  • Make all of PyCBC Live run in a single executable
    • e.g. singles stuff and new Denty bins moves into modules rather than separate executables, executables become wrappers around the modules
  • ~Make everything use init_logging (as appropriate)~ - done
  • ~Standardise how versioning is handled by each executable~ - done
  • Make releases easier, e.g. remove requirement to do release/unrelease commits
  • Injection sorting (for foundmissed and injection minifollowup) should use a common function

Add suggestions, and they may be implemented if I or others get time

GarethCabournDavies avatar Sep 13 '23 08:09 GarethCabournDavies

Reminder that there is a whole wiki page of projects for PyCBC Live, including computing stuff: https://github.com/gwastro/pycbc/wiki/PyCBC-Live-O4-development

titodalcanton avatar Sep 13 '23 09:09 titodalcanton

It would be really nice if PyCBC Live's single fitting code did not have to read tons of trigger files from a slow filesystem. Maybe we can keep a RAM rolling cache of triggers to be used for the fits?

titodalcanton avatar Sep 21 '23 08:09 titodalcanton

> It would be really nice if PyCBC Live's single fitting code did not have to read tons of trigger files from a slow filesystem. Maybe we can keep a RAM rolling cache of triggers to be used for the fits?

I'd discussed this with @ArthurTolley recently, but had thought that any cache would be ridiculously large. If it isn't, then this would make things a lot easier
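
For reference, a rolling in-memory cache could look something like the sketch below. This is only an illustration under stated assumptions: `RollingTriggerCache`, its `window` parameter and the `(time, stat)` trigger layout are all hypothetical, triggers are assumed to arrive in time order, and nothing here reflects how PyCBC Live actually stores triggers or how large the cache would be in practice.

```python
from collections import deque


class RollingTriggerCache:
    """Hypothetical cache keeping only triggers from the last `window` seconds."""

    def __init__(self, window):
        self.window = window
        # Each entry is (time, stat); assumed to be appended in time order.
        self._triggers = deque()

    def add(self, time, stat):
        """Store a trigger and evict anything older than the window."""
        self._triggers.append((time, stat))
        cutoff = time - self.window
        while self._triggers and self._triggers[0][0] < cutoff:
            self._triggers.popleft()

    def stats(self):
        """Return the cached statistic values, oldest first."""
        return [stat for _, stat in self._triggers]

    def __len__(self):
        return len(self._triggers)


# Toy usage: one trigger per second, 10 second window.
cache = RollingTriggerCache(window=10)
for t in range(15):
    cache.add(time=t, stat=float(t))
```

The memory cost scales with the trigger rate times the window length, which is the quantity that would need estimating before deciding whether this is feasible.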

GarethCabournDavies avatar Sep 21 '23 08:09 GarethCabournDavies

Turns out the slow filesystem is not the issue, see https://github.com/gwastro/pycbc/issues/4501. I guess improving that still counts as "computing" though!

titodalcanton avatar Sep 25 '23 16:09 titodalcanton

Version information is incomplete for released code (sometimes?!)

e.g. v2.2.0 is complete:

(pycbc-v2.2.0) ~$ pycbc_page_foreground --version
Version: 2.2.0 Branch: None Tag: v2.2.0 Id: eb2559e4b7e40c24ffea29d6fff2b5cdbc70662e Builder: Unknown User <>
Build date: 2023-03-09 20:18:12 +0000 Repository status is CLEAN: All modifications committed

but v2.2.2 isn't:

(pycbc-v2.2.2) ~$ pycbc_page_foreground --version
Version: 2.2.2 Branch: Tag: Id: Builder: Build date: Repository status is

GarethCabournDavies avatar Sep 27 '23 09:09 GarethCabournDavies

For released code is it not sufficient to say "This is version 2.2.2"?

spxiwh avatar Sep 27 '23 10:09 spxiwh

> For released code is it not sufficient to say "This is version 2.2.2"?

Build date would also be useful, but I get the rest is superfluous

GarethCabournDavies avatar Sep 27 '23 13:09 GarethCabournDavies

For the 'Make everything use init_logging (as appropriate)' task, we can also use this as an opportunity to avoid the super-verbose Pegasus logging when running workflow generators.

I think we can pass a logger instance, pycbc_logger, as an optional argument to init_logging, and then by calling pycbc_logger.info instead of logging.info we avoid passing the verbosity to imported modules. However, passing this to our own modules means an extra argument to each function, and so things become complicated.
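
A minimal sketch of the named-logger idea, using only the standard library `logging` module. `init_logging_sketch` and the logger name `"pycbc"` are hypothetical stand-ins, not the real `init_logging` signature. It also shows one possible way around the extra-argument problem: modules can fetch a child logger by name, which inherits the level set on the parent.

```python
import logging


def init_logging_sketch(verbose=False, logger_name="pycbc"):
    """Hypothetical stand-in for init_logging.

    Configures only our own named logger, leaving the root logger
    (and therefore imported third-party modules) untouched.
    """
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG if verbose else logging.WARNING)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    # Do not hand our records on to whatever handlers the root logger has.
    logger.propagate = False
    return logger


pycbc_logger = init_logging_sketch(verbose=True)

# Inside a module, no logger argument is needed: a dotted child name
# inherits the effective level and handlers of the "pycbc" logger.
module_logger = logging.getLogger("pycbc.workflow")
module_logger.info("This is emitted because the parent is at DEBUG")
```

Third-party loggers that are not children of `"pycbc"` keep following the root logger's level, so raising our own verbosity does not raise theirs.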

GarethCabournDavies avatar Oct 03 '23 15:10 GarethCabournDavies

There are better options than sending the logger through to every function... Maybe Coleman would have suggestions on this?

spxiwh avatar Oct 03 '23 16:10 spxiwh

Yup - wanted to avoid it for that reason.

Coleman suggested a couple of things:

  • wrapping every noisy call to Pegasus in a with statement that reduces the logging level
    • This may be tractable as the Pegasus calls themselves are almost all contained within pycbc/workflow/pegasus_workflow.py, with just one outside this in pycbc/workflow/core.py
    • In fact I think it is just this call causing the extreme verbosity because of the parallelisation of inspiral jobs
  • If Pegasus uses a named logger, grab it and reduce the level (see here)
    • I think it does, so this may be a better option than the above as it will cut things out in future as well
  • Set up a filter (see here)
  • raising a GitHub issue for Pegasus saying "please be less verbose"
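
The first three options can be sketched with the standard library alone. This is a rough illustration, not a tested fix: the logger name `"Pegasus"` is an assumption (the real name would need checking), and `reduced_logging`, `DropBelow` and `ListHandler` are hypothetical helpers written for this example.

```python
import logging
from contextlib import contextmanager

# Assumption: the noisy library logs through a logger with this name.
NOISY_LOGGER_NAME = "Pegasus"


@contextmanager
def reduced_logging(name, level=logging.WARNING):
    """Temporarily raise the threshold of the named logger."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        logger.setLevel(previous)


class DropBelow(logging.Filter):
    """Discard records from `prefix` (and its children) below `level`."""

    def __init__(self, prefix, level):
        super().__init__()
        self.prefix = prefix
        self.level = level

    def filter(self, record):
        if record.name == self.prefix or record.name.startswith(self.prefix + "."):
            return record.levelno >= self.level
        return True


class ListHandler(logging.Handler):
    """Collects messages in a list so the demo can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)
noisy = logging.getLogger(NOISY_LOGGER_NAME)

noisy.info("chatty message 1")            # recorded: nothing is quietened yet

# Option 1 and 2: grab the named logger and reduce its level around a call.
with reduced_logging(NOISY_LOGGER_NAME):
    noisy.info("chatty message 2")        # suppressed inside the with block
    noisy.warning("real problem")         # warnings still get through

# Option 3: a filter on the handler, which stays in place permanently.
handler.addFilter(DropBelow(NOISY_LOGGER_NAME, logging.WARNING))
noisy.info("chatty message 3")            # dropped by the filter
logging.getLogger("our_code").info("our own message")  # unaffected
```

The filter is attached to the handler rather than to the logger because logger-level filters are not applied to records coming from child loggers, whereas a handler filter sees every record that reaches it.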

GarethCabournDavies avatar Oct 04 '23 08:10 GarethCabournDavies

> > It would be really nice if PyCBC Live's single fitting code did not have to read tons of trigger files from a slow filesystem. Maybe we can keep a RAM rolling cache of triggers to be used for the fits?
>
> I'd discussed this with @ArthurTolley recently, but had thought that any cache would be ridiculously large. If it isn't, then this would make things a lot easier

I wasn't aware that people were discussing how to deal with trigger fitting in a more 'logical' way. One thing to note is that for the exponential fit, there should be a way of averaging an old fit result with a more recent one (similar to what is done between neighbouring templates), because the formula for the exponential fit parameter depends linearly on the trigger counts and stat values. Linear behaviour allows for adding / weighted averaging between datasets.

But if we want to do something about this, can we open a separate issue to extract it from the laundry list?
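
To illustrate the averaging argument above, here is a small sketch. It assumes the standard maximum-likelihood estimator for an exponential tail above a threshold, alpha = N / sum(stat - thresh), which may differ in detail from the fitting code in PyCBC. `fit_exponential` and `combine_fits` are hypothetical names defined for this example only. Since N / alpha is the summed excess over threshold, two fits can be combined from their counts and alpha values without revisiting the triggers.

```python
import math
import random


def fit_exponential(stats, thresh):
    """Maximum-likelihood exponential rate for values above thresh.

    Returns (alpha, count), where alpha = count / sum(stat - thresh).
    """
    excess = [s - thresh for s in stats if s > thresh]
    return len(excess) / sum(excess), len(excess)


def combine_fits(alpha_old, n_old, alpha_new, n_new):
    """Combine two fits using only their rates and trigger counts.

    n / alpha recovers the summed excess of each dataset, and summed
    excesses and counts simply add between datasets.
    """
    total_excess = n_old / alpha_old + n_new / alpha_new
    n_total = n_old + n_new
    return n_total / total_excess, n_total


# Toy data: two batches of triggers drawn from the same exponential tail.
rng = random.Random(0)
thresh = 6.0
old = [thresh + rng.expovariate(3.0) for _ in range(2000)]
new = [thresh + rng.expovariate(3.0) for _ in range(500)]

alpha_old, n_old = fit_exponential(old, thresh)
alpha_new, n_new = fit_exponential(new, thresh)
alpha_comb, n_comb = combine_fits(alpha_old, n_old, alpha_new, n_new)

# Fitting all triggers at once gives the same answer, up to rounding.
alpha_all, n_all = fit_exponential(old + new, thresh)
```

Equivalently, 1 / alpha of the combined fit is the count-weighted average of 1 / alpha for the individual fits.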

tdent avatar Nov 02 '23 22:11 tdent

Just to note that @tdent's last comment is implemented in #4670

GarethCabournDavies avatar Jun 07 '24 15:06 GarethCabournDavies