List of computing projects
This is a place to list projects which make the code much easier to contribute to/use, but don't necessarily add to the science:
- Lint/black everything
- ~Make the ranking statistics truly modular~ - done
- Make all of PyCBC Live run in a single executable
  - e.g. singles stuff and new Denty bins move into modules rather than separate executables, and the executables become wrappers around the modules
- ~Make everything use init_logging (as appropriate)~ - done
- ~Standardise how versioning is handled by each executable~ - done
- Make releases easier, e.g. remove requirement to do release/unrelease commits
- Injection sorting (for foundmissed and injection minifollowup) should use a common function
Add suggestions, and they may be implemented if I or others get time
Reminder that there is a whole wiki page of projects for PyCBC Live, including computing stuff: https://github.com/gwastro/pycbc/wiki/PyCBC-Live-O4-development
It would be really nice if PyCBC Live's single fitting code did not have to read tons of trigger files from a slow filesystem. Maybe we can keep a RAM rolling cache of triggers to be used for the fits?
I'd discussed this with @ArthurTolley recently, but had thought that any cache would be ridiculously large. If it isn't, then this would make things a lot easier
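To get a feel for how big such a cache would be, here is a minimal sketch of a rolling in-memory cache. It is not PyCBC Live code: the class `RollingTriggerCache`, its methods and the `max_age` parameter are all hypothetical, it stores only one 32-bit float per trigger, and it assumes chunks are added in increasing time order.

```python
from array import array
from collections import deque


class RollingTriggerCache:
    """Hypothetical rolling cache: keeps trigger stats newer than max_age seconds."""

    def __init__(self, max_age):
        self.max_age = max_age
        self._chunks = deque()  # (time, array of stats) pairs, oldest first

    def add(self, time, stats):
        # Store as 32-bit floats to keep the memory footprint predictable
        self._chunks.append((time, array("f", stats)))
        # Evict chunks that have fallen out of the rolling window
        while self._chunks[0][0] < time - self.max_age:
            self._chunks.popleft()

    def count(self):
        # Number of triggers currently held
        return sum(len(s) for _, s in self._chunks)

    def nbytes(self):
        # Approximate payload size, ignoring container overhead
        return sum(len(s) * s.itemsize for _, s in self._chunks)


cache = RollingTriggerCache(max_age=100)
cache.add(0, [6.1, 7.2])
cache.add(50, [6.5])
cache.add(120, [8.0])  # the chunk from time 0 is now outside the window
```

Calling `cache.nbytes()` with a realistic trigger rate and window length would give a rough answer to whether the cache is "ridiculously large" or not.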
Turns out the slow filesystem is not the issue, see https://github.com/gwastro/pycbc/issues/4501. I guess improving that still counts as "computing" though!
Version information is incomplete for released code (sometimes?!)
e.g. v2.2.0 is complete:
(pycbc-v2.2.0) ~$ pycbc_page_foreground --version
Version: 2.2.0
Branch: None
Tag: v2.2.0
Id: eb2559e4b7e40c24ffea29d6fff2b5cdbc70662e
Builder: Unknown User <>
Build date: 2023-03-09 20:18:12 +0000
Repository status is CLEAN: All modifications committed
but v2.2.2 isn't:
(pycbc-v2.2.2) ~$ pycbc_page_foreground --version
Version: 2.2.2
Branch:
Tag:
Id:
Builder:
Build date:
Repository status is
For released code is it not sufficient to say "This is version 2.2.2"?
Build date would also be useful, but I get the rest is superfluous
For the 'Make everything use init_logging (as appropriate)' task, we can also use this as an opportunity to avoid the super-verbose pegasus logging when running workflow generators.
I think we can pass a logger instance, pycbc_logger, as an optional argument to init_logging, and then by calling pycbc_logger.info instead of logging.info we avoid passing the verbosity to imported modules.
However passing this to our own modules means an extra argument to each function, and so things become complicated
There are better options than sending the logger through to every function... Maybe Coleman would have suggestions on this?
Yup - wanted to avoid it for that reason.
Coleman suggested a couple of things:
- wrapping every noisy call to Pegasus with a `with` statement reducing the logging level
  - This may be tractable as the pegasus calls themselves are almost all contained within `pycbc/workflow/pegasus_workflow.py`, with just one outside this in `pycbc/workflow/core.py`
  - In fact I think it is just this call causing the extreme verbosity because of the parallelisation of inspiral jobs
- If pegasus uses a named logger, grab it and reduce the level (see here)
  - I think it does, so this may be a better option than the above as it will cut things out in future as well
- Set up a filter (see here)
- raising a github issue for pegasus saying "please be less verbose"
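For reference, the first three options could look roughly like this using only the standard `logging` module. This is a sketch, not tested against Pegasus: the logger name `noisy_library` is a placeholder (I have not checked what name Pegasus logs under), and `quiet_logger` and `DropBelow` are hypothetical helpers, not existing PyCBC functions.

```python
import logging
from contextlib import contextmanager

# Assumption: the noisy library logs through a named logger.
# This name is a placeholder, not the real Pegasus logger name.
NOISY_LOGGER_NAME = "noisy_library"


@contextmanager
def quiet_logger(name, level=logging.WARNING):
    """Temporarily raise the level of one named logger (the `with` option)."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore the old level even if the wrapped call raises
        logger.setLevel(previous)


class DropBelow(logging.Filter):
    """Filter that rejects records below a given level (the filter option)."""

    def __init__(self, level):
        super().__init__()
        self.level = level

    def filter(self, record):
        return record.levelno >= self.level


noisy = logging.getLogger(NOISY_LOGGER_NAME)
noisy.setLevel(logging.DEBUG)

with quiet_logger(NOISY_LOGGER_NAME):
    enabled_inside = noisy.isEnabledFor(logging.INFO)  # INFO suppressed here
enabled_after = noisy.isEnabledFor(logging.INFO)  # previous level restored
```

The permanent version of the second option is just `logging.getLogger(NOISY_LOGGER_NAME).setLevel(logging.WARNING)` once at startup. For the filter, note that a filter attached to a logger only applies to records logged directly on that logger, so attaching `DropBelow` to a handler may be the safer choice.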
I wasn't aware that people were discussing how to deal with trigger fitting in a more 'logical' way .. One thing to note is that for the exponential fit, there should be a way of averaging an old fit result with a more recent one (similar to what is done between neighbouring templates) because the formula for the exponential fit parameter depends linearly on the trigger counts and stat values. And linear behaviour allows for adding / weighted averaging between datasets.
But .. if we want to do something about this can we open a separate issue to extract it from the laundry list?
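To illustrate the point about linearity, here is a minimal sketch of combining an old fit with a newer one. It assumes the fit is the maximum-likelihood exponential fit, alpha = N / sum(stat - threshold), and that both fits use the same threshold. The function names and the `old_weight` down-weighting factor are hypothetical, not taken from the PyCBC code.

```python
def fit_alpha(stats, threshold):
    """Maximum-likelihood exponential fit above threshold (assumed form)."""
    n = len(stats)
    return n / sum(s - threshold for s in stats), n


def combine_exponential_fits(alpha_old, n_old, alpha_new, n_new, old_weight=1.0):
    """Combine two fits made with the same threshold.

    n / alpha recovers sum(stat - threshold) for each dataset, and both the
    counts and these sums simply add, optionally down-weighting the old data.
    """
    excess_old = old_weight * n_old / alpha_old
    excess_new = n_new / alpha_new
    n_total = old_weight * n_old + n_new
    return n_total / (excess_old + excess_new), n_total


threshold = 6.0
old_stats = [6.5, 7.5]
new_stats = [6.25, 6.75, 7.0, 7.0]

alpha_old, n_old = fit_alpha(old_stats, threshold)
alpha_new, n_new = fit_alpha(new_stats, threshold)
alpha_comb, n_comb = combine_exponential_fits(alpha_old, n_old, alpha_new, n_new)

# Fitting all triggers at once should give the same answer
alpha_direct, n_direct = fit_alpha(old_stats + new_stats, threshold)
```

With `old_weight=1.0` the combined result matches a fit to all the triggers together, so only the fit parameter and the count need to be kept from the old data, not the triggers themselves. Setting `old_weight` below 1 would let older data fade out.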
Just to note that @tdent's last comment is implemented in #4670