Check that psd is not None before using ifo for followup
The PyCBC Live analysis of O3 replay data recently encountered a bug we had not seen before.
In one analysis stride we see:
2024-12-13T00:57:02.290-08:00 pycbc-live-test 0 V1 time has invalid data, resetting buffer
2024-12-13T00:57:02.290-08:00 pycbc-live-test 0 Insufficient data for V1 analysis
and then later in the same stride:
2024-12-13T00:57:08.820-08:00 pycbc-live-test 0 Found H1-L1 coinc with ifar 0.0008384836735184549
2024-12-13T00:57:08.821-08:00 pycbc-live-test 0 computing followup data for coinc
2024-12-13T00:57:08.822-08:00 pycbc-live-test 0 Generating SPAtmplt, duration 104.0 s, index 215458, starting from 23.1 Hz
2024-12-13T00:57:08.951-08:00 pycbc-live-test 0 Generating SPAtmplt, duration 240.0 s, index 215458, starting from 23.1 Hz
Traceback (most recent call last):
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/site-packages/mpi4py/__main__.py", line 7, in <module>
    main()
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/site-packages/mpi4py/run.py", line 198, in main
    run_command_line(args)
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/site-packages/mpi4py/run.py", line 47, in run_command_line
    run_path(sys.argv[0], run_name='__main__')
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/runpy.py", line 288, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/bin/pycbc_live", line 1379, in <module>
    evnt.check_coincs(list(results.keys()), best_coinc, psds)
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/bin/pycbc_live", line 520, in check_coincs
    sld = self.compute_followup_data(
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/bin/pycbc_live", line 221, in compute_followup_data
    pvalue_info = followup_event_significance(
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/site-packages/pycbc/filter/matchedfilter.py", line 1955, in followup_event_significance
    stilde = data_reader.overwhitened_data(htilde.delta_f)
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/site-packages/pycbc/strain/strain.py", line 1773, in overwhitened_data
    psdt = pycbc.psd.interpolate(self.psd, fseries.delta_f)
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/site-packages/pycbc/psd/estimate.py", line 313, in interpolate
    new_n = (len(series)-1) * series.delta_f / delta_f + 1
TypeError: object of type 'NoneType' has no len()
I have diagnosed what happened as follows:
- When advancing the V1 StrainBuffer, part of the data is found to be invalid by https://github.com/gwastro/pycbc/blob/master/pycbc/strain/strain.py#L1888-L1897
- This results in the V1 psd being set to None by https://github.com/gwastro/pycbc/blob/master/pycbc/strain/strain.py#L1686
- A coinc event was found in the same stride and attempted to use the V1 data for followup
- The use of V1 data for followup somehow got past the check https://github.com/gwastro/pycbc/blob/master/pycbc/filter/matchedfilter.py#L1928
- The analysis reached https://github.com/gwastro/pycbc/blob/master/pycbc/filter/matchedfilter.py#L1955 and tried to overwhiten the data
- The function call https://github.com/gwastro/pycbc/blob/master/pycbc/strain/strain.py#L1773 produced an error because it was passed the V1 psd that had previously been set to None
While I am not sure why the check https://github.com/gwastro/pycbc/blob/master/pycbc/filter/matchedfilter.py#L1928 did not catch that the data was invalid, adding an explicit check for whether the psd is None should prevent this error from happening again.
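To make the idea concrete, here is a minimal sketch of the kind of guard described above. It is not the actual diff: `FakeReader` and `ifos_with_psd` are hypothetical names made up for illustration, and the only assumption taken from the traceback is that each data reader exposes a `psd` attribute that may be None.

```python
class FakeReader:
    """Hypothetical stand-in for a per-detector data reader.

    Only the `psd` attribute matters here; in the traceback above it is
    `self.psd` that turns out to be None.
    """

    def __init__(self, psd=None):
        self.psd = psd


def ifos_with_psd(data_readers, candidate_ifos):
    """Return the candidate ifos whose reader currently holds a PSD.

    Detectors with a missing reader or with psd set to None are skipped,
    so they are never handed to code that needs to interpolate the PSD.
    """
    usable = []
    for ifo in candidate_ifos:
        reader = data_readers.get(ifo)
        if reader is None or reader.psd is None:
            continue  # nothing to whiten with, so do not follow up this ifo
        usable.append(ifo)
    return usable


# Example mirroring the situation in the log: V1 had its buffer reset,
# so its psd is None, while H1 and L1 are fine (PSD values are dummies).
readers = {
    "H1": FakeReader(psd=[1.0, 1.0]),
    "L1": FakeReader(psd=[1.0, 1.0]),
    "V1": FakeReader(psd=None),
}
followup_ifos = ifos_with_psd(readers, ["H1", "L1", "V1"])
```

With a guard like this, V1 would simply be dropped from the followup in the affected stride instead of crashing the whole analysis.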
Since we do not understand the "somehow" above, I am hesitant to just sweep this potential situation under the carpet.
To put today's discussion in writing: I propose that we watch closely for whether this happens again over the next few weeks, in which case we will reconsider merging this as an urgent fix. Otherwise, I think we should try to reproduce this with simulations so that we understand exactly what is going on.
The same error happened again today, this time in the production analysis:
Traceback (most recent call last):
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/site-packages/mpi4py/__main__.py", line 7, in <module>
    main()
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/site-packages/mpi4py/run.py", line 198, in main
    run_command_line(args)
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/site-packages/mpi4py/run.py", line 47, in run_command_line
    run_path(sys.argv[0], run_name='__main__')
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/runpy.py", line 288, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/pycbc.live/.conda/envs/o4-prod-env/bin/pycbc_live", line 1333, in <module>
    evnt.check_coincs(list(results.keys()), best_coinc, psds)
  File "/home/pycbc.live/.conda/envs/o4-prod-env/bin/pycbc_live", line 520, in check_coincs
    sld = self.compute_followup_data(
  File "/home/pycbc.live/.conda/envs/o4-prod-env/bin/pycbc_live", line 220, in compute_followup_data
    pvalue_info = followup_event_significance(
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/site-packages/pycbc/filter/matchedfilter.py", line 1952, in followup_event_significance
    stilde = data_reader.overwhitened_data(htilde.delta_f)
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/site-packages/pycbc/strain/strain.py", line 1762, in overwhitened_data
    psdt = pycbc.psd.interpolate(self.psd, fseries.delta_f)
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/site-packages/pycbc/psd/estimate.py", line 307, in interpolate
    new_n = (len(series)-1) * series.delta_f / delta_f + 1
TypeError: object of type 'NoneType' has no len()
The same bug occurred again earlier today in the MDC analysis. I noticed that it occurred exactly 80 days (2 MDC durations) after the first occurrence, suggesting that there is a specific injection in the MDC that triggers this bug.
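The periodicity argument can be checked with a small sketch like the one below. It assumes an MDC duration of 40 days (inferred from 80 days being 2 MDC durations). The second crash time is a placeholder set to exactly 80 days after the first and should be replaced with the real timestamp; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: one MDC replay cycle lasts 40 days (80 days = 2 cycles).
MDC_DURATION = timedelta(days=40)


def cycles_between(first, second, cycle=MDC_DURATION):
    """Number of whole MDC cycles elapsed between two crash times."""
    return (second - first) // cycle


def offset_in_cycle(first, second, cycle=MDC_DURATION):
    """Remainder after removing whole cycles.

    A remainder near zero means both crashes fall at the same point of
    the replayed data, which is consistent with a single injection
    triggering the bug.
    """
    return (second - first) % cycle


pst = timezone(timedelta(hours=-8))
# First occurrence, taken from the log at the top of this PR.
first_crash = datetime(2024, 12, 13, 0, 57, 8, tzinfo=pst)
# Placeholder only: replace with the actual time of the latest crash.
latest_crash = first_crash + timedelta(days=80)

n_cycles = cycles_between(first_crash, latest_crash)
remainder = offset_in_cycle(first_crash, latest_crash)
```

If the remainder stays within a few seconds for the real timestamps, the corresponding time in the replayed data would be a natural place to look for the injection.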