Check that psd is not None before using ifo for followup

Open maxtrevor opened this issue 1 year ago • 4 comments

The PyCBC Live analysis of O3 replay data recently encountered a bug we had not seen before.

In one analysis stride we see:

2024-12-13T00:57:02.290-08:00 pycbc-live-test 0 V1 time has invalid data, resetting buffer
2024-12-13T00:57:02.290-08:00 pycbc-live-test 0 Insufficient data for V1 analysis

and then later in the same stride:

2024-12-13T00:57:08.820-08:00 pycbc-live-test 0 Found H1-L1 coinc with ifar 0.0008384836735184549
2024-12-13T00:57:08.821-08:00 pycbc-live-test 0 computing followup data for coinc
2024-12-13T00:57:08.822-08:00 pycbc-live-test 0 Generating SPAtmplt, duration 104.0 s, index 215458, starting from 23.1 Hz
2024-12-13T00:57:08.951-08:00 pycbc-live-test 0 Generating SPAtmplt, duration 240.0 s, index 215458, starting from 23.1 Hz
Traceback (most recent call last):
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/site-packages/mpi4py/__main__.py", line 7, in <module>
    main()
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/site-packages/mpi4py/run.py", line 198, in main
    run_command_line(args)
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/site-packages/mpi4py/run.py", line 47, in run_command_line
    run_path(sys.argv[0], run_name='__main__')
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/runpy.py", line 288, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/bin/pycbc_live", line 1379, in <module>
    evnt.check_coincs(list(results.keys()), best_coinc, psds)
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/bin/pycbc_live", line 520, in check_coincs
    sld = self.compute_followup_data(
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/bin/pycbc_live", line 221, in compute_followup_data
    pvalue_info = followup_event_significance(
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/site-packages/pycbc/filter/matchedfilter.py", line 1955, in followup_event_significance
    stilde = data_reader.overwhitened_data(htilde.delta_f)
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/site-packages/pycbc/strain/strain.py", line 1773, in overwhitened_data
    psdt = pycbc.psd.interpolate(self.psd, fseries.delta_f)
  File "/home/pycbc.live/.conda/envs/o4-test-env-2024-11-01/lib/python3.9/site-packages/pycbc/psd/estimate.py", line 313, in interpolate
    new_n = (len(series)-1) * series.delta_f / delta_f + 1
TypeError: object of type 'NoneType' has no len()
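
The final frame of the traceback can be reproduced in isolation. This is only a minimal sketch: `interpolate_length` is a hypothetical helper that copies the failing expression from the traceback; it is not the actual `pycbc.psd.interpolate` function.

```python
def interpolate_length(series, delta_f):
    """Hypothetical helper: evaluates the expression from the failing line."""
    # Same arithmetic as the line shown in the traceback; len(None) raises here.
    return (len(series) - 1) * series.delta_f / delta_f + 1


try:
    # Passing None mimics a PSD that was reset after invalid data was found.
    interpolate_length(None, 0.25)
except TypeError as err:
    message = str(err)
    print(message)
```

The point is that the failure is raised as soon as the length of the PSD is requested, so any earlier check on the data has to have passed for the code to get this far.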

I have diagnosed what happened as follows:

  • When advancing the V1 StrainBuffer, part of the data is found to be invalid by https://github.com/gwastro/pycbc/blob/master/pycbc/strain/strain.py#L1888-L1897
  • This results in the V1 psd being set to None by https://github.com/gwastro/pycbc/blob/master/pycbc/strain/strain.py#L1686
  • A coinc event was found in the same stride, and the followup attempted to use the V1 data
  • The use of V1 data for followup somehow got past the check at https://github.com/gwastro/pycbc/blob/master/pycbc/filter/matchedfilter.py#L1928
  • The analysis reached https://github.com/gwastro/pycbc/blob/master/pycbc/filter/matchedfilter.py#L1955 and tried to overwhiten the data
  • The function call at https://github.com/gwastro/pycbc/blob/master/pycbc/strain/strain.py#L1773 raised an error because it was passed the V1 psd, which had previously been set to None

While I am not sure why the check at https://github.com/gwastro/pycbc/blob/master/pycbc/filter/matchedfilter.py#L1928 did not catch that the data was invalid, adding an explicit check for whether the psd is None should prevent this error from happening again.
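
As a rough illustration of the kind of guard proposed here, the sketch below filters detectors by whether their PSD is available. It is not the actual patch: `StubDataReader` and `select_followup_ifos` are hypothetical names, and the only thing taken from the traceback is that the data reader carries a `psd` attribute that may be None.

```python
class StubDataReader:
    """Hypothetical stand-in for a strain buffer; only `psd` matters here."""

    def __init__(self, psd):
        self.psd = psd


def select_followup_ifos(data_readers, candidate_ifos):
    """Return the candidate detectors whose data reader has a PSD.

    Detectors with a missing reader or with psd set to None are skipped,
    so they are never handed to the overwhitening step.
    """
    usable = []
    for ifo in candidate_ifos:
        reader = data_readers.get(ifo)
        if reader is None or reader.psd is None:
            continue
        usable.append(ifo)
    return usable


readers = {
    "H1": StubDataReader(psd=[1.0, 2.0]),
    "L1": StubDataReader(psd=[1.5, 2.5]),
    "V1": StubDataReader(psd=None),  # buffer was reset, PSD not available
}
followup_ifos = select_followup_ifos(readers, ["H1", "L1", "V1"])
print(followup_ifos)
```

A guard like this avoids the crash but does not explain why the existing validity check let V1 through.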

maxtrevor, Dec 13 '24 20:12

Since we do not understand the "somehow" above, I am hesitant to just sweep this potential situation under the carpet.

titodalcanton, Dec 16 '24 16:12

To put in writing the discussion we had today: I propose that we watch closely whether this happens again in the coming weeks, in which case we will reconsider merging this as an urgent fix. Otherwise, I think we should try to reproduce this with simulations so that we understand exactly what is going on.

titodalcanton, Dec 17 '24 16:12

The same error happened again today, this time in the production analysis:

Traceback (most recent call last):
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/site-packages/mpi4py/__main__.py", line 7, in <module>
    main()
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/site-packages/mpi4py/run.py", line 198, in main
    run_command_line(args)
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/site-packages/mpi4py/run.py", line 47, in run_command_line
    run_path(sys.argv[0], run_name='__main__')
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/runpy.py", line 288, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/pycbc.live/.conda/envs/o4-prod-env/bin/pycbc_live", line 1333, in <module>
    evnt.check_coincs(list(results.keys()), best_coinc, psds)
  File "/home/pycbc.live/.conda/envs/o4-prod-env/bin/pycbc_live", line 520, in check_coincs
    sld = self.compute_followup_data(
  File "/home/pycbc.live/.conda/envs/o4-prod-env/bin/pycbc_live", line 220, in compute_followup_data
    pvalue_info = followup_event_significance(
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/site-packages/pycbc/filter/matchedfilter.py", line 1952, in followup_event_significance
    stilde = data_reader.overwhitened_data(htilde.delta_f)
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/site-packages/pycbc/strain/strain.py", line 1762, in overwhitened_data
    psdt = pycbc.psd.interpolate(self.psd, fseries.delta_f)
  File "/home/pycbc.live/.conda/envs/o4-prod-env/lib/python3.9/site-packages/pycbc/psd/estimate.py", line 307, in interpolate
    new_n = (len(series)-1) * series.delta_f / delta_f + 1
TypeError: object of type 'NoneType' has no len()

titodalcanton, Jan 17 '25 12:01

The same bug occurred again earlier today in the MDC analysis. I noticed that it occurred exactly 80 days (2 MDC durations) after the first occurrence, suggesting that there is a specific injection in the MDC that triggers this bug.
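
The 80-day spacing can be checked with a short calculation. This sketch assumes the first occurrence is dated by the log timestamps above (2024-12-13) and the latest by the date of this comment (2025-03-03); the 40-day MDC duration is inferred from "80 days (2 MDC durations)" and is not stated independently.

```python
from datetime import date

first_occurrence = date(2024, 12, 13)   # from the log timestamps in the report
latest_occurrence = date(2025, 3, 3)    # assumed from the date of this comment

# Assumption: one MDC cycle lasts 40 days, inferred from "80 days (2 MDC durations)".
MDC_DURATION_DAYS = 40

gap_days = (latest_occurrence - first_occurrence).days
cycles, remainder = divmod(gap_days, MDC_DURATION_DAYS)
print(gap_days, cycles, remainder)
```

A remainder of zero is what one would expect if the same stretch of replayed data triggers the bug on each pass.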

maxtrevor, Mar 03 '25 16:03