trusat-orbit
new_rde_data_line_re.search() stalls
This was previously reported as https://github.com/consensys-space/trusat-backend/issues/26, as follows:
After recent unit test updates, both SeeSat archive import scripts appear to "pause" partway through an import, much as a terminal does after Ctrl-S. Pressing Ctrl-C in the terminal lets them continue, but the underlying cause is unclear, as is whether the Ctrl-C causes any data loss.
Debugger sessions have not yet narrowed down where the problem occurs.
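The standard-library `faulthandler` module offers one way to find where a stalled import is stuck without relying on Ctrl-C. `faulthandler.dump_traceback_later()` starts a watchdog that dumps every thread's Python traceback if a block runs past a timeout. The sketch below is minimal. `stall_watchdog()` and `parse_message()` are hypothetical names, and `parse_message()` stands in for the per-message parsing step in the import scripts.

```python
import contextlib
import faulthandler


@contextlib.contextmanager
def stall_watchdog(seconds, log_file):
    """Dump every thread's traceback to log_file if the block runs longer than `seconds`.

    faulthandler writes the dump from its own watchdog thread, so it appears even
    while the main thread is busy inside a long-running re.search() call.
    log_file must be a real file with a file descriptor (e.g. sys.stderr).
    """
    faulthandler.dump_traceback_later(seconds, repeat=False, file=log_file)
    try:
        yield
    finally:
        # Disarm the timer once the block finishes, normally or via an exception.
        faulthandler.cancel_dump_traceback_later()


def parse_message(text):
    """Hypothetical stand-in for the per-message IOD/UK/RDE parsing step."""
    return len(text.splitlines())


if __name__ == "__main__":
    import sys

    with stall_watchdog(60, sys.stderr):
        print(parse_message("line 1\nline 2\n"))
```

If each message in the import loop were wrapped this way, a stalled message should produce a traceback ending at the line that is currently executing. That could confirm, for example, whether the process is still inside `new_rde_data_line_re.search()`.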
From the terminal output with the -V flag (https://github.com/consensys-space/trusat-backend/blob/257bf4606147dbd927f3c3b9acf453b475c22656/database_tools/read_seesat_mbox.py#L346), the script consistently appears to "pause" after the following lines:
Found 67 IOD obs in msg: 2015-03-19 23:23:00+01:00 LB Obs 2015 Mar 19
Found 162 IOD obs in msg: 2018-04-21 10:05:40+02:00 LB Obs 2018 Apr 20-21 night
Found 2 IOD obs in msg: 2019-03-27 22:56:50+01:00 Obs 2019 Mar 27 pm
Found 3 IOD obs in msg: 2019-09-20 07:28:25-04:00 slow moving unid seen on Sept 19
The stall has been traced to the following regex in iod.py:
https://github.com/consensys-space/trusat-orbit/blob/ea82c90af2645183318f7fc716a901960e0d5c65/iod.py#L760-L784
Rewriting the regex without extraneous whitespace (so that the OR'ed re.MULTILINE and re.VERBOSE flags could be dropped) did not solve the problem.
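That result is consistent with how `re.VERBOSE` works. The flag only tells the compiler to ignore unescaped whitespace and `#` comments in the pattern text, so stripping them cannot change which quantifiers the engine backtracks over. The check below uses a hypothetical date-like pattern, not the actual `new_rde_data_line_re`. It shows that a verbose pattern and its compact rewrite find identical matches.

```python
import re

# Hypothetical line pattern in verbose form, with inline comments.
VERBOSE = re.compile(
    r"""
    ^(\d{4})        # year
    \s+ (\d{2})     # month
    \s+ (\d{2})$    # day
    """,
    re.MULTILINE | re.VERBOSE,
)

# The same pattern with whitespace and comments stripped and VERBOSE dropped.
COMPACT = re.compile(r"^(\d{4})\s+(\d{2})\s+(\d{2})$", re.MULTILINE)

SAMPLES = ["2019 03 27", "2015  03 19", "not a date", "2018 04 21\n2019 09 20"]


def same_results(a, b, samples):
    """Return True if both patterns find identical match groups in every sample."""
    return all(
        [m.groups() for m in a.finditer(s)] == [m.groups() for m in b.finditer(s)]
        for s in samples
    )


print(same_results(VERBOSE, COMPACT, SAMPLES))  # True
```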
It may be caused by catastrophic backtracking, where the number of possible match paths grows exponentially with the input, as referenced in https://bugs.python.org/issue29977
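The sketch below illustrates catastrophic backtracking with the textbook pattern `(a+)+$`, which is not taken from `iod.py`. Nested quantifiers let a run of characters be split between the inner and outer repetition in many ways. When the overall match fails, the backtracking engine in `re` tries each split in turn. The flat pattern `a+$` accepts the same strings but allows only one way to consume the run. On CPython, the nested search time should grow roughly exponentially with `n`, while the flat one should stay negligible.

```python
import re
import time

# Nested quantifiers: a run of n 'a's can be divided among the inner and outer
# '+' in 2**(n-1) ways, and each division is tried before the match fails.
NESTED = re.compile(r"(a+)+$")
# Same set of matching strings, but only one way to consume the run.
FLAT = re.compile(r"a+$")


def timed_search(pattern, text):
    """Return (match_or_None, elapsed_seconds) for pattern.search(text)."""
    start = time.perf_counter()
    match = pattern.search(text)
    return match, time.perf_counter() - start


for n in (10, 14, 18):
    text = "a" * n + "!"  # trailing '!' makes '$' fail, forcing full backtracking
    _, nested_s = timed_search(NESTED, text)
    _, flat_s = timed_search(FLAT, text)
    print(f"n={n:2d}  nested={nested_s:.4f}s  flat={flat_s:.6f}s")
```

On Python 3.11 and later, `re` also supports atomic groups `(?>...)` and possessive quantifiers such as `a++`. Both discard saved backtracking positions, which is another way to remove this kind of blow-up without changing what a pattern accepts.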
An interim "solution" is to fall back to the previous RDE-block regex (rde_format_re), which is less comprehensive.
For reference, importing the hypermail SeeSat archive with the older regex results in:
Processed 402373 observations in 81021 files in 25 directories.
(277864) IOD records (69.06 %)
(42221) UK records (10.49 %)
(82288) RDE records (20.45 %)
For reference, MBOX processing results in:
Processed 201211 observations in 12773 messages in 21.705 seconds.
(176197) IOD records (87.6 %)
( 3589) UK records ( 1.8 %)
( 21425) RDE records (10.6 %)
Last messageID imported from: [email protected]
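The percentage breakdown can be recomputed from the raw counts, which makes it easier to compare a candidate regex fix against these baselines. The `breakdown()` helper below is a hypothetical sketch and is not part of the import scripts. It uses the hypermail and MBOX figures reported above.

```python
def breakdown(counts):
    """Return {record_type: percent_of_total} for a dict of record counts."""
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.items()}


# Baseline counts copied from the import summaries above.
HYPERMAIL_OLD_REGEX = {"IOD": 277864, "UK": 42221, "RDE": 82288}
MBOX = {"IOD": 176197, "UK": 3589, "RDE": 21425}

for name, counts in (("hypermail", HYPERMAIL_OLD_REGEX), ("mbox", MBOX)):
    total = sum(counts.values())
    shares = ", ".join(f"{k} {p:.2f} %" for k, p in breakdown(counts).items())
    print(f"{name}: {total} observations; {shares}")
```

In both summaries, the three record types sum exactly to the reported totals (402373 and 201211). The IOD, UK, and RDE categories therefore partition the processed observations.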