trusat-orbit

new_rde_data_line_re.search() stalls

Open interplanetarychris opened this issue 6 years ago • 2 comments

This issue was previously filed as https://github.com/consensys-space/trusat-backend/issues/26, as follows:

Following unit test updates, both SeeSat archive import scripts appear to "pause" partway through an import, much as if Ctrl-S had been pressed in the terminal. Pressing Ctrl-C in the terminal lets them continue, but the underlying cause is unclear, as is whether the Ctrl-C causes any data loss.

Debugger sessions have not yet narrowed down the location of the problem.
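One way to narrow down where a script is stalled, without attaching a debugger, is the standard-library `faulthandler` module, which can dump the Python traceback of every thread after a timeout even while the interpreter is busy inside C code such as the `re` engine. The sketch below is only an illustration: `run_with_stall_watchdog` is a hypothetical helper, not part of the TruSat code, and the 5-second default is an arbitrary assumption.

```python
import faulthandler
import sys


def run_with_stall_watchdog(func, *args, timeout=5.0, file=sys.stderr):
    """Call func(*args) with a watchdog armed (hypothetical helper).

    While the call is still running, faulthandler writes the traceback of
    all threads to `file` every `timeout` seconds, which shows the Python
    line that is currently executing (for example, a regex search).
    `file` must be a real file with a file descriptor.
    """
    faulthandler.dump_traceback_later(timeout, repeat=True, file=file)
    try:
        return func(*args)
    finally:
        # Always disarm the watchdog, whether func returned or raised.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the per-message parsing call in such a helper would print the stalled frame to the terminal each time the timeout elapses, so the offending line can be read off without pressing Ctrl-C.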

From terminal feedback with the -V flag: https://github.com/consensys-space/trusat-backend/blob/257bf4606147dbd927f3c3b9acf453b475c22656/database_tools/read_seesat_mbox.py#L346

...the script appears to consistently "pause" after the following lines:

Found  67 IOD obs in msg: 2015-03-19 23:23:00+01:00 LB Obs 2015 Mar 19
Found 162 IOD obs in msg: 2018-04-21 10:05:40+02:00 LB Obs 2018 Apr 20-21 night
Found   2 IOD obs in msg: 2019-03-27 22:56:50+01:00 Obs 2019 Mar 27 pm
Found   3 IOD obs in msg: 2019-09-20 07:28:25-04:00 slow moving unid seen on Sept 19

and has been traced to the following regex in iod.py:

https://github.com/consensys-space/trusat-orbit/blob/ea82c90af2645183318f7fc716a901960e0d5c65/iod.py#L760-L784

Rewriting the regex with no extraneous whitespace (so that it could be compiled without the OR'ed re.MULTILINE and re.VERBOSE flags) did not solve the problem.

It could be due to catastrophic backtracking, where the number of ways the regex can attempt a match grows exponentially with the length of the input, as referenced in https://bugs.python.org/issue29977
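To illustrate the kind of exponential blow-up suspected here, the sketch below times two toy patterns that accept the same strings. These are NOT the regex from iod.py; `NESTED`, `FLAT`, and `time_search` are names made up for this example. The nested-quantifier version can split a run of digits in exponentially many ways before giving up on a non-matching line, while the flat version has only one way to consume each character.

```python
import re
import time

# Toy patterns for illustration only; neither is the regex from iod.py.
# Both accept a digit followed by any mix of digits and whitespace.
NESTED = re.compile(r"^(\d+\s*)+$")  # nested quantifiers: many ways to split digits
FLAT = re.compile(r"^\d[\d\s]*$")    # same language, a single way to match


def time_search(pattern, text):
    """Return (match, elapsed_seconds) for pattern.search(text)."""
    start = time.perf_counter()
    match = pattern.search(text)
    return match, time.perf_counter() - start


if __name__ == "__main__":
    # A run of digits followed by one bad character forces the nested
    # pattern to try every split of the run before it can report failure.
    for n in (12, 14, 16, 18):
        text = "1" * n + "x"
        _, slow = time_search(NESTED, text)
        _, fast = time_search(FLAT, text)
        print(f"n={n:2d}  nested={slow:.4f}s  flat={fast:.6f}s")
```

If the stall in the RDE regex has the same cause, the time for the nested form should grow rapidly with each added character, and restructuring the pattern so that adjacent quantified groups cannot match the same text would be one possible remedy.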

An interim "solution" is to use the previous RDE-block matching regexp (rde_format_re), which is not as comprehensive.

interplanetarychris · Oct 12 '19 18:10

For reference, when importing the hypermail seesat archive, the older version of the regexp results in:

Processed 402373 observations in 81021 files in 25 directories.
(277864) IOD records (69.06 %)
( 42221) UK records (10.49 %)
( 82288) RDE records (20.45 %)

interplanetarychris · Oct 12 '19 18:10

For reference, in MBOX processing:

Processed 201211 observations in 12773 messages in 21.705 seconds.
(176197) IOD records (87.6 %)
(  3589) UK records ( 1.8 %)
( 21425) RDE records (10.6 %)

Last messageID imported from: [email protected]

interplanetarychris · Oct 12 '19 19:10