
reduce verbosity of `*.{err, skip}` logs

Open missinglink opened this issue 3 years ago • 0 comments

A planet-wide interpolation build tends to encounter a lot of bad/invalid OpenAddresses data.

The logging for this is quite verbose: the *.err logger writes both the offending row in JSON format and a stack trace for each one. This results in massive log files which are mostly redundant and end up being significantly larger than the actual database itself.

21.7 GiB address.db.gz
63.9 GiB conflate_oa.err
533.5 KiB conflate_oa.out
29.6 GiB conflate_oa.skip

This issue is to consider ways of reducing the logging verbosity. Possible solutions:

  • remove the stack traces if not useful
  • allow the log stream to be compressed, or attached to a compressor
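Both options above could be sketched roughly as follows. This is only an illustration, not the project's current logging code: `formatLogLine` and `createGzipLogStream` are hypothetical names, and the shape of the log entry is an assumption. The sketch uses only Node's built-in `fs`, `zlib` and `stream` modules.

```javascript
const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream');

// Hypothetical formatter: one JSON line per rejected row, with the stack
// trace included only when explicitly requested.
function formatLogLine(row, err, options = {}) {
  const entry = { message: err.message, row };
  if (options.stack === true) {
    entry.stack = err.stack;
  }
  return JSON.stringify(entry) + '\n';
}

// Hypothetical helper: returns a writable stream that gzips everything
// written to it before the bytes reach the file at `filePath`.
function createGzipLogStream(filePath) {
  const gzip = zlib.createGzip();
  pipeline(gzip, fs.createWriteStream(filePath), (err) => {
    if (err) {
      console.error('log stream failed:', err.message);
    }
  });
  return gzip;
}
```

A caller might then do `const errLog = createGzipLogStream('conflate_oa.err.gz')`, write each rejected row with `errLog.write(formatLogLine(row, err))`, and call `errLog.end()` once the build finishes. Since the rows and messages repeat heavily, gzip would be expected to shrink these files considerably, though the actual ratio has not been measured here.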

As somewhat of an aside, try/catch here incurs a fair cost, mainly due to V8 having to produce stack traces. It may be possible to 'return errors as values' here instead. Doing so would distinguish actual errors from validation issues, and would also improve performance by avoiding try/catch and the instantiation of Error instances.
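A minimal sketch of the 'errors as values' idea, assuming a validator that returns a plain result object rather than throwing. The names `validateHouseNumber` and `processRows`, the `number` field and the validation rules themselves are all hypothetical and only stand in for whatever checks the conflation step really performs.

```javascript
// Hypothetical validator: reports bad input as a plain result object, so no
// Error instance (and no stack trace) is created for routine validation issues.
function validateHouseNumber(row) {
  const number = String(row.number || '').trim();
  if (number.length === 0) {
    return { ok: false, reason: 'missing house number' };
  }
  // Illustrative rule only: digits with an optional single-letter suffix.
  if (!/^\d+[a-z]?$/i.test(number)) {
    return { ok: false, reason: 'invalid house number' };
  }
  return { ok: true, value: number };
}

// Validation failures go to the skip handler; anything that still throws is
// then a genuine error and can be logged separately with its stack trace.
function processRows(rows, onSkip) {
  const accepted = [];
  for (const row of rows) {
    const result = validateHouseNumber(row);
    if (result.ok) {
      accepted.push(result.value);
    } else {
      onSkip(row, result.reason);
    }
  }
  return accepted;
}
```

With this shape the `*.skip` log would only ever receive a short reason string per row, and the `*.err` log would be reserved for unexpected exceptions.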

missinglink avatar Nov 01 '21 11:11 missinglink

The /parser/search & /parser/findbyid endpoints would be good candidates for this.

What is the intent of the /status endpoint, should it be a 'dumb' endpoint which doesn't hit the database or a 'smart' one which does?

missinglink avatar Nov 27 '19 12:11 missinglink