interpolation
reduce verbosity of `*.{err, skip}` logs
A planet-wide interpolation build tends to encounter a lot of bad/invalid OpenAddresses data.
The logging for this is quite verbose: the `*.err` logger writes both the offending row in JSON format and a stack trace for each one. This results in massive log files which are mostly repetitive and end up significantly larger than the actual database itself.
- `address.db.gz`: 21.7 GiB
- `conflate_oa.err`: 63.9 GiB
- `conflate_oa.out`: 533.5 KiB
- `conflate_oa.skip`: 29.6 GiB
This issue is to consider ways of reducing the logging verbosity. Possible solutions:
- remove the stack traces if not useful
- allow the log stream to be compressed, e.g. by piping it through a compressor
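A minimal sketch of the first two options, assuming the row and reason shapes below (the `CompactLogger` class, the `Row` type and the reason strings are hypothetical, not existing interpolation code). Each rejected row becomes a single tab-separated line with no `Error` object or stack trace, and a per-reason tally is kept so a short summary can be printed at the end of the build. Because `write` is just a line sink, it could wrap a gzip stream from Node's `zlib.createGzip()` to cover the compression option as well.

```typescript
// Hypothetical shape of a rejected row; the real conflate_oa rows may differ.
type Row = Record<string, string | undefined>;

// Minimal sketch: one compact line per rejected row (no Error, no stack
// trace) plus a per-reason tally that can be printed once at the end.
class CompactLogger {
  private counts: Record<string, number> = Object.create(null);
  private write: (line: string) => void;

  // `write` is any line sink: a file stream, a gzip stream, or an array.
  constructor(write: (line: string) => void) {
    this.write = write;
  }

  reject(reason: string, row: Row): void {
    this.counts[reason] = (this.counts[reason] || 0) + 1;
    // Reason and row JSON on a single tab-separated line.
    this.write(reason + "\t" + JSON.stringify(row));
  }

  // Reasons sorted by how often they occurred, most frequent first.
  summary(): string[] {
    const counts = this.counts;
    return Object.keys(counts)
      .sort((a, b) => counts[b] - counts[a])
      .map((reason) => reason + ": " + counts[reason]);
  }
}

// Usage: collect lines in memory instead of writing them to disk.
const lines: string[] = [];
const log = new CompactLogger((line) => lines.push(line));
log.reject("invalid lat/lon", { NUMBER: "12", LAT: "abc" });
log.reject("invalid lat/lon", { NUMBER: "14", LAT: "" });
log.reject("missing street", { NUMBER: "3" });
// log.summary() returns ["invalid lat/lon: 2", "missing street: 1"]
```

If the individual rows are not needed for debugging, the summary alone would shrink the `*.skip` output to one line per distinct reason.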
As somewhat of an aside, try/catch here incurs a fair cost, mainly because V8 has to produce stack traces. Could we 'return errors as values' here instead? Doing so would distinguish actual errors from validation issues, and also improve performance by avoiding try/catch and the instantiation of `Error` instances.
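A sketch of the 'errors as values' idea, assuming a simplified row with `NUMBER`, `STREET`, `LAT` and `LON` columns (the `Result` type, `validateRow` and `parseCoord` are illustrative names, not the project's API). Validation returns a discriminated union instead of throwing, so expected data problems never construct an `Error` and V8 never captures a stack trace for them; try/catch stays reserved for genuinely unexpected failures.

```typescript
// Either a validated value or the reason the input was rejected.
type Result<T> = { ok: true; value: T } | { ok: false; reason: string };

// Hypothetical raw and validated row shapes, for illustration only.
type RawRow = Record<string, string | undefined>;
interface Address {
  number: string;
  street: string;
  lat: number;
  lon: number;
}

// Parse a coordinate, treating missing or blank input as invalid
// (Number("") would otherwise silently yield 0).
function parseCoord(s: string | undefined): number {
  return s === undefined || s.trim() === "" ? NaN : Number(s);
}

// Returns a Result instead of throwing, so a bad row costs no Error
// allocation and no stack-trace capture.
function validateRow(row: RawRow): Result<Address> {
  const number = row.NUMBER;
  const street = row.STREET;
  const lat = parseCoord(row.LAT);
  const lon = parseCoord(row.LON);
  if (!number) return { ok: false, reason: "missing number" };
  if (!street) return { ok: false, reason: "missing street" };
  if (!isFinite(lat) || !isFinite(lon)) {
    return { ok: false, reason: "invalid lat/lon" };
  }
  return { ok: true, value: { number, street, lat, lon } };
}

// Usage: validation failures are plain data; genuine bugs still throw.
const rows: RawRow[] = [
  { NUMBER: "12", STREET: "Main St", LAT: "52.5", LON: "13.4" },
  { NUMBER: "14", STREET: "Main St", LAT: "abc", LON: "13.4" },
  { STREET: "Main St", LAT: "52.5", LON: "13.4" },
];
const valid: Address[] = [];
const rejected: string[] = [];
for (const row of rows) {
  const result = validateRow(row);
  if (result.ok) {
    valid.push(result.value);
  } else {
    rejected.push(result.reason);
  }
}
```

The `reason` strings can double as keys for per-reason counts when the rejected rows are logged.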
The `/parser/search` and `/parser/findbyid` endpoints would be good candidates for this.
What is the intent of the `/status` endpoint? Should it be a 'dumb' endpoint which doesn't hit the database, or a 'smart' one which does?
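The two options could be sketched as follows, assuming a framework-agnostic `StatusResponse` shape and a hypothetical `probe` callback standing in for a cheap database query (neither is existing interpolation code).

```typescript
// Hypothetical framework-agnostic response; a real handler would map this
// onto the HTTP framework's response object.
interface StatusResponse {
  code: number;
  body: string;
}

// 'Dumb' check: succeeds whenever the process can answer HTTP requests.
function dumbStatus(): StatusResponse {
  return { code: 200, body: "OK" };
}

// 'Smart' check: runs a cheap probe against the database and reports
// 503 if it throws. `probe` is a hypothetical stand-in for, e.g., a
// trivial query against the street and address databases.
function smartStatus(probe: () => void): StatusResponse {
  try {
    probe();
    return { code: 200, body: "OK" };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { code: 503, body: "database unavailable: " + message };
  }
}

// Usage with stub probes standing in for real database calls.
const healthy = smartStatus(() => {});
const broken = smartStatus(() => {
  throw new Error("unable to open database file");
});
console.log(dumbStatus().code, healthy.code, broken.code); // 200 200 503
```

The 'dumb' variant corresponds to what is often called a liveness check and the 'smart' one to a readiness check (Kubernetes, for example, distinguishes the two probe types). The trade-off is that a smart check costs a database round-trip per poll and reports failure when the database is unreachable, whereas a dumb check stays cheap but can report healthy while every lookup would fail.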