gutenberg
Gutenberg logs are all under the same `name`
In Gutenberg logs, only one logger name (`gutenberg2zim.constants`) is used, which makes it pretty useless.
[gutenberg2zim.constants::2023-08-19 11:40:30,563] INFO: Parsing file cache/epub/99/pg99.rdf for book id 99
[gutenberg2zim.constants::2023-08-19 11:40:31,442] INFO: Parsing file cache/epub/9/pg9.rdf for book id 9
[gutenberg2zim.constants::2023-08-19 11:40:32,515] INFO:Add possible url to db
[gutenberg2zim.constants::2023-08-19 11:40:32,517] DEBUG:bash -c rsync -a --list-only rsync://aleph.pglaf.org/gutenberg/ > tmp/file_on_aleph_pglaf_org
We should not log the name anymore and instead log the filename with `%(filename)s` or the module with `%(module)s`.
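For illustration, here is a minimal sketch of that format change using only the standard `logging` module. The format string, the `format_demo` logger name and the `build_logger` helper are assumptions made for this example, not the scraper's actual code.

```python
import io
import logging

# Hypothetical format: same layout as the log lines above, but with
# %(module)s in place of the logger name (%(filename)s works the same way).
MODULE_FORMAT = "[%(module)s::%(asctime)s] %(levelname)s:%(message)s"


def build_logger(stream):
    """Return a logger writing to `stream` with the module-based prefix."""
    logger = logging.getLogger("format_demo")  # hypothetical name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(MODULE_FORMAT))
    logger.handlers = [handler]  # replace handlers so repeated calls do not duplicate lines
    return logger


buffer = io.StringIO()
build_logger(buffer).info("Add possible url to db")
output = buffer.getvalue()
print(output, end="")
```

With this format, the prefix shows the module that emitted the record instead of a logger name shared by every line.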
We've found that a single name is enough in most scrapers, so we use the name to distinguish our logs from those of other dependencies. Here it should use `gutenberg2zim` instead of the module name.
We could use different names based on file or module, but that brings little value and makes the logs very difficult to read because lines are no longer aligned (the prefix size changes).
@benoit74, I would like to implement this. Should I keep the module names or just use `gutenberg2zim` as @rgaudin suggested?
Just use one name, `gutenberg2zim`, as suggested by @rgaudin.
And please adapt the code to create the logger with the scraperlib `getLogger` function, as we are trying to harmonize this across our codebase.
One good example of this approach is in offspot/demo:
- create the logger in constants.py (or something like that) with `getLogger`: https://github.com/offspot/demo/blob/669163c8bc864a0120ac8b31f3c8c73c57acb55e/src/offspot_demo/constants.py#L53
- adjust the log level if debug/verbose mode is activated: https://github.com/offspot/demo/blob/669163c8bc864a0120ac8b31f3c8c73c57acb55e/src/offspot_demo/watcher.py#L76
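
The two steps above could look roughly like the sketch below. It is a stand-in built on the standard `logging` module, not on scraperlib, because the exact `getLogger` signature is not shown in this thread. The `create_logger` and `set_debug` helpers and the format string are assumptions for illustration only; the `gutenberg2zim` name is the one agreed on above.

```python
import logging

NAME = "gutenberg2zim"  # the single logger name agreed on in this thread

# Hypothetical format, mirroring the prefix of the log lines in this issue.
LOG_FORMAT = "[%(name)s::%(asctime)s] %(levelname)s:%(message)s"


def create_logger(name=NAME):
    """Create the shared logger once, e.g. in constants.py."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on re-import
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    return logger


logger = create_logger()


def set_debug(debug):
    """Raise or lower verbosity once the CLI flags are known."""
    level = logging.DEBUG if debug else logging.INFO
    logger.setLevel(level)
    for handler in logger.handlers:
        handler.setLevel(level)


set_debug(True)
logger.debug("debug output is now visible")
```

Other modules would then import `logger` from the constants module instead of calling `logging.getLogger(__name__)`, so every line carries the same fixed-width prefix.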
Thank you!