spaCy
Add ConsoleLogger.v2
Description
This PR adds the next version of the ConsoleLogger, which saves the training logs to a JSONL file. Every row printed in the console is saved as a JSON object. The logger has an additional argument, output_file, which sets the path where log.jsonl is saved.
Here's an example of a log.jsonl when training an NER component. It provides the full numbers of every score.
{"epoch":0,"step":0,"losses":{"ner":101.4694003456},"scores":{"ents_f":1.8903328932,"ents_p":1.0051736881,"ents_r":15.832363213},"score":0.0189033289}
{"epoch":0,"step":200,"losses":{"ner":10539.4847255106},"scores":{"ents_f":0.2028397566,"ents_p":0.7874015748,"ents_r":0.1164144354},"score":0.0020283976}
{"epoch":0,"step":400,"losses":{"ner":1990.8501044126},"scores":{"ents_f":18.636755824,"ents_p":36.0,"ents_r":12.5727590221},"score":0.1863675582}
{"epoch":1,"step":600,"losses":{"ner":2018.4897573819},"scores":{"ents_f":36.0300250209,"ents_p":63.5294117647,"ents_r":25.1455180442},"score":0.3603002502}
Types of change
New feature and change to documentation
Checklist
- [x] I confirm that I have the right to submit this contribution under the project's MIT license.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
The documentation will be adjusted as soon as the code has been approved 😄
Don't we want new loggers to be added to the spacy-loggers package instead?
This is a very very good question!
Not necessarily, especially if it's something short and general-purpose like the console logger.
Good point/question, but I agree with Adriane that this is generic enough to be in the main library. Like I wouldn't want to add 3 more loggers here, but one for the console and one for file logging makes sense to me. Anything else can go into spacy-loggers.
Is there a way to implement this without duplicating so much code?
Yes, I think so. I'll try to optimize this.
I've renamed the FileLogger.v1 to ConsoleLogger.v2 and moved the ConsoleLogger.v1 to spacy-legacy in this PR. I changed the output_file argument so that it sets the full path to the file now instead of the directory.
I also added the print_progress argument, which makes printing to the console optional. For now, I've made sure that the logger writes JSONL files only.
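To make the two arguments concrete, here is a rough, standalone sketch of how output_file and print_progress could interact. It is not the code in this PR: `make_row_logger` is a hypothetical helper, and truncating the file when the logger is created is an assumption of the sketch.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Optional, Union


def make_row_logger(
    output_file: Optional[Union[str, Path]] = None,
    print_progress: bool = True,
) -> Callable[[Dict[str, Any]], None]:
    """Hypothetical stand-in, not spaCy's ConsoleLogger.v2 implementation."""
    path = Path(output_file) if output_file is not None else None
    if path is not None:
        # output_file is the full path to the file, not a directory.
        path.parent.mkdir(parents=True, exist_ok=True)
        # Assumption of this sketch: each run starts with an empty log.
        path.write_text("", encoding="utf8")

    def log_row(row: Dict[str, Any]) -> None:
        if print_progress:
            # Console output is optional.
            print(row["epoch"], row["step"], row["score"])
        if path is not None:
            # One JSON object per line, appended after each row.
            with path.open("a", encoding="utf8") as f:
                f.write(json.dumps(row) + "\n")

    return log_row


# Usage: write two rows to a temporary file without printing anything.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "logs" / "log.jsonl"
    log_row = make_row_logger(output_file=log_path, print_progress=False)
    log_row({"epoch": 0, "step": 0, "score": 0.0189033289})
    log_row({"epoch": 0, "step": 200, "score": 0.0020283976})
    written = log_path.read_text(encoding="utf8").splitlines()
```

With output_file left as None and print_progress set to True, the same helper behaves like a console-only logger.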
Is there a good way to write unit tests for these kinds of things? I've recreated all possible cases locally, but I think it might be better if we'd have some predefined unit tests.
That's going to be pretty involved either way; it's difficult to test these kinds of console/output/file things in full detail. Some basic tests that the functionality runs and that the different options can coexist might be sufficient at this point, and it's good to hear that you did more thorough tests locally as well.
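One possible shape for such a basic test, sketched with the standard library only: run every combination of the two options, capture stdout, and check that the file and console output appear exactly when requested. The `log_row` function here is a hypothetical stand-in for the logger under test, not the real ConsoleLogger.v2.

```python
import contextlib
import io
import itertools
import json
import tempfile
import unittest
from pathlib import Path


def log_row(row, output_file=None, print_progress=True):
    """Hypothetical stand-in for the logger under test."""
    if print_progress:
        print(row)
    if output_file is not None:
        with Path(output_file).open("a", encoding="utf8") as f:
            f.write(json.dumps(row) + "\n")


class TestLoggerOptions(unittest.TestCase):
    def test_option_combinations(self):
        row = {"epoch": 0, "step": 0, "score": 0.0}
        # Cover all four combinations of file output and console output.
        for use_file, print_progress in itertools.product([False, True], repeat=2):
            with self.subTest(use_file=use_file, print_progress=print_progress):
                with tempfile.TemporaryDirectory() as tmp:
                    path = Path(tmp) / "log.jsonl" if use_file else None
                    stdout = io.StringIO()
                    with contextlib.redirect_stdout(stdout):
                        log_row(row, output_file=path, print_progress=print_progress)
                    # Console output appears only when requested.
                    self.assertEqual(bool(stdout.getvalue()), print_progress)
                    if use_file:
                        # The file holds exactly the logged row as JSON.
                        lines = path.read_text(encoding="utf8").splitlines()
                        self.assertEqual([json.loads(line) for line in lines], [row])
```

Saved as a test module, this can be run with `python -m unittest`; the same pattern should carry over to pytest using its tmp_path and capsys fixtures.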
I'll try to write some basic unit tests
The tests are currently failing because ConsoleLogger.v1 doesn't exist in spacy-legacy yet
Looks good!