Add ConsoleLogger.v2

Open thomashacker opened this issue 3 years ago • 10 comments

Description

This PR adds the next version of the ConsoleLogger, which saves the training logs to a JSONL file. Every row printed in the console is saved as a JSON object. The logger takes an additional argument, output_file, which sets the path where log.jsonl is saved.

Here's an example of a log.jsonl produced when training an NER component. It records the full, unrounded value of every score.

{"epoch":0,"step":0,"losses":{"ner":101.4694003456},"scores":{"ents_f":1.8903328932,"ents_p":1.0051736881,"ents_r":15.832363213},"score":0.0189033289}
{"epoch":0,"step":200,"losses":{"ner":10539.4847255106},"scores":{"ents_f":0.2028397566,"ents_p":0.7874015748,"ents_r":0.1164144354},"score":0.0020283976}
{"epoch":0,"step":400,"losses":{"ner":1990.8501044126},"scores":{"ents_f":18.636755824,"ents_p":36.0,"ents_r":12.5727590221},"score":0.1863675582}
{"epoch":1,"step":600,"losses":{"ner":2018.4897573819},"scores":{"ents_f":36.0300250209,"ents_p":63.5294117647,"ents_r":25.1455180442},"score":0.3603002502}
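For readers who want to work with such a file, here is a minimal sketch of parsing it with the Python standard library. The helper names parse_log_lines, load_log and best_row are hypothetical and are not part of spaCy or of this PR; the sketch only assumes the row layout shown in the example above (one JSON object per line with a top-level "score" key).

```python
import json
from typing import Any, Dict, Iterable, List


def parse_log_lines(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSONL rows (one JSON object per line), skipping blank lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if line:
            rows.append(json.loads(line))
    return rows


def load_log(path: str) -> List[Dict[str, Any]]:
    """Read a JSONL training log from disk (hypothetical helper)."""
    with open(path, "r", encoding="utf8") as file_:
        return parse_log_lines(file_)


def best_row(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the logged row with the highest overall "score" value."""
    return max(rows, key=lambda row: row["score"])
```

Calling best_row(load_log("log.jsonl")) would then return the row with the highest overall score, assuming the file sits in the working directory.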

Types of change

New feature and change to documentation

Checklist

  • [x] I confirm that I have the right to submit this contribution under the project's MIT license.
  • [x] I ran the tests, and all new and existing tests passed.
  • [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

thomashacker avatar Jul 26 '22 17:07 thomashacker

The documentation will be adjusted as soon as the code has been approved 😄

thomashacker avatar Jul 26 '22 17:07 thomashacker

Don't we want new loggers to be added to the spacy-loggers package instead?

shadeMe avatar Jul 26 '22 17:07 shadeMe

This is a very very good question!

thomashacker avatar Jul 26 '22 17:07 thomashacker

Not necessarily, especially if it's something short and general-purpose like the console logger.

adrianeboyd avatar Jul 26 '22 18:07 adrianeboyd

Good point/question, but I agree with Adriane that this is generic enough to be in the main library. Like I wouldn't want to add 3 more loggers here, but one for the console and one for file logging makes sense to me. Anything else can go into spacy-loggers.

svlandeg avatar Jul 26 '22 19:07 svlandeg

Is there a way to implement this without duplicating so much code?

adrianeboyd avatar Jul 27 '22 06:07 adrianeboyd

Yes, I think so. I'll try to optimize this.

thomashacker avatar Jul 27 '22 06:07 thomashacker

I've renamed the FileLogger.v1 to ConsoleLogger.v2 and moved the ConsoleLogger.v1 to spacy-legacy in this PR. I changed the output_file argument so that it sets the full path to the file now instead of the directory.

I also added the print_progress argument, which makes printing to the console optional. For now, I've made sure that the logger writes JSONL files only.
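To illustrate how the two options described above could interact, here is a rough standalone sketch. It is not the code in this PR: make_jsonl_logger is a hypothetical function, and the exact semantics of output_file and print_progress are assumed from the description (a full file path, and a switch for console printing).

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Optional, Tuple, Union


def make_jsonl_logger(
    output_file: Optional[Union[str, Path]] = None,
    print_progress: bool = True,
) -> Tuple[Callable[[Dict[str, Any]], None], Callable[[], None]]:
    """Hypothetical sketch: return a (log_step, finalize) pair of callbacks."""
    handle = None
    if output_file is not None:
        path = Path(output_file)
        # output_file is treated as the full path to the file, not a directory
        path.parent.mkdir(parents=True, exist_ok=True)
        handle = path.open("w", encoding="utf8")

    def log_step(info: Dict[str, Any]) -> None:
        if print_progress:
            # Console printing is optional (assumed meaning of print_progress)
            print(info)
        if handle is not None:
            # One JSON object per line, flushed so the file is usable mid-run
            handle.write(json.dumps(info) + "\n")
            handle.flush()

    def finalize() -> None:
        if handle is not None:
            handle.close()

    return log_step, finalize
```

With output_file left as None and print_progress set to True, the sketch only prints to the console; with a path and print_progress set to False, it only writes the file.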

Is there a good way to write unit tests for these kinds of things? I've recreated all possible cases locally, but I think it might be better if we'd have some predefined unit tests.

thomashacker avatar Aug 02 '22 20:08 thomashacker

> Is there a good way to write unit tests for these kinds of things? I've recreated all possible cases locally, but I think it might be better if we'd have some predefined unit tests.

That's going to be pretty involved either way; it's difficult to test these kinds of console/output/file things in full detail. Some basic tests checking that the functionality runs and that the different options can coexist might be sufficient at this point, and it's good to hear that you did more thorough tests locally as well.

svlandeg avatar Aug 05 '22 12:08 svlandeg

I'll try to write some basic unit tests

thomashacker avatar Aug 08 '22 11:08 thomashacker

The tests are currently failing because ConsoleLogger.v1 doesn't exist in spacy-legacy yet

thomashacker avatar Aug 12 '22 09:08 thomashacker

Looks good!

adrianeboyd avatar Aug 29 '22 08:08 adrianeboyd