papermill
papermill copied to clipboard
Fix log output
Fixes issue where \n characters are randomly inserted into the console for stdout messages. This results in non-deterministic output breaking any downstream process.
Connected to #433, #496 Closes #765
Current Behaviour:
When there is significant execution time or async activity between stdout/stderr calls in an output cell, the default logger inserts newline terminators to the papermill console output. This output is different than output as executed in a live notebook client. (See below for screenshots)
Example 1:
Notebook Output:
Console Output:
=============
Example 2:
Notebook Output:
Console Output:
=============
Example 3:
Notebook Output:
Console Output:
Expected Behaviour:
When using the --log-output option, especially when using the cli, the cell output text from papermill should match the text output when running a notebook in jupyter (or other .ipynb client)
Fix:
Added a secondary logger notebook_logger to log.py that removes the extraneous terminator from the stream handlers. The notebook logger is then only used by cell outputs, leaving the default papermill logger to handle all papermill messages.
Users can override this behaviour when using papermill as a library by either setting the log.notebook_logger to the default log.loggeror modifying the parameters when instantiating a Papermill client.
Output with fix:
Reasoning behind Fix
The papermill client was using the default log.info of the default notebook client for all stdout messages from a notebook when the explicit —log-output option is used. This calls the default logger which is built as a logger for syslog. As a result, the log formatter will automatically add \n when flushing.
In Jupyter, the input cell’s stdout and stderr calls are captured and redirected to the notebook’s output cells without additional \n characters added in order to preserve the intended output formatting of the notebook’s author. If the author needs to modify their output to ensure it is compatible with whatever downstream tool/process, they should be responsible for making those changes in the notebook itself and trusting that papermill will not alter the cell’s output.
Unfortunately this does not fix the same issue in the underlying nbclient. As a result, the cell output in the output notebook is still different than if you were to run the notebook in a live system (additional newline characters.)
Excerpt from python logging that is causing the issue:
class StreamHandler(Handler):
…
terminator = '\n'
…
def emit(self, record):
"""
Emit a record.
If a formatter is specified, it is used to format the record.
The record is then written to the stream with a trailing newline. If
exception information is present, it is formatted using
traceback.print_exception and appended to the stream. If the stream
has an 'encoding' attribute, it is used to determine how to do the
output to the stream.
"""
try:
msg = self.format(record)
stream = self.stream
# issue 35046: merged two stream.writes into one.
stream.write(msg + self.terminator) # <-- `\n` terminator is added to every call of log._log
self.flush()
Codecov Report
Attention: Patch coverage is 66.66667% with 9 lines in your changes missing coverage. Please review.
Project coverage is 90.87%. Comparing base (
cb2eb37) to head (2ec6897). Report is 13 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #810 +/- ##
==========================================
- Coverage 91.54% 90.87% -0.68%
==========================================
Files 17 17
Lines 1621 1654 +33
==========================================
+ Hits 1484 1503 +19
- Misses 137 151 +14