PyMuPDF icon indicating copy to clipboard operation
PyMuPDF copied to clipboard

Do not log warnings to stdout

Open apyrgio opened this issue 1 year ago • 1 comments

Description of the bug

What happened?

If one installs the latest version of PyMuPDF, and they initialize a Pixmap object using Python bytes, they will see a log message in the process' stdout:

__init__: using mupdf.python_buffer_data()

(note, this applies only to the new fitz module, not fitz_old)

The underlying cause is that PyMuPDF has the following logging function:

https://github.com/pymupdf/PyMuPDF/blob/0a6f26db6fa1f230f44ab10d2a89515664927419/src/init.py#L31-L37

This function does not employ regular Python logging, but simply writes to stdout. The stdout of a process though is not the place to write warnings, since it may interfere with binary data.

How to reproduce the bug

What I expected to happen?

Initializing a Pixmap object should not write anything to stdout.

How to reproduce it?

  1. Import the fitz_new module
  2. Initialize a Pixmap class using a bytes object for the samples argument (see Pixmap constructor)

Or simply run:

import fitz                                                                                                                                                                                                                    
fitz.Pixmap(fitz.Colorspace(fitz.CS_RGB), 10, 10, b'a' * 100 * 3, False)  

PyMuPDF version

1.23.14

Operating system

Linux

Python version

3.11

apyrgio avatar Feb 06 '24 14:02 apyrgio

Also, one thing I'm not sure about is if this log() invocation is necessary at all, or if it's a dev relic. It's probably best to remove it altogether.

apyrgio avatar Feb 06 '24 19:02 apyrgio

If someone else has stumbled on this issue, note that you can circumvent it with https://docs.python.org/3/library/contextlib.html#contextlib.redirect_stdout


@julian-smith-artifex-com: Following up on your comment here, since I can no longer comment in the closed PR:

I'll discuss internally about your use of PyMuPDF in a pipeline. If you're doing it, it needs to be supported.

Cool, thanks a lot for taking this use case into consideration. I'll follow up with a new PR in the meantime.

Perhaps we could control where logs/diagnostics go using environment variables, as i can't see a single solution that will address the different concerns that we've been discussing here.

The way that core Python libraries like urllib3 are handling this is via the logging infrastructure. By default nothing is logged, but if the user wants to see log messages, they can enable them via:

requests_log = logging.getLogger("requests.packages.urllib3")                                                                                                                                                                         
requests_log.setLevel(logging.DEBUG)                                                                                                                                                                                                  
requests_log.propagate = True

(this snippet actually targets requests, which uses urllib3 under the hood)

If the user wants even more logging (e.g., request/response bodies), they can further enable them with a switch:

http.client.HTTPConnection.debuglevel = 1

Just sharing those two ways in case they help.

apyrgio avatar Mar 07 '24 12:03 apyrgio

I've just pushed a change to branch main that should allow normal diagnostics ("messages") and debug/dev diagnostics ("logs") to be separately sent to file descriptors or files using environmental variables.

# PYMUPDF_MESSAGE controls the destination of user messages (the `message()`
# function).
#
# PYMUPDF_LOG controls the destination of internal development logging (the
# `log()` function).
#
# Each should be either `fd:<int>` to set to a file descriptor (e.g. `fd:1`
# for stdout, `fd:2` for stderr), `path:<string>` to write to a file or
# `path+:<string>` to append to a file. If not specified, the default is
# stdout.

Most of the time i'd expect only PYMUPDF_MESSAGE to be used. For example, setting it to fd:2 will send all user messages to stderr.

Regarding the logging module, i want to like it because it comes with Python, but i really struggle with its design and complexity; most of it seems to be stuff that i'd never want to use. PyMuPDF has simple messages which have worked for ages and unless we end up having a need for more control, we don't really need logging i think.

That's a great development, thanks for giving us this option Julian. I'll check it out once the new PyMuPDF release is out. In the meantime, feel free to close this issue.

apyrgio avatar Mar 13 '24 18:03 apyrgio

Fixed in 1.24.0.