prefect icon indicating copy to clipboard operation
prefect copied to clipboard

Suppress "PID ..." log lines?

Open sckott opened this issue 1 year ago • 16 comments

Expectation / Proposal

We're using the ShellOperation class from this library for all of our Prefect tasks. It works great. However, some of our team would like to suppress the log lines like PID <some pid> stream output:. Is there any way to suppress those lines in prefect logs? See screenshot below.

Perhaps there’s an env var that can be set? Perhaps it can be achieved via logging config?

If this is the wrong place for this ticket let me know!

Traceback / Example

Screen Shot 2023-03-01 at 9 48 40 AM

sckott avatar Mar 01 '23 19:03 sckott

@sckott do you just mean toggling stream_output to False? https://github.com/PrefectHQ/prefect-shell/blob/main/prefect_shell/commands.py#L234

zanieb avatar Mar 01 '23 22:03 zanieb

Thanks @madkinsz - That does suppress the "PID ..." lines, but also all logs I think from the task. Here's I think a reproducible example:

test.py

from prefect import flow, task
from prefect_shell import ShellOperation


@task
def my_task(x, stream=True):
    return ShellOperation(
        commands=[
            f"echo the number is {x}",
            f"echo I have {x} apples",
        ],
        stream_output=stream
    ).run()


@flow(name="Test")
def test():
    a = my_task(x=5, stream=True)
    b = my_task(x=7, stream=False)

    return a, b


if __name__ == "__main__":
    test()

Running python3.11 -m test gives me:

08:59:12.165 | INFO    | prefect.engine - Created flow run 'ultra-crab' for flow 'Test'
08:59:12.288 | INFO    | Flow run 'ultra-crab' - Created task run 'my_task-0' for task 'my_task'
08:59:12.289 | INFO    | Flow run 'ultra-crab' - Executing 'my_task-0' immediately...
08:59:12.310 | INFO    | Task run 'my_task-0' - PID 92133 triggered with 2 commands running inside the '.' directory.
08:59:12.313 | INFO    | Task run 'my_task-0' - PID 92133 stream output:
the number is 5
I have 5 apples
08:59:12.313 | INFO    | Task run 'my_task-0' - PID 92133 completed with return code 0.
08:59:12.324 | INFO    | Task run 'my_task-0' - Finished in state Completed()
08:59:12.337 | INFO    | Flow run 'ultra-crab' - Created task run 'my_task-1' for task 'my_task'
08:59:12.338 | INFO    | Flow run 'ultra-crab' - Executing 'my_task-1' immediately...
08:59:12.356 | INFO    | Task run 'my_task-1' - PID 92134 triggered with 2 commands running inside the '.' directory.
08:59:12.358 | INFO    | Task run 'my_task-1' - PID 92134 completed with return code 0.
08:59:12.368 | INFO    | Task run 'my_task-1' - Finished in state Completed()
08:59:12.377 | INFO    | Flow run 'ultra-crab' - Finished in state Completed()

The behavior I'm looking for is to see the logs from the echo statements still, like:

08:59:12.165 | INFO    | prefect.engine - Created flow run 'ultra-crab' for flow 'Test'
08:59:12.288 | INFO    | Flow run 'ultra-crab' - Created task run 'my_task-0' for task 'my_task'
08:59:12.289 | INFO    | Flow run 'ultra-crab' - Executing 'my_task-0' immediately...
08:59:12.310 | INFO    | Task run 'my_task-0' - PID 92133 triggered with 2 commands running inside the '.' directory.
08:59:12.313 | INFO    | Task run 'my_task-0' - PID 92133 stream output:
the number is 5
I have 5 apples
08:59:12.313 | INFO    | Task run 'my_task-0' - PID 92133 completed with return code 0.
08:59:12.324 | INFO    | Task run 'my_task-0' - Finished in state Completed()
08:59:12.337 | INFO    | Flow run 'ultra-crab' - Created task run 'my_task-1' for task 'my_task'
08:59:12.338 | INFO    | Flow run 'ultra-crab' - Executing 'my_task-1' immediately...
08:59:12.356 | INFO    | Task run 'my_task-1' - PID 92134 triggered with 2 commands running inside the '.' directory.
the number is 7
I have 7 apples
08:59:12.358 | INFO    | Task run 'my_task-1' - PID 92134 completed with return code 0.
08:59:12.368 | INFO    | Task run 'my_task-1' - Finished in state Completed()
08:59:12.377 | INFO    | Flow run 'ultra-crab' - Finished in state Completed()

sckott avatar Mar 02 '23 17:03 sckott

I see so you're just interested in dropping the "PID ... stream output:" prefix from the logs?

zanieb avatar Mar 02 '23 17:03 zanieb

Yes, just to make the logs less "noisy". Perhaps it can be done via the logging config file?

sckott avatar Mar 02 '23 17:03 sckott

It's hard-coded at https://github.com/PrefectHQ/prefect-shell/blob/main/prefect_shell/commands.py#L160

I'd review a pull request making that prefix optional. Perhaps stream_output_include_pid: bool = True. Another option is to get a child logger for each stream output operation that includes the PID e.g.

from logging import getLogger

logger = getLogger("test")
pid = 123
child = logger.getChild(str(pid))
print(child.name)

I think that would be a cleaner solution; this kind of information probably doesn't belong in the message itself?

zanieb avatar Mar 02 '23 17:03 zanieb

I don't follow what you mean by the child logger. Is the idea there to make a custom logger which then we can strip the pid line from?

sckott avatar Mar 02 '23 17:03 sckott

@sckott if we use a child logger with the PID in the name (in the code I linked) we can drop the prefix entirely as the information would be encoded in the logger name.

zanieb avatar Mar 02 '23 19:03 zanieb

Are you proposing this as something a user like me can do in my own code, or a change one could make to prefect-shell?

So whatever change you are suggesting, what would some example logs look like after the change?

sckott avatar Mar 03 '23 17:03 sckott

A change in prefect-shell. The logs would just include the output from the command. The pid would be included in the logger name.

zanieb avatar Mar 03 '23 19:03 zanieb

So then the PID would not appear in the logs, but would be available via getLogger?

Perhaps this should be toggleable, though I realize that's more complexity to maintain. I think it's likely to be useful to have the PID in the logs for debugging, though I haven't used it that way yet. My worklow thus far has been finding the docker container or process by the name of the scripts that we call via ShellOperation, but it would be faster to know the exact PID

sckott avatar Mar 03 '23 21:03 sckott

The PID would appear in local logs, ~but the logger name is not exposed in the UI at this time~ we're adding logger names in the UI in https://github.com/PrefectHQ/prefect-ui-library/pull/1203

zanieb avatar Mar 03 '23 21:03 zanieb

Should go out in the next release @madkinsz @sckott (sooner for Cloud folks)

znicholasbrown avatar Mar 03 '23 22:03 znicholasbrown

Hey @sckott! I've implemented using a child logger to track PIDs per @madkinsz's suggestion in PrefectHQ/prefect-shell#77. Can you install that branch and run some flows to see if it resolves your issue?

desertaxle avatar Mar 06 '23 14:03 desertaxle

Thanks @desertaxle !

Can you install that branch and run some flows to see if it resolves your issue?

I assume I'll only be able to control what I see in local logs, yes? Just checking.

sckott avatar Mar 06 '23 16:03 sckott

There isn't any extra control, but the process PID has been moved to a child logger. You should be able to still see the PID without it cluttering your log output.

desertaxle avatar Mar 06 '23 16:03 desertaxle

Using the example task/flow file above https://github.com/PrefectHQ/prefect/issues/13079

pip install git+https://github.com/PrefectHQ/prefect-shell.git@pid-child-logger
python3.11 -m test2
File "/opt/homebrew/lib/python3.11/site-packages/prefect_shell/commands.py", line 402, in run
   pid_logger = self.logger.getChild(str(process.pid))

AttributeError: 'PrefectLogAdapter' object has no attribute 'getChild'

Go back to the version of prefect-shell on pypi

pip uninstall prefect-shell
pip install prefect-shell
python3.11 -m test2
10:25:56.785 | INFO    | prefect.engine - Created flow run 'diamond-asp' for flow 'Test'
10:25:56.907 | INFO    | Flow run 'diamond-asp' - Created task run 'my_task-0' for task 'my_task'
10:25:56.908 | INFO    | Flow run 'diamond-asp' - Executing 'my_task-0' immediately...
10:25:56.928 | INFO    | Task run 'my_task-0' - PID 27982 triggered with 2 commands running inside the '.' directory.
10:25:56.932 | INFO    | Task run 'my_task-0' - PID 27982 stream output:
the number is 5
I have 5 apples
10:25:56.933 | INFO    | Task run 'my_task-0' - PID 27982 completed with return code 0.
10:25:56.947 | INFO    | Task run 'my_task-0' - Finished in state Completed()
10:25:56.959 | INFO    | Flow run 'diamond-asp' - Created task run 'my_task-1' for task 'my_task'
10:25:56.959 | INFO    | Flow run 'diamond-asp' - Executing 'my_task-1' immediately...
10:25:56.980 | INFO    | Task run 'my_task-1' - PID 27983 triggered with 2 commands running inside the '.' directory.
10:25:56.981 | INFO    | Task run 'my_task-1' - PID 27983 completed with return code 0.
10:25:56.992 | INFO    | Task run 'my_task-1' - Finished in state Completed()
10:25:57.002 | INFO    | Flow run 'diamond-asp' - Finished in state Completed()

I'm running prefect v2.8.3

sckott avatar Mar 06 '23 18:03 sckott