awx icon indicating copy to clipboard operation
awx copied to clipboard

OSError: [Errno 90] Message too long

Open anxstj opened this issue 5 years ago • 11 comments

ISSUE TYPE
  • Bug Report
SUMMARY

The awx_task container logs a traceback:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/handlers.py", line 936, in emit
    self.socket.send(msg)
OSError: [Errno 90] Message too long

I stumbled on it while examining the logs and found it worth reporting.

ENVIRONMENT
  • AWX version: 11.2.0
  • AWX install method:docker
  • Ansible version: 2.9.7
  • Operating System: Debian 10
  • Web Browser: Firefox 75
STEPS TO REPRODUCE

Actually I don't know how to reproduce this.

EXPECTED RESULTS

No traceback.

ACTUAL RESULTS

A traceback is logged.

ADDITIONAL INFORMATION
2020-06-08 06:00:24,531 DEBUG    awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/handlers.py", line 936, in emit
    self.socket.send(msg)
OSError: [Errno 90] Message too long

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/handlers.py", line 940, in emit
    self.socket.send(msg)
OSError: [Errno 90] Message too long
Call stack:
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 152, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/management/commands/run_callback_receiver.py", line 24, in handle
    queues=[getattr(settings, 'CALLBACK_QUEUE', '')],
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/base.py", line 54, in __init__
    self.pool.init_workers(self.worker.work_loop)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/pool.py", line 219, in init_workers
    self.up()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/pool.py", line 231, in up
    worker.start()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/pool.py", line 69, in start
    self.process.start()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib64/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/popen_fork.py", line 73, in _launch
    code = process_obj._bootstrap()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/callback.py", line 71, in work_loop
    return super(CallbackBrokerWorker, self).work_loop(*args, **kw)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/base.py", line 184, in work_loop
    self.perform_work(body, *args)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/callback.py", line 154, in perform_work
    event = cls.create_from_data(**body)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/models/events.py", line 415, in create_from_data
    event._update_from_event_data()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/models/events.py", line 373, in _update_from_event_data
    extra=dict(python_objects=dict(job_event=self))
  File "/usr/lib64/python3.6/logging/__init__.py", line 1308, in info
    self._log(INFO, msg, args, **kwargs)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1444, in _log
    self.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1454, in handle
    self.callHandlers(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1516, in callHandlers
    hdlr.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 865, in handle
    self.emit(record)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/utils/handlers.py", line 22, in emit
    return super(RSysLogHandler, self).emit(msg)
Message: 'Event data saved.'
Arguments: ()
2020-06-08 06:00:25,901 DEBUG    awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
2020-06-08 06:00:25,901 DEBUG    awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/handlers.py", line 936, in emit
    self.socket.send(msg)
OSError: [Errno 90] Message too long

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/handlers.py", line 940, in emit
    self.socket.send(msg)
OSError: [Errno 90] Message too long
Call stack:
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 152, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/management/commands/run_callback_receiver.py", line 24, in handle
    queues=[getattr(settings, 'CALLBACK_QUEUE', '')],
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/base.py", line 54, in __init__
    self.pool.init_workers(self.worker.work_loop)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/pool.py", line 219, in init_workers
    self.up()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/pool.py", line 231, in up
    worker.start()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/pool.py", line 69, in start
    self.process.start()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib64/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/popen_fork.py", line 73, in _launch
    code = process_obj._bootstrap()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/callback.py", line 71, in work_loop
    return super(CallbackBrokerWorker, self).work_loop(*args, **kw)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/base.py", line 184, in work_loop
    self.perform_work(body, *args)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/callback.py", line 154, in perform_work
    event = cls.create_from_data(**body)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/models/events.py", line 415, in create_from_data
    event._update_from_event_data()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/models/events.py", line 373, in _update_from_event_data
    extra=dict(python_objects=dict(job_event=self))
  File "/usr/lib64/python3.6/logging/__init__.py", line 1308, in info
    self._log(INFO, msg, args, **kwargs)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1444, in _log
    self.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1454, in handle
    self.callHandlers(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1516, in callHandlers
    hdlr.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 865, in handle
    self.emit(record)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/utils/handlers.py", line 22, in emit
    return super(RSysLogHandler, self).emit(msg)
Message: 'Event data saved.'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/handlers.py", line 936, in emit
    self.socket.send(msg)
OSError: [Errno 90] Message too long

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/handlers.py", line 940, in emit
    self.socket.send(msg)
OSError: [Errno 90] Message too long
Call stack:
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 152, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/management/commands/run_callback_receiver.py", line 24, in handle
    queues=[getattr(settings, 'CALLBACK_QUEUE', '')],
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/base.py", line 54, in __init__
    self.pool.init_workers(self.worker.work_loop)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/pool.py", line 219, in init_workers
    self.up()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/pool.py", line 231, in up
    worker.start()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/pool.py", line 69, in start
    self.process.start()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib64/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/popen_fork.py", line 73, in _launch
    code = process_obj._bootstrap()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/callback.py", line 71, in work_loop
    return super(CallbackBrokerWorker, self).work_loop(*args, **kw)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/base.py", line 184, in work_loop
    self.perform_work(body, *args)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/callback.py", line 154, in perform_work
    event = cls.create_from_data(**body)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/models/events.py", line 415, in create_from_data
    event._update_from_event_data()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/models/events.py", line 373, in _update_from_event_data
    extra=dict(python_objects=dict(job_event=self))
  File "/usr/lib64/python3.6/logging/__init__.py", line 1308, in info
    self._log(INFO, msg, args, **kwargs)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1444, in _log
    self.handle(record)
2020-06-08 06:00:26,150 DEBUG    awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(2)
2020-06-08 06:00:26,150 DEBUG    awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(2)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1454, in handle
    self.callHandlers(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1516, in callHandlers
    hdlr.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 865, in handle
    self.emit(record)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/utils/handlers.py", line 22, in emit
    return super(RSysLogHandler, self).emit(msg)
Message: 'Event data saved.'
Arguments: ()
2020-06-08 06:00:26,156 DEBUG    awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(2)
2020-06-08 06:00:26,156 DEBUG    awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(2)
2020-06-08 06:00:26,174 DEBUG    awx.main.commands.run_callback_receiver JobEvent.objects.bulk_create(1)

anxstj avatar Jun 08 '20 06:06 anxstj

I'm betting this is due to our usage of a local domain socket and max packet size for SOCK_DGRAM. Probably you've got events periodically that are really large (fact data, perhaps).

ryanpetrello avatar Jun 08 '20 16:06 ryanpetrello

One approach here is to just use TCP, but we didn't do that because we didn't want to have to deal with connection security.

Maybe we could detect this type of situation and automatically attempt to truncate the message? Though, I could see that upsetting external log aggregators that relied on e.g., valid JSON. Or perhaps we could chunk the message ourselves.

ryanpetrello avatar Jun 08 '20 16:06 ryanpetrello

We are also getting these stacktraces and error messages in our logs. I'm worried that some logging information and job events are being dropped because of this. We receive this message when syncing our shared ansible repo project which involves a full delete/clone of the git repo and many many files.

--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/handlers.py", line 936, in emit
    self.socket.send(msg)
OSError: [Errno 90] Message too long

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/handlers.py", line 940, in emit
    self.socket.send(msg)
OSError: [Errno 90] Message too long
Call stack:
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 154, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/management/commands/run_callback_receiver.py", line 32, in handle
    queues=[getattr(settings, 'CALLBACK_QUEUE', '')],
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/base.py", line 59, in __init__
    self.pool.init_workers(self.worker.work_loop)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/pool.py", line 241, in init_workers
    self.up()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/pool.py", line 253, in up
    worker.start()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/pool.py", line 78, in start
    self.process.start()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib64/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/popen_fork.py", line 73, in _launch
    code = process_obj._bootstrap()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/callback.py", line 99, in work_loop
    return super(CallbackBrokerWorker, self).work_loop(*args, **kw)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/base.py", line 191, in work_loop
    self.perform_work(body, *args)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/callback.py", line 178, in perform_work
    event = cls.create_from_data(**body)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/models/events.py", line 437, in create_from_data
    event._update_from_event_data()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/models/events.py", line 392, in _update_from_event_data
    extra=dict(python_objects=dict(job_event=self))
  File "/usr/lib64/python3.6/logging/__init__.py", line 1308, in info
    self._log(INFO, msg, args, **kwargs)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1444, in _log
    self.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1454, in handle
    self.callHandlers(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1516, in callHandlers
    hdlr.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 865, in handle
    self.emit(record)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/utils/handlers.py", line 30, in emit
    return super(RSysLogHandler, self).emit(msg)
Message: 'Event data saved.'
Arguments: ()

I'm dubious that "Event data saved" actually happened 😬

hjkatz avatar Jan 26 '21 15:01 hjkatz

@hjkatz on a whim - what happens if you set MAX_EVENT_RES=60000 in AWX_TASK_ENV (or MAX_EVENT_RES_DATA in the deployment settings)

wenottingham avatar Apr 21 '21 19:04 wenottingham

Maybe should write the JobEvent logs to a file and configure RSysLog to send them to the log aggregator? I think this should fix the problem. Or is this a bad idea?

oweel avatar Jul 22 '21 09:07 oweel

Was there ever a solution found for this? I'm seeing it in v15 and log data is definitely being dropped.

dommoly avatar Dec 13 '21 11:12 dommoly

setting this as a workaround helped in my case: https://github.com/ansible/awx-operator#no-log

pfeifferj avatar Jul 04 '23 11:07 pfeifferj

Hi @pfeifferj What's the workaround that has helped you? I can't see anything specific in that link.

setting this as a workaround helped in my case: https://github.com/ansible/awx-operator#no-log

lals1 avatar Sep 29 '23 08:09 lals1

@lals1 I vaguely remember that the playbooks generated a lot of logs, which filled up /var/log/ and led to AAP execution pods dying. setting the no-log option in the operator helped. iirc, there was also a bug in OpenShift/kubernetes -- which was fixed in ocp 4.10.16. this bug made the logging issue worse with the AAP/AWX operator. scheduling the AAP/AWX operator workloads on its dedicated nodes also helped.

hope this helps!

pfeifferj avatar Sep 29 '23 09:09 pfeifferj

Thanks for the response @pfeifferj

In my case, this issue started when I upgraded AWX version from 21.0.0 to 23.2.0. It was working fine earlier. We use an external log aggregator with type logstash for pushing only AWX-related logs.

Just wondering what has changed in between those AWX versions that rsyslog is now complaining about message too long 🤔

lals1 avatar Sep 29 '23 09:09 lals1

This is still an issue using AWX 24.6.1, preventing us from shipping package facts to Splunk -- does anyone have a solution?

rev138 avatar Sep 09 '24 14:09 rev138