
[BUG] bkr job-results ends with an ISE 500 traceback for jobs with too many results

Open zelial opened this issue 4 years ago • 1 comment

Describe the bug
When a job contains a lot of reported results (7500), it can't be retrieved from the command line. bkr job-results fails with a non-descriptive ISE 500 traceback:

Traceback (most recent call last):
  File "/usr/bin/bkr", line 11, in <module>
    load_entry_point('beaker-client==28.2', 'console_scripts', 'bkr')()
  File "/usr/lib/python3.6/site-packages/bkr/client/main.py", line 113, in main
    return cmd.run(*cmd_args, **cmd_opts.__dict__)
  File "/usr/lib/python3.6/site-packages/bkr/client/commands/cmd_job_results.py", line 124, in run
    myxml = self.hub.taskactions.to_xml(task, False, True, include_logs)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request
    verbose=self.__verbose
  File "/usr/lib/python3.6/site-packages/bkr/common/xmlrpc3.py", line 470, in request
    result = transport_class.request(self, *args, **kwargs)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.6/site-packages/bkr/common/xmlrpc3.py", line 405, in single_request
    response.msg)
xmlrpc.client.ProtocolError: <ProtocolError for beaker.engineering.redhat.com/client/: 500 Internal error>

Version-Release number
Beaker 28.2

To Reproduce
Schedule a job that reports more than 7500 results

Actual behavior
The bkr client prints an ISE 500 traceback. The WebUI loads the same job, albeit slowly.

Expected behavior
Beaker should give users a descriptive error explaining what went wrong and how to fix it (not an ISE 500). It should do so consistently (same behavior in the WebUI and the API), ideally in a timely manner, and abort cleanly when limits are reached.
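
As a rough illustration of the "descriptive error" part, here is a minimal client-side sketch that catches the xmlrpc.client.ProtocolError seen in the traceback and turns it into a readable message. The helper names (describe_protocol_error, call_with_friendly_errors) and the result_count parameter are hypothetical and not part of the bkr client; only xmlrpc.client.ProtocolError and its url, errcode and errmsg attributes come from the Python standard library.

```python
import xmlrpc.client


def describe_protocol_error(err, result_count=None):
    """Turn a bare XML-RPC ProtocolError into a more helpful message.

    Hypothetical helper, not part of the bkr client.
    """
    if err.errcode == 500:
        hint = "the server failed while building the response"
        if result_count is not None:
            hint += (
                " (the job reports {} results, which may be too many"
                " to return in a single call)".format(result_count)
            )
        return "Server error 500 for {}: {}".format(err.url, hint)
    return "XML-RPC request to {} failed: {} {}".format(
        err.url, err.errcode, err.errmsg
    )


def call_with_friendly_errors(func, *args, **kwargs):
    """Call func and exit with a readable message instead of a traceback."""
    try:
        return func(*args, **kwargs)
    except xmlrpc.client.ProtocolError as err:
        raise SystemExit(describe_protocol_error(err))
```

A wrapper like this would only improve the message on the client. It does not address the server-side failure itself.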

zelial avatar Jul 14 '21 08:07 zelial

Yeah, I know the root cause here. The Beaker server died when it tried to fetch all the data into memory. The workers were killed by the system, which resulted in the 500. I will fix this when I rework the API.

StykMartin avatar Jul 14 '21 09:07 StykMartin
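
For readers wondering what avoiding "fetch all the data into memory" could look like, here is a generic sketch of chunked retrieval, where results are pulled in bounded pages instead of one large response. This is an assumption about one possible approach, not the planned Beaker API rework. fetch_page, iter_in_chunks and make_fake_fetch are hypothetical names, and the fake backend only stands in for a real data source.

```python
def iter_in_chunks(fetch_page, chunk_size=500):
    """Yield results one at a time, requesting at most chunk_size per call.

    fetch_page(offset, limit) is a hypothetical callable that returns a list
    of up to `limit` results starting at `offset`.
    """
    offset = 0
    while True:
        page = fetch_page(offset, chunk_size)
        if not page:
            return
        for item in page:
            yield item
        if len(page) < chunk_size:
            # A short page means there is nothing left to fetch.
            return
        offset += len(page)


def make_fake_fetch(total):
    """Build a stand-in backend holding `total` numbered results."""
    def fetch_page(offset, limit):
        return list(range(offset, min(offset + limit, total)))
    return fetch_page


# Walk 7500 results without ever holding more than 500 in one page.
count = sum(1 for _ in iter_in_chunks(make_fake_fetch(7500), chunk_size=500))
```

With this shape, memory use is bounded by the page size rather than by the total number of results in the job.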