lttng-analyses
lttng-analyses copied to clipboard
Crash during statedump parsing
$ ~/projects/src/lttng/lttng-analyses/lttng-iousagetop --skip-validation --tid 2873 --debug lttng-startup-1551/
Traceback (most recent call last):
File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/cli/command.py", line 73, in _run_step
fn()
File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/cli/command.py", line 341, in _run_analysis
self._automaton.process_event(event)
File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/linuxautomaton/automaton.py", line 75, in process_event
sp.process_event(ev)
File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/linuxautomaton/sp.py", line 33, in process_event
self._cbs[name](ev)
File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/linuxautomaton/statedump.py", line 102, in _process_lttng_statedump_file_descriptor
cpu_id=event['cpu_id'])
File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/linuxautomaton/automaton.py", line 56, in send_notification_cb
cb(**kwargs)
File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/core/io.py", line 314, in _process_update_fd
fd_list = self.tids[tid].fds[fd]
KeyError: 1662
Error: Cannot run analysis: 1662
The error at the end is wrong, it has nothing to do with an analysis, 1662 is a FD. It looks like we are trying to update the filename of an FD that is not in the analysis state.
We cannot have access to the trace that triggers this problem.
I use this patch locally to make the scripts work for me:
diff --git a/lttnganalyses/core/io.py b/lttnganalyses/core/io.py
index b33a41d..42c72a5 100644
--- a/lttnganalyses/core/io.py
+++ b/lttnganalyses/core/io.py
@@ -311,8 +311,11 @@ class IoAnalysis(Analysis):
fd = kwargs['fd']
new_filename = parent_proc.fds[fd].filename
- fd_list = self.tids[tid].fds[fd]
- fd_list[-1].filename = new_filename
+ try:
+ fd_list = self.tids[tid].fds[fd]
+ fd_list[-1].filename = new_filename
+ except:
+ pass
class DiskStats():
So the issue appears to stem from the way the --tid
filter list is handled. The automaton (the state system) is unaware of the list, and will therefore keep track of all file descriptors for all TIDs, regardless of what the analysis filters. That's why it would still send a notification to update the filename tied to this FD for this process, despite it being filtered out by the analysis.
The solution is to perform a _filter_process
check in _process_update_fd
. I'll submit a patch with this change.
However, the error message generated is very unhelpful and quite confusing. That's because we only print the name of the step (in this case, "run analysis") during which the exception occurred, and then the exception's message, which, in the case of a KeyError
like here, is only the value of the key.
I'm not sure how to go about solving this. There are definitely instances where exception messages will be helpful to the user (mostly ones we generate ourselves), so we need to print them, but here it's more confusing than anything else.
Obviously exceptions that we don't raise intentionally shouldn't happen ever, ideally, but that's pretty much unavoidable.
Thoughts?
Please wait before working around here, we have a big patchset coming in that changes a lot of code in the analysis to handle multiple concurrent periods (branch overlap_periods in my personal repo if you want to have a look).
The filter by tid/cpu has been removed from the I/O analysis because it only make sense to filter in the CLI, we need all the data in the analysis. As soon as the patchset is merged we'll look if this is still an issue.
Thanks,
Julien