lttng-analyses Crash during statedump parsing

$ ~/projects/src/lttng/lttng-analyses/lttng-iousagetop --skip-validation --tid 2873 --debug lttng-startup-1551/
Traceback (most recent call last):      
  File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/cli/command.py", line 73, in _run_step
    fn()
  File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/cli/command.py", line 341, in _run_analysis
    self._automaton.process_event(event)
  File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/linuxautomaton/automaton.py", line 75, in process_event
    sp.process_event(ev)
  File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/linuxautomaton/sp.py", line 33, in process_event
    self._cbs[name](ev)
  File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/linuxautomaton/statedump.py", line 102, in _process_lttng_statedump_file_descriptor
    cpu_id=event['cpu_id'])
  File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/linuxautomaton/automaton.py", line 56, in send_notification_cb
    cb(**kwargs)
  File "/ssd/milian/projects/src/lttng/lttng-analyses/lttnganalyses/core/io.py", line 314, in _process_update_fd
    fd_list = self.tids[tid].fds[fd]
KeyError: 1662
Error: Cannot run analysis: 1662

The error message at the end is misleading: it has nothing to do with the analysis as such, and 1662 is an FD. It looks like we are trying to update the filename of an FD that is not in the analysis state.

We cannot get access to the trace that triggers this problem.

Jul 07 '16 18:07 jdesfossez

I use this patch locally to make the scripts work for me:

diff --git a/lttnganalyses/core/io.py b/lttnganalyses/core/io.py
index b33a41d..42c72a5 100644
--- a/lttnganalyses/core/io.py
+++ b/lttnganalyses/core/io.py
@@ -311,8 +311,11 @@ class IoAnalysis(Analysis):
         fd = kwargs['fd']

         new_filename = parent_proc.fds[fd].filename
-        fd_list = self.tids[tid].fds[fd]
-        fd_list[-1].filename = new_filename
+        try:
+            fd_list = self.tids[tid].fds[fd]
+            fd_list[-1].filename = new_filename
+        except KeyError:
+            pass


 class DiskStats():

Jul 11 '16 14:07 milianw

So the issue appears to stem from the way the --tid filter list is handled. The automaton (the state system) is unaware of the list, and will therefore keep track of all file descriptors for all TIDs, regardless of what the analysis filters. That's why it would still send a notification to update the filename tied to this FD for this process, despite it being filtered out by the analysis.

The solution is to perform a _filter_process check in _process_update_fd. I'll submit a patch with this change.
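To illustrate the idea, here is a minimal, self-contained sketch of such a check. It is not the actual lttng-analyses code: the classes `FD`, `Process` and `IoAnalysisSketch`, the `track_fd` helper, the `tid_filter` argument and the exact signature of `_filter_process` are all assumptions made for the example. Only the body of `_process_update_fd` follows what is visible in the traceback and patch above.

```python
class FD:
    """Hypothetical file descriptor record."""

    def __init__(self, filename):
        self.filename = filename


class Process:
    """Hypothetical per-process state holding a dict of FDs."""

    def __init__(self):
        self.fds = {}


class IoAnalysisSketch:
    """Hypothetical stand-in for the I/O analysis, not the real class."""

    def __init__(self, tid_filter=None):
        # Assumed representation of the --tid list: a set of TIDs, or None.
        self._tid_filter = tid_filter
        self.tids = {}

    def _filter_process(self, tid):
        # Assumed signature: True when the TID passes the filter.
        return self._tid_filter is None or tid in self._tid_filter

    def track_fd(self, tid, fd, filename):
        # The analysis only keeps state for TIDs that pass the filter.
        if not self._filter_process(tid):
            return
        proc = self.tids.setdefault(tid, Process())
        proc.fds.setdefault(fd, []).append(FD(filename))

    def _process_update_fd(self, **kwargs):
        parent_proc = kwargs['parent_proc']
        tid = kwargs['tid']
        fd = kwargs['fd']

        # The automaton notifies for every TID, so drop filtered ones
        # before touching the analysis state.
        if not self._filter_process(tid):
            return

        new_filename = parent_proc.fds[fd].filename
        fd_list = self.tids[tid].fds[fd]
        fd_list[-1].filename = new_filename


# The automaton side tracks the FD for its own process state.
parent = Process()
parent.fds[5] = FD('/tmp/new-name')

analysis = IoAnalysisSketch(tid_filter={2873})
analysis.track_fd(2873, 5, 'old-name')

# Notification for a tracked TID: the filename is updated.
analysis._process_update_fd(parent_proc=parent, tid=2873, fd=5)

# Notification for a filtered-out TID: ignored instead of raising KeyError.
analysis._process_update_fd(parent_proc=parent, tid=1662, fd=5)
```

Compared with swallowing the exception, an explicit filter check keeps a genuine missing-FD bug visible for the TIDs that are being analysed.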

However, the error message generated is very unhelpful and quite confusing. That's because we only print the name of the step (in this case, "run analysis") during which the exception occurred, and then the exception's message, which, in the case of a KeyError like here, is only the value of the key.
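A small sketch of why the message ends up as a bare number: the string form of a `KeyError` is just the repr of the missing key. The two formatting helpers below are hypothetical and not part of lttng-analyses. The first mimics the output seen in the report, and the second shows one possible variation that also prints the exception type.

```python
def format_step_error(step, exc):
    # Mimics the reported output: step name followed by the exception message.
    return 'Error: Cannot {}: {}'.format(step, exc)


def format_step_error_with_type(step, exc):
    # Hypothetical variation: include the exception class name as well.
    return 'Error: Cannot {}: {}: {}'.format(step, type(exc).__name__, exc)


try:
    {}[1662]  # Looking up a missing key raises KeyError(1662).
except KeyError as exc:
    terse = format_step_error('run analysis', exc)
    with_type = format_step_error_with_type('run analysis', exc)

print(terse)      # Error: Cannot run analysis: 1662
print(with_type)  # Error: Cannot run analysis: KeyError: 1662
```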

I'm not sure how to go about solving this. There are definitely instances where exception messages will be helpful to the user (mostly ones we generate ourselves), so we need to print them, but here it's more confusing than anything else.

Obviously exceptions that we don't raise intentionally shouldn't happen ever, ideally, but that's pretty much unavoidable.

Thoughts?

Jul 24 '16 00:07 abusque

Please hold off on working around this for now: we have a big patchset coming in that changes a lot of code in the analysis to handle multiple concurrent periods (branch overlap_periods in my personal repo if you want to have a look).

The filter by tid/cpu has been removed from the I/O analysis because it only makes sense to filter in the CLI; we need all the data in the analysis. As soon as the patchset is merged, we'll check whether this is still an issue.

Thanks,

Julien

Jul 24 '16 00:07 jdesfossez
