squared icon indicating copy to clipboard operation
squared copied to clipboard

`Top n` unhandled exceptions and (if available) code file + line number

Open aaronsteers opened this issue 2 years ago • 3 comments

@pnadolny13 - Related to knowing how to prioritize this item

  • https://github.com/meltano/meltano/issues/7047

and this general theme:

  • https://github.com/meltano/meltano/discussions/6700

are you able to generate a stack-ranked report of the highest frequency exceptions raised by Meltano in telemetry?

I guess it's a two part question:

  1. Do we capture the exception data in telemetry to see how often specific exceptions or errors occur?
    • Line numbers and code files are nice to have, but not strictly necessary. Even without line numbers, we know where "Broken Pipe" errors are coming from.
  2. If yes to the above, do you have time to create the report? Any estimate of level of effort to pull the data?

If yes to 1 but no to 2, then perhaps someone on the engineering team could assist in pulling the data.

Thanks in advance!

aaronsteers avatar Feb 17 '23 19:02 aaronsteers

@aaronsteers it sounds like this might be similar to https://github.com/meltano/squared/issues/492. We do get exception data but its not in a useful form as of today so I cant easily pull this. I'd need to build some dbt models to parse these exceptions/line numbers.

If we wanted an ad hoc query approach I can see that most broken pipes are coming from these as the lowest level in the traceback:

count file line
11947 lib/python3.9/asyncio/streams.py 197
3934 lib/python3.8/asyncio/streams.py 197
494 lib/python3.9/asyncio/unix_events.py 687
385 lib/python3.9/asyncio/streams.py 359
367 lib/python3.10/asyncio/streams.py 344
293 .../unix_events.py 687
286 lib/python3.10/asyncio/streams.py 178
175 lib/python3.8/asyncio/unix_events.py 687
174 .../unix_events.py 698
162 lib/python3.8/asyncio/unix_events.py 665

Is that what youre looking for? With a bit more work I could probably detect the last file/line where the package was meltano vs a dependency.

pnadolny13 avatar Feb 21 '23 17:02 pnadolny13

@pnadolny13 - The above might be sufficient, actually... it looks like great info on top n customer-impacting Meltano failures.

Can you provide context for the above, such as to confirm if these raw counts of the lowest level stack traces? Can you tell what the time period is - for instance all time vs 3 months? And how can we iterate on the above? Is there a SQL query or report that generated the above output?

aaronsteers avatar Feb 21 '23 18:02 aaronsteers

@aaronsteers cool yeah that was just an example to see if thats what you were looking for. That is for BrokenPipeError only, for all time. An example of what the exception traceback looks like in the warehouse is:

[
  {
    "file": "lib/python3.9/site-packages/meltano/cli/elt.py",
    "line_number": 155
  },
  {
    "file": "lib/python3.9/site-packages/meltano/cli/elt.py",
    "line_number": 243
  },
  {
    "file": "lib/python3.9/site-packages/meltano/cli/elt.py",
    "line_number": 280
  },
  {
    "file": "lib/python3.9/site-packages/meltano/cli/elt.py",
    "line_number": 339
  },
  {
    "file": "lib/python3.9/site-packages/meltano/core/runner/singer.py",
    "line_number": 219
  },
  {
    "file": "lib/python3.9/site-packages/meltano/core/runner/singer.py",
    "line_number": 147
  },
  {
    "file": "lib/python3.9/site-packages/meltano/core/logging/utils.py",
    "line_number": 219
  },
  {
    "file": "lib/python3.9/site-packages/meltano/core/logging/utils.py",
    "line_number": 188
  },
  {
    "file": "lib/python3.9/asyncio/streams.py",
    "line_number": 359
  },
  {
    "file": "lib/python3.9/site-packages/meltano/core/logging/utils.py",
    "line_number": 185
  },
  {
    "file": "lib/python3.9/asyncio/streams.py",
    "line_number": 387
  },
  {
    "file": "lib/python3.9/asyncio/streams.py",
    "line_number": 197
  }

I'm parsing it to get the last line in the array and grouping by the context_uuid which is the execution ID.

WITH base AS (
    SELECT
        context_uuid,
        parse_json(exception):traceback::STRING,
        array_slice(parse_json(exception):traceback::ARRAY, -1, 9999999)[0] AS final_traceback,
        final_traceback:file AS file_name,
        final_traceback:line_number  AS line_number
    FROM PREP.workspace.unstruct_event_flattened
    WHERE parse_json(exception):type::STRING = 'BrokenPipeError'
)
SELECT
    count(DISTINCT context_uuid),
    file_name,
    line_number
FROM base GROUP BY 2, 3;

pnadolny13 avatar Feb 21 '23 21:02 pnadolny13