squared
squared copied to clipboard
`Top n` unhandled exceptions and (if available) code file + line number
@pnadolny13 - Related to knowing how to prioritize this item
- https://github.com/meltano/meltano/issues/7047
and this general theme:
- https://github.com/meltano/meltano/discussions/6700
are you able to generate a stack-ranked report of the highest frequency exceptions raised by Meltano in telemetry?
I guess it's a two part question:
- Do we capture the exception data in telemetry to see how often specific exceptions or errors occur?
- Line numbers and code files are nice to have, but not strictly necessary. Even without line numbers, we know where "Broken Pipe" errors are coming from.
- If yes to the above, do you have time to create the report? Any estimate of level of effort to pull the data?
If yes to 1 but no to 2, then perhaps someone on the engineering team could assist in pulling the data.
Thanks in advance!
@aaronsteers it sounds like this might be similar to https://github.com/meltano/squared/issues/492. We do get exception data but its not in a useful form as of today so I cant easily pull this. I'd need to build some dbt models to parse these exceptions/line numbers.
If we wanted an ad hoc query approach I can see that most broken pipes are coming from these as the lowest level in the traceback:
count | file | line |
---|---|---|
11947 | lib/python3.9/asyncio/streams.py | 197 |
3934 | lib/python3.8/asyncio/streams.py | 197 |
494 | lib/python3.9/asyncio/unix_events.py | 687 |
385 | lib/python3.9/asyncio/streams.py | 359 |
367 | lib/python3.10/asyncio/streams.py | 344 |
293 | .../unix_events.py | 687 |
286 | lib/python3.10/asyncio/streams.py | 178 |
175 | lib/python3.8/asyncio/unix_events.py | 687 |
174 | .../unix_events.py | 698 |
162 | lib/python3.8/asyncio/unix_events.py | 665 |
Is that what youre looking for? With a bit more work I could probably detect the last file/line where the package was meltano vs a dependency.
@pnadolny13 - The above might be sufficient, actually... it looks like great info on top n customer-impacting Meltano failures.
Can you provide context for the above, such as to confirm if these raw counts of the lowest level stack traces? Can you tell what the time period is - for instance all time vs 3 months? And how can we iterate on the above? Is there a SQL query or report that generated the above output?
@aaronsteers cool yeah that was just an example to see if thats what you were looking for. That is for BrokenPipeError
only, for all time. An example of what the exception traceback looks like in the warehouse is:
[
{
"file": "lib/python3.9/site-packages/meltano/cli/elt.py",
"line_number": 155
},
{
"file": "lib/python3.9/site-packages/meltano/cli/elt.py",
"line_number": 243
},
{
"file": "lib/python3.9/site-packages/meltano/cli/elt.py",
"line_number": 280
},
{
"file": "lib/python3.9/site-packages/meltano/cli/elt.py",
"line_number": 339
},
{
"file": "lib/python3.9/site-packages/meltano/core/runner/singer.py",
"line_number": 219
},
{
"file": "lib/python3.9/site-packages/meltano/core/runner/singer.py",
"line_number": 147
},
{
"file": "lib/python3.9/site-packages/meltano/core/logging/utils.py",
"line_number": 219
},
{
"file": "lib/python3.9/site-packages/meltano/core/logging/utils.py",
"line_number": 188
},
{
"file": "lib/python3.9/asyncio/streams.py",
"line_number": 359
},
{
"file": "lib/python3.9/site-packages/meltano/core/logging/utils.py",
"line_number": 185
},
{
"file": "lib/python3.9/asyncio/streams.py",
"line_number": 387
},
{
"file": "lib/python3.9/asyncio/streams.py",
"line_number": 197
}
I'm parsing it to get the last line in the array and grouping by the context_uuid
which is the execution ID.
WITH base AS (
SELECT
context_uuid,
parse_json(exception):traceback::STRING,
array_slice(parse_json(exception):traceback::ARRAY, -1, 9999999)[0] AS final_traceback,
final_traceback:file AS file_name,
final_traceback:line_number AS line_number
FROM PREP.workspace.unstruct_event_flattened
WHERE parse_json(exception):type::STRING = 'BrokenPipeError'
)
SELECT
count(DISTINCT context_uuid),
file_name,
line_number
FROM base GROUP BY 2, 3;