treeherder icon indicating copy to clipboard operation
treeherder copied to clipboard

If no FailureLines exist for a failure, fallback to TestLogFailure

Open camd opened this issue 2 months ago • 2 comments

An experiment to see how this looks.

Here were the options Claude proposed. I went with Option 1 in this PR:

Option 1: Make Push Health use TextLogError as a fallback ✅ RECOMMENDED

  • Modify treeherder/push_health/tests.py:322-341 to query TextLogError when no FailureLine objects exist
  • Minimal changes, uses existing data
  • Pros: Quick fix, uses data that's already there
  • Cons: Mixing two data sources; TextLogError doesn't have structured test data like FailureLine

Option 2: Investigate and fix FailureLine generation

  • Determine if timeout failures create FailureLine objects
  • If yes, what action do they have? Add that action to Push Health's query
  • If no, fix the log parser to create FailureLine objects for timeouts
  • Pros: Fixes the root cause
  • Cons: Requires understanding the Mozilla test harness structured log format; may require changes upstream

Option 3: Create FailureLine objects from TextLogError

  • When TextLogError objects exist but no matching FailureLine, create synthetic FailureLine objects
  • Pros: Unifies the data sources
  • Cons: More complex; adds processing overhead

My Recommendation

Start with Option 1 (fallback to TextLogError) because:

  1. It's the fastest solution
  2. The data is already available and working in the Summary Panel
  3. It's a minimal code change to treeherder/push_health/tests.py
  4. You can later implement Option 2 as a proper fix

camd avatar Nov 18 '25 18:11 camd

Codecov Report

:x: Patch coverage is 18.07229% with 68 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 80.10%. Comparing base (75d47a7) to head (8cf3aa9).

Files with missing lines Patch % Lines
treeherder/push_health/tests.py 18.07% 68 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9079      +/-   ##
==========================================
- Coverage   80.26%   80.10%   -0.16%     
==========================================
  Files         596      596              
  Lines       32182    32265      +83     
  Branches     3276     3269       -7     
==========================================
+ Hits        25830    25845      +15     
- Misses       6184     6252      +68     
  Partials      168      168              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov-commenter avatar Nov 18 '25 18:11 codecov-commenter

it seems that this is solving for the case where we are pending log parsing- but that is a normal thing, yet probably not so misleading. If you think this will make push health more responsive and accurate (even the perception), then let me know

This is a really good point. I added logic to check if the job's log parsing is still pending before it tries to fall back to the TextLogErrors.

camd avatar Nov 19 '25 23:11 camd