
Failure of test_topk<migraphx::shape::float_type, 1000, 120000> depends on the input data pattern

Open lakhinderwalia opened this issue 2 months ago • 3 comments

Turning this into a PR just to show the TopK test failure. It fails in ROCm 7.1.

Motivation

Technical Details

Changelog Category

    • [ ] Added: New functionality.
    • [ ] Changed: Changes to existing functionality.
    • [ ] Removed: Functionality or support that has been removed. (Compared to a previous release)
    • [ ] Optimized: Component performance that has been optimized or improved.
    • [ ] Resolved Issues: Known issues from a previous version that have been resolved.
    • [ ] Not Applicable: This PR is not to be included in the changelog.

lakhinderwalia avatar Oct 10 '25 17:10 lakhinderwalia

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #4376   +/-   ##
========================================
  Coverage    92.26%   92.26%           
========================================
  Files          560      560           
  Lines        26358    26358           
========================================
  Hits         24319    24319           
  Misses        2039     2039           
| Files with missing lines | Coverage Δ |
|---|---|
| src/include/migraphx/generate.hpp | 100.00% <100.00%> (ø) |

codecov[bot] avatar Oct 10 '25 18:10 codecov[bot]

This can happen when the values are very close, within ±1 ULP of each other, due to subtle differences between the CPU and GPU. It's not an issue that needs to be fixed.
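To illustrate the mechanism described here, the following is a minimal sketch and not MIGraphX code. It assumes two hypothetical backends whose values disagree by exactly 1 ULP on two elements, and a hypothetical helper `topk_indices`. It uses Python's 64-bit floats rather than `float_type`, which does not change the idea: a 1 ULP difference is enough to change which index top-k reports.

```python
import math

def topk_indices(values, k):
    # Indices of the k largest values, largest first; ties go to the lower index.
    return sorted(range(len(values)), key=lambda i: (-values[i], i))[:k]

base = 1.0
next_up = math.nextafter(base, math.inf)  # exactly 1 ULP above base (Python 3.9+)

# Hypothetical data: the two backends disagree by 1 ULP on elements 1 and 3.
ref_values = [0.5, next_up, 0.25, base]
gpu_values = [0.5, base, 0.25, next_up]

ref_idx = topk_indices(ref_values, 1)
gpu_idx = topk_indices(gpu_values, 1)

# The selected values are identical, but the reported indices are not.
print(ref_idx, gpu_idx)
print(ref_values[ref_idx[0]] == gpu_values[gpu_idx[0]])
```

In this sketch a comparison of index outputs reports a mismatch even though both sides picked the same top value.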

pfultz2 avatar Oct 13 '25 16:10 pfultz2

> This can happen when the values are very close with ±1 ULP due to subtle differences in the CPU and GPU. It's not an issue that needs to be fixed.

This CI pipeline doesn't show the ROCm 7.1 failures. On ROCm 7.1, failures are now observed for previously passing tests. These are verify failures between ref and gpu. Example:

```
[   RUN    ] test_topk<migraphx::shape::float_type, 1029, 80000>
FAILED: gpu
RMS Error: 0.021213
Max diff: 39558
Mismatch at 733: 12362 != 51920
```
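The mismatched numbers in the log look like index outputs rather than data values, since 51920 - 12362 equals the reported max diff of 39558, although that is an inference from the log alone. Under that assumption, one way to tell a benign reordering of near-equal elements from a real selection error is to compare the input values that each index list selects. The sketch below uses a hypothetical helper `topk_matches_by_value` and made-up data; it is not part of the MIGraphX verify tooling.

```python
import math

def topk_matches_by_value(data, ref_idx, gpu_idx, rel_tol=1e-6):
    """Return True when both index lists select nearly equal values at every rank."""
    if len(ref_idx) != len(gpu_idx):
        return False
    return all(
        math.isclose(data[r], data[g], rel_tol=rel_tol)
        for r, g in zip(ref_idx, gpu_idx)
    )

# Made-up input with a tie between elements 1 and 2.
data = [1.0, 5.0, 5.0, 3.0]

# Indices differ at rank 0, but both point at the value 5.0.
benign = topk_matches_by_value(data, [1, 3], [2, 3])

# Indices differ at rank 0 and point at different values (5.0 vs 1.0).
real = topk_matches_by_value(data, [1, 3], [0, 3])

print(benign, real)
```

If a check like this passes for the failing case, the index mismatch comes from ties or near-ties in the input; if it fails, the GPU result selected genuinely different elements.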

lakhinderwalia avatar Oct 13 '25 17:10 lakhinderwalia