libxlio icon indicating copy to clipboard operation
libxlio copied to clipboard

issue: 4409248 Fix RoCE LAG warning

Open BasharRadya opened this issue 8 months ago • 13 comments

User description

We cannot disable RoCE LAG (deprecated option in OFED 5.1).

Change type

What kind of change does this PR introduce?

  • [ ] Bugfix
  • [ ] Feature
  • [ ] Code style update
  • [ ] Refactoring (no functional changes, no api changes)
  • [ ] Build related changes
  • [ ] CI related changes
  • [ ] Documentation content changes
  • [ ] Tests
  • [ ] Other

Check list

  • [ ] Code follows the style de facto guidelines of this project
  • [ ] Comments have been inserted in hard to understand places
  • [ ] Documentation has been updated (if necessary)
  • [ ] Test has been added (if possible)

PR Type

Bug fix, Enhancement


Description

  • Removed deprecated RoCE LAG disable path references.

  • Simplified print_roce_lag_warnings function signature and logic.

  • Updated RoCE LAG warning messages for clarity and relevance.

  • Adjusted related function calls to match updated signature.


Changes walkthrough 📝

Relevant files
Enhancement
net_device_val.cpp
Adjusted RoCE LAG warning logic and function calls             

src/core/dev/net_device_val.cpp

  • Removed disable_path parameter from print_roce_lag_warnings calls.
  • Adjusted function calls to match updated signature.
  • Simplified logic for RoCE LAG warning handling.
  • +2/-3     
    utils.cpp
    Simplified `print_roce_lag_warnings` and updated messages

    src/core/util/utils.cpp

  • Removed disable_path parameter from print_roce_lag_warnings.
  • Updated warning messages for clarity and removed deprecated
    instructions.
  • Simplified logic in the warning function.
  • +3/-10   
    utils.h
    Updated `print_roce_lag_warnings` function signature         

    src/core/util/utils.h

  • Updated print_roce_lag_warnings function signature.
  • Removed disable_path parameter from declaration.
  • +2/-2     

    Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • BasharRadya avatar Apr 20 '25 08:04 BasharRadya

    bot:retest

    BasharRadya avatar Apr 20 '25 10:04 BasharRadya

    bot:retest

    BasharRadya avatar Apr 21 '25 06:04 BasharRadya

    bot:retest

    AlexanderGrissik avatar Apr 21 '25 08:04 AlexanderGrissik

    Could you please update the description to detail more about the "why"?

    tomerdbz avatar Apr 30 '25 07:04 tomerdbz

    /review

    tomerdbz avatar Apr 30 '25 07:04 tomerdbz

    /describe

    tomerdbz avatar Apr 30 '25 07:04 tomerdbz

    /improve

    tomerdbz avatar Apr 30 '25 07:04 tomerdbz

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Possible Issue

    The function print_roce_lag_warnings is invoked without the bond_roce_lag_path argument in some cases, which may lead to incorrect behavior or missing information in the warning logs.

    print_roce_lag_warnings(get_ifname_link(), guid_iter->second.front().c_str(),
                            guid_iter->second.back().c_str());
    
    Incomplete Warning Message

    The updated print_roce_lag_warnings function no longer includes a message about disabling RoCE LAG or referencing the release notes, which might reduce clarity for users troubleshooting the issue.

    "* " PRODUCT_NAME " cannot offload the device; issue with RoCE LAG.\n");
    

    pr-review-bot-app[bot] avatar Apr 30 '25 07:04 pr-review-bot-app[bot]

    PR Description updated to latest commit (https://github.com/Mellanox/libxlio/commit/d6567d8866ba21553600b46ba1df1ff6c479050d)

    pr-review-bot-app[bot] avatar Apr 30 '25 07:04 pr-review-bot-app[bot]

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Add bounds check for safe access

    Verify that guid_iter->second contains at least two elements before accessing
    front() and back() to avoid potential out-of-bounds access.

    src/core/dev/net_device_val.cpp [1482-1483]

    -print_roce_lag_warnings(get_ifname_link(), guid_iter->second.front().c_str(),
    -                        guid_iter->second.back().c_str());
    +if (guid_iter->second.size() >= 2) {
    +    print_roce_lag_warnings(get_ifname_link(), guid_iter->second.front().c_str(),
    +                            guid_iter->second.back().c_str());
    +}
    
    Suggestion importance[1-10]: 9

    __

    Why: Adding a bounds check ensures that the code does not attempt to access elements of guid_iter->second without verifying their existence, preventing potential out-of-bounds errors. This is a critical improvement for code safety and correctness.

    High
    Validate pointer before function call

    Ensure that get_ifname_link() does not return a null or invalid pointer before
    passing it to print_roce_lag_warnings to prevent undefined behavior.

    src/core/dev/net_device_val.cpp [1534]

    -print_roce_lag_warnings(get_ifname_link());
    +const char* ifname_link = get_ifname_link();
    +if (ifname_link) {
    +    print_roce_lag_warnings(ifname_link);
    +}
    
    Suggestion importance[1-10]: 8

    __

    Why: Validating the pointer returned by get_ifname_link() before using it in print_roce_lag_warnings prevents undefined behavior in case of a null or invalid pointer. This is an important safeguard for robust code execution.

    Medium
    General
    Improve clarity of warning message

    Clarify the warning message to provide more actionable information about the issue
    with RoCE LAG for better user understanding.

    src/core/util/utils.cpp [183]

    -vlog_printf(VLOG_WARNING, "* " PRODUCT_NAME " cannot offload the device; issue with RoCE LAG.\n");
    +vlog_printf(VLOG_WARNING, "* " PRODUCT_NAME " cannot offload the device due to an issue with RoCE LAG configuration. Please check the configuration and retry.\n");
    
    Suggestion importance[1-10]: 6

    __

    Why: Enhancing the warning message with more actionable information improves user understanding and usability. While this change is not critical, it contributes to better communication of issues to the user.

    Low

    pr-review-bot-app[bot] avatar Apr 30 '25 07:04 pr-review-bot-app[bot]

    bot:retest

    dpressle avatar May 15 '25 07:05 dpressle

    @tomerdbz please review in your convenient time

    BasharRadya avatar May 21 '25 08:05 BasharRadya

    bot:retest

    BasharRadya avatar May 21 '25 08:05 BasharRadya

    @tomerdbz REMINDER - i think you approved it in VMA but not in XLIO?

    BasharRadya avatar Jul 13 '25 09:07 BasharRadya