matrixone icon indicating copy to clipboard operation
matrixone copied to clipboard

resubmit pipeline client max connections is too large

Open badboynt1 opened this issue 1 year ago • 3 comments

What type of PR is this?

  • [ ] API-change
  • [ ] BUG
  • [x] Improvement
  • [ ] Documentation
  • [ ] Feature
  • [ ] Test and CI
  • [ ] Code Refactoring

Which issue(s) this PR fixes:

issue #12652

What this PR does / why we need it:

让stream rpc复用tcp连接,减少多cn之间的tcp连接数 可以将配置的最大连接数从10万降低到10.

badboynt1 avatar May 17 '24 04:05 badboynt1

上次被revert是因为checkin regression没有跑过,现在已经实现了shuffle join的阻塞控制,checkin regression已经可以跑过了。 准备多跑几轮checkin regression,然后再多观察观察多cn的tpch,以及长稳测试情况再合并。

badboynt1 avatar May 17 '24 04:05 badboynt1

@badboynt1 Thanks for your contributions!

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that the maximum number of connections for the pipeline client is being reduced.

Body:

The body of the pull request provides information on the type of PR, the related issue, and the reason for the changes. It mentions that the change is aimed at reusing TCP connections for stream RPC to reduce the number of TCP connections between multiple CNs. Additionally, it states that the maximum connection number is being reduced from 100,000 to 10.

Changes:

  1. pkg/cnservice/cnclient/client.go:

    • The change in this file modifies the NewStream function to pass false instead of true when creating a new stream, which likely affects the behavior of reusing TCP connections.
    • The constant dfMaxSenderNumber is reduced from 100,000 to 10, which limits the maximum number of senders.
  2. pkg/sql/compile/scope.go:

    • In the notifyAndReceiveFromRemote function, the Close method of streamSender is now called with false instead of true.
  3. pkg/sql/compile/scopeRemoteRunTypes.go:

    • The close method in messageSenderOnClient now calls Close on streamSender with false instead of true.

Feedback and Suggestions:

  1. Security Concern:

    • The change in passing false instead of true in the Close methods across multiple files raises a security concern. It's crucial to ensure that closing connections properly and securely is maintained. Verify that changing this parameter does not introduce any vulnerabilities or unexpected behavior, especially related to resource management and connection handling.
  2. Code Consistency:

    • It's essential to maintain consistency in parameter usage across the codebase. If the parameter true or false signifies specific behavior, ensure it is consistently applied throughout the codebase to avoid confusion and potential bugs.
  3. Documentation:

    • Consider updating the comments or documentation to reflect the changes made in the code. Clear documentation helps developers understand the purpose and behavior of the functions and constants being modified.
  4. Testing:

    • After making these changes, thorough testing is crucial to ensure that the functionality of reusing TCP connections and limiting the maximum number of senders works as expected without any regressions.
  5. Optimization:

    • While reducing the maximum sender number from 100,000 to 10 can be a valid optimization, it's essential to consider the impact on performance and scalability. Monitor the application's behavior under load to ensure the new limit does not adversely affect performance.
  6. Error Handling:

    • Verify that error handling mechanisms are in place for cases where connections fail to close properly. Ensure that any potential errors are appropriately handled to prevent resource leaks or unexpected behavior.
  7. Code Review:

    • Conduct a thorough code review to ensure that all changes align with the project's coding standards and best practices. Address any potential code smells or areas for improvement during the review process.

By addressing the security concern, maintaining code consistency, updating documentation, conducting thorough testing, optimizing performance, ensuring proper error handling, and conducting a comprehensive code review, the pull request can be enhanced to improve the quality and reliability of the codebase.

matrix-meow avatar May 17 '24 04:05 matrix-meow

https://github.com/matrixorigin/ci-test/actions/runs/9121513159 连续跑了三次checkin regression都成功了

badboynt1 avatar May 17 '24 07:05 badboynt1

https://github.com/matrixorigin/ci-test/actions/runs/9185551821 最新的一次也成功了

badboynt1 avatar May 22 '24 06:05 badboynt1