
[Feature] Support Arrow Flight Data Retrieval from Inaccessible Nodes

Open chris-celerdata opened this issue 3 weeks ago • 10 comments

Why I'm doing this:

In deployment scenarios where BE (Backend) nodes are not directly accessible to clients—such as private networks, Kubernetes clusters, or environments with restrictive network policies—clients cannot establish direct Arrow Flight connections to BE nodes. This prevents users from leveraging Arrow Flight SQL's high-performance columnar data transfer capabilities in these common production environments.

Fixes #65359, fixes #63256.

What I'm doing:

This PR implements an Arrow Flight proxy feature where the FE can route Arrow Flight data from BE nodes to clients when direct BE connectivity is unavailable.

Key changes:

  1. Proxy configuration via session variables:

    • arrow_flight_proxy_enabled (default: true): Controls whether proxy mode is enabled
    • arrow_flight_proxy (default: empty): Specifies the proxy endpoint as hostname:port; when empty, the current FE is used
  2. Extended ticket format:

    • Direct BE tickets: <QueryId>:<FragmentInstanceId> (2 parts)
    • Proxy tickets: <QueryId>:<FragmentInstanceId>:<BEHost>:<BEPort> (4 parts)
  3. Proxy implementation:

    • FE acts as proxy by creating FlightClient connections to BE nodes
    • Streams data from BE to client with proper cancellation handling
    • Maintains a cache of FlightClient connections to BE nodes
    • Evicts cached clients automatically; a removal listener closes each evicted client so its resources are released
  4. Documentation:

    • Added configuration guide with proxy setup examples
    • Included usage examples in Python demo code
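The two ticket shapes listed under "Extended ticket format" can be told apart by part count alone. The following is a minimal Python sketch of that idea, not the FE's actual Java parser; `ParsedTicket` and `parse_ticket` are hypothetical names, and it assumes the `:` delimiter given in this description. A host that itself contains `:` (for example an IPv6 literal) would be ambiguous under this split.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedTicket:
    """Hypothetical container for the fields carried by a ticket."""
    query_id: str
    fragment_instance_id: str
    be_host: Optional[str] = None  # present only on proxy tickets
    be_port: Optional[int] = None  # present only on proxy tickets

    @property
    def is_proxy(self) -> bool:
        return self.be_host is not None


def parse_ticket(raw: str) -> ParsedTicket:
    """Split a ticket on ':' and classify it by its number of parts."""
    parts = raw.split(":")
    if len(parts) == 2:
        # Direct BE ticket: <QueryId>:<FragmentInstanceId>
        return ParsedTicket(parts[0], parts[1])
    if len(parts) == 4:
        # Proxy ticket: <QueryId>:<FragmentInstanceId>:<BEHost>:<BEPort>
        query_id, fragment_id, host, port = parts
        if not port.isdecimal():
            raise ValueError(f"invalid BE port in ticket: {port!r}")
        return ParsedTicket(query_id, fragment_id, host, int(port))
    raise ValueError(f"expected 2 or 4 ':'-separated parts, got {len(parts)}")
```

Because a 2-part ticket parses exactly as before, clients that only ever see direct BE tickets are unaffected by the longer form.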

Design decisions:

  • Proxy enabled by default for maximum compatibility out of the box
  • Simple ticket format parsing (split by :) for backward compatibility
  • Cache invalidation on errors to prevent stale connections
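The cache behaviour described in this PR (timed eviction, cleanup through a removal listener, invalidation on errors) can be illustrated with a small Python sketch. It is not the FE implementation: `ExpiringClientCache` and `fetch` are hypothetical names, the idle-time eviction policy is an assumption, and the sketch is not thread-safe.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")


class ExpiringClientCache(Generic[K, V]):
    """Idle-time cache that calls on_remove for every entry it drops."""

    def __init__(self, factory: Callable[[K], V],
                 on_remove: Callable[[K, V], None],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._factory = factory
        self._on_remove = on_remove  # plays the role of a removal listener
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[K, Tuple[V, float]] = {}

    def get(self, key: K) -> V:
        now = self._clock()
        self._evict_expired(now)
        entry = self._entries.get(key)
        client = entry[0] if entry is not None else self._factory(key)
        self._entries[key] = (client, now)  # refresh the last-access time
        return client

    def invalidate(self, key: K) -> None:
        entry = self._entries.pop(key, None)
        if entry is not None:
            self._on_remove(key, entry[0])

    def clear(self) -> None:
        for key in list(self._entries):
            self.invalidate(key)

    def _evict_expired(self, now: float) -> None:
        expired = [k for k, (_, last) in self._entries.items()
                   if now - last >= self._ttl]
        for key in expired:
            self.invalidate(key)


def fetch(cache: ExpiringClientCache, key, read: Callable[[V], R]) -> R:
    """Run read(client); on any error drop the possibly stale client."""
    client = cache.get(key)
    try:
        return read(client)
    except Exception:
        cache.invalidate(key)
        raise
```

Routing every removal through `invalidate` means expiry, error handling, and `clear` all release the client the same way, which is the property the removal listener provides.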

What type of PR is this:

  • [ ] BugFix
  • [x] Feature
  • [ ] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [x] Yes, this PR will result in a change in behavior.
  • [ ] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [x] Parameter changes: default values, similar parameters but with different default values
  • [x] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Behavior changes:

  • New session variables: arrow_flight_proxy_enabled and arrow_flight_proxy are now available for configuration
  • Default proxy mode: Proxy is enabled by default (arrow_flight_proxy_enabled = true), which routes all Arrow Flight queries through FE.
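How the two session variables combine to pick the endpoint a client is sent to can be sketched in Python as follows. This is an illustration of the rules stated in this description, not the FE's Java code; `parse_host_port` and `resolve_endpoint` are hypothetical names, and the hosts and ports in the usage lines are made up.

```python
from typing import Tuple

Endpoint = Tuple[str, int]


def parse_host_port(value: str) -> Endpoint:
    """Validate a 'hostname:port' string and return (host, port)."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdecimal():
        raise ValueError(f"expected hostname:port, got {value!r}")
    port_number = int(port)
    if not 0 < port_number < 65536:
        raise ValueError(f"port out of range: {port_number}")
    return host, port_number


def resolve_endpoint(proxy_enabled: bool, proxy: str,
                     current_fe: Endpoint, be: Endpoint) -> Endpoint:
    """Choose where the client should fetch Arrow Flight data from."""
    if not proxy_enabled:
        return be                      # direct BE connection
    if proxy:
        return parse_host_port(proxy)  # explicitly configured proxy
    return current_fe                  # empty value: the current FE proxies


# Hypothetical hosts and ports, for illustration only.
fe = ("fe-0.internal", 9408)
be = ("be-3.internal", 9419)
print(resolve_endpoint(True, "", fe, be))                      # current FE
print(resolve_endpoint(True, "gateway.example:443", fe, be))   # custom proxy
print(resolve_endpoint(False, "", fe, be))                     # direct BE
```

With the defaults (proxy enabled, `arrow_flight_proxy` empty) the first branch taken is the last one, so clients that can reach BE nodes directly need to disable the proxy to avoid the extra hop through FE.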

Checklist:

  • [x] I have added test cases for my bug fix or my new feature
  • [x] This pr needs user documentation (for new or modified features or behaviors)
    • [x] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [x] 4.0
    • [x] 3.5
    • [ ] 3.4
    • [ ] 3.3

[!NOTE] Introduces an FE-based Arrow Flight proxy with session variables to route BE streams via FE, updates ticket formats, adds client caching, and documents usage with tests.

  • Arrow Flight SQL (FE Service):
    • Add FE proxy mode to forward BE Arrow streams to clients; proxy controlled by session vars arrow_flight_proxy_enabled (default true) and arrow_flight_proxy.
    • Update ticket parsing/routing:
      • FE tickets use token|queryId.
      • Proxy tickets use queryId|fragmentInstanceId|beHost|bePort; direct BE remains queryId:fragmentInstanceId.
    • Implement BE FlightClient cache with timed eviction and safe close; invalidate on errors; cancel/close handling; cache cleared on service close.
    • Build endpoint/ticket via parseProxy; validate proxy format; choose FE or custom proxy endpoint, or direct BE.
  • Session Variables:
    • Add arrow_flight_proxy and arrow_flight_proxy_enabled to SessionVariable with defaults, annotations, and getters/setters.
  • Docs:
    • Extend docs/en/unloading/arrow_flight.md with proxy overview, configuration, and Python examples showing the new variables.
  • Features Listing:
    • Add ArrowFlightSQL to ProductFeature with link.
  • Tests:
    • Update/add unit tests for proxy routing, invalid proxy handling, new ticket delimiters, BE proxy streaming, and feature listing assertions.

Written by Cursor Bugbot for commit 259da4e591111369d7ad9f2534695d4532032b7e. This will update automatically on new commits.

chris-celerdata avatar Dec 04 '25 22:12 chris-celerdata

🧪 CI Insights

Here's what we observed from your CI run for d3cdccad.

🟢 All jobs passed!

But CI Insights is watching 👀

mergify[bot] avatar Dec 04 '25 22:12 mergify[bot]

@cursor review

alvin-celerdata avatar Dec 04 '25 22:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 06 '25 00:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 10 '25 02:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 11 '25 22:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 12 '25 04:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 12 '25 19:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 13 '25 19:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 15 '25 22:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 17 '25 02:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 19 '25 21:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 19 '25 23:12 alvin-celerdata

[Java-Extensions Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Dec 22 '25 08:12 github-actions[bot]

[FE Incremental Coverage Report]

:white_check_mark: pass : 120 / 131 (91.60%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| :large_blue_circle: com/starrocks/service/arrow/flight/sql/ArrowFlightSqlServiceImpl.java | 111 | 122 | 90.98% | [98, 99, 100, 101, 102, 516, 517, 534, 535, 547, 548] |
| :large_blue_circle: com/starrocks/qe/SessionVariable.java | 8 | 8 | 100.00% | [] |
| :large_blue_circle: com/starrocks/feature/ProductFeature.java | 1 | 1 | 100.00% | [] |

github-actions[bot] avatar Dec 22 '25 08:12 github-actions[bot]

[BE Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Dec 22 '25 09:12 github-actions[bot]