[Feature] Support Arrow Flight Data Retrieval from Inaccessible Nodes
Why I'm doing this:
In deployment scenarios where BE (Backend) nodes are not directly accessible to clients—such as private networks, Kubernetes clusters, or environments with restrictive network policies—clients cannot establish direct Arrow Flight connections to BE nodes. This prevents users from leveraging Arrow Flight SQL's high-performance columnar data transfer capabilities in these common production environments.
Fixes #65359, fixes #63256.
What I'm doing:
This PR implements an Arrow Flight proxy feature where the FE can route Arrow Flight data from BE nodes to clients when direct BE connectivity is unavailable.
Key changes:
- Proxy configuration via session variables:
  - `arrow_flight_proxy_enabled` (default: `true`): Controls whether proxy mode is enabled
  - `arrow_flight_proxy` (default: empty): Specifies the proxy `hostname:port` (defaults to the current FE)
- Extended ticket format:
  - Direct BE tickets: `<QueryId>:<FragmentInstanceId>` (2 parts)
  - Proxy tickets: `<QueryId>:<FragmentInstanceId>:<BEHost>:<BEPort>` (4 parts)
- Proxy implementation:
  - FE acts as a proxy by creating FlightClient connections to BE nodes
  - Streams data from BE to client with proper cancellation handling
  - Maintains a FlightClient cache
  - Automatic cache eviction with proper resource cleanup via a removal listener
- Documentation:
  - Added configuration guide with proxy setup examples
  - Included usage examples in Python demo code
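To illustrate the extended ticket format listed above, here is a minimal Python sketch of a parser that accepts only the 2-part and 4-part forms. It is not the FE code (which is Java); `FlightTicket` and `parse_ticket` are hypothetical names used for illustration, and the sketch assumes the `:` delimiter from this description. A host that itself contains `:` (for example an IPv6 literal) would not parse under this scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlightTicket:
    """Hypothetical in-memory form of a parsed ticket (illustration only)."""
    query_id: str
    fragment_instance_id: str
    be_host: Optional[str] = None  # present only on proxy tickets
    be_port: Optional[int] = None  # present only on proxy tickets

    @property
    def is_proxy(self) -> bool:
        return self.be_host is not None


def parse_ticket(raw: str) -> FlightTicket:
    """Split on ':' and accept only the 2-part (direct) or 4-part (proxy) form."""
    parts = raw.split(":")
    if len(parts) == 2:
        return FlightTicket(parts[0], parts[1])
    if len(parts) == 4:
        # int() raises ValueError if the port is not numeric.
        return FlightTicket(parts[0], parts[1], parts[2], int(parts[3]))
    raise ValueError(f"expected 2 or 4 ':'-separated parts, got {len(parts)}")
```

Because a 2-part ticket parses exactly as before, clients that talk to BE nodes directly are unaffected by the longer proxy form.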
Design decisions:
- Proxy enabled by default for maximum compatibility out-of-box
- Simple ticket format parsing (split by `:`) for backward compatibility
- Cache invalidation on errors to prevent stale connections
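The cache behavior described in this PR (timed eviction, cleanup through a removal listener, invalidation on errors) can be sketched in Python as follows. This is an illustrative model only, not the FE's Java implementation: `ExpiringCache` is a hypothetical name, the expire-after-last-access policy is an assumption, and the sketch is not thread-safe.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class ExpiringCache(Generic[K, V]):
    """Hypothetical cache: entries expire after `ttl_seconds` without access."""

    def __init__(
        self,
        ttl_seconds: float,
        loader: Callable[[K], V],            # creates a client for a missing key
        on_remove: Callable[[K, V], None],   # removal listener, e.g. closes the client
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._loader = loader
        self._on_remove = on_remove
        self._clock = clock
        self._entries: Dict[K, Tuple[V, float]] = {}

    def get(self, key: K) -> V:
        self.evict_expired()
        entry = self._entries.get(key)
        value = self._loader(key) if entry is None else entry[0]
        self._entries[key] = (value, self._clock())  # refresh last-access time
        return value

    def invalidate(self, key: K) -> None:
        """Drop one entry (for example after a stream error) and notify the listener."""
        entry = self._entries.pop(key, None)
        if entry is not None:
            self._on_remove(key, entry[0])

    def evict_expired(self) -> None:
        now = self._clock()
        expired = [k for k, (_, ts) in self._entries.items() if now - ts >= self._ttl]
        for key in expired:
            self.invalidate(key)

    def clear(self) -> None:
        """Remove everything, for example when the service shuts down."""
        for key in list(self._entries):
            self.invalidate(key)
```

Routing every removal path (expiry, explicit invalidation, shutdown) through the same listener is what keeps connection cleanup in one place; a caller that hits an error while streaming would call `invalidate` so the next request builds a fresh client.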
What type of PR is this:
- [ ] BugFix
- [x] Feature
- [ ] Enhancement
- [ ] Refactor
- [ ] UT
- [ ] Doc
- [ ] Tool
Does this PR entail a change in behavior?
- [ ] Yes, this PR will result in a change in behavior.
- [x] No, this PR will not result in a change in behavior.
If yes, please specify the type of change:
- [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
- [x] Parameter changes: default values, similar parameters but with different default values
- [x] Policy changes: use new policy to replace old one, functionality automatically enabled
- [ ] Feature removed
- [ ] Miscellaneous: upgrade & downgrade compatibility, etc.
Behavior changes:
- New session variables: `arrow_flight_proxy_enabled` and `arrow_flight_proxy` are now available for configuration
- Default proxy mode: Proxy is enabled by default (`arrow_flight_proxy_enabled = true`), which routes all Arrow Flight queries through FE.
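As a rough model of how the two session variables determine where a client is sent, the following Python sketch picks an endpoint from the values described in this PR. The function name `choose_endpoint` and its parameters are hypothetical, and the validation rule (a non-empty host followed by a numeric port) is an assumption about what "valid `hostname:port`" means.

```python
from typing import Tuple

Address = Tuple[str, int]


def choose_endpoint(
    proxy_enabled: bool,   # value of arrow_flight_proxy_enabled
    proxy: str,            # value of arrow_flight_proxy ("" means unset)
    fe_addr: Address,      # the current FE's Arrow Flight address
    be_addr: Address,      # the BE that holds the result
) -> Address:
    """Return the address the client should fetch the stream from."""
    if not proxy_enabled:
        return be_addr                 # direct BE access, as before this PR
    proxy = proxy.strip()
    if not proxy:
        return fe_addr                 # default: the current FE acts as the proxy
    host, sep, port = proxy.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"arrow_flight_proxy must be 'hostname:port', got {proxy!r}")
    return host, int(port)             # custom proxy endpoint
```

Under this model, a session that keeps the defaults always fetches through the FE, while `SET arrow_flight_proxy_enabled = false` restores direct BE endpoints.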
Checklist:
- [x] I have added test cases for my bug fix or my new feature
- [x] This pr needs user documentation (for new or modified features or behaviors)
- [x] I have added documentation for my new feature or new function
- [ ] This is a backport pr
Bugfix cherry-pick branch check:
- [x] I have checked the version labels which the pr will be auto-backported to the target branch
- [x] 4.0
- [x] 3.5
- [ ] 3.4
- [ ] 3.3
[!NOTE] Introduces an FE-based Arrow Flight proxy with session variables to route BE streams via FE, updates ticket formats, adds client caching, and documents usage with tests.
- Arrow Flight SQL (FE Service):
  - Add FE proxy mode to forward BE Arrow streams to clients; proxy controlled by session vars `arrow_flight_proxy_enabled` (default `true`) and `arrow_flight_proxy`.
  - Update ticket parsing/routing:
    - FE tickets use `token|queryId`.
    - Proxy tickets use `queryId|fragmentInstanceId|beHost|bePort`; direct BE remains `queryId:fragmentInstanceId`.
  - Implement BE FlightClient cache with timed eviction and safe close; invalidate on errors; cancel/close handling; cache cleared on service close.
  - Build endpoint/ticket via `parseProxy`; validate proxy format; choose FE or custom proxy endpoint, or direct BE.
- Session Variables:
  - Add `arrow_flight_proxy` and `arrow_flight_proxy_enabled` to `SessionVariable` with defaults, annotations, and getters/setters.
- Docs:
  - Extend `docs/en/unloading/arrow_flight.md` with proxy overview, configuration, and Python examples showing the new variables.
- Features Listing:
  - Add `ArrowFlightSQL` to `ProductFeature` with link.
- Tests:
  - Update/add unit tests for proxy routing, invalid proxy handling, new ticket delimiters, BE proxy streaming, and feature listing assertions.
Written by Cursor Bugbot for commit 259da4e591111369d7ad9f2534695d4532032b7e.
🧪 CI Insights
Here's what we observed from your CI run for d3cdccad.
🟢 All jobs passed!
@cursor review
Quality Gate passed
Issues
11 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
[Java-Extensions Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)
[FE Incremental Coverage Report]
:white_check_mark: pass : 120 / 131 (91.60%)
file detail
| | path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|---|
| :large_blue_circle: | com/starrocks/service/arrow/flight/sql/ArrowFlightSqlServiceImpl.java | 111 | 122 | 90.98% | [98, 99, 100, 101, 102, 516, 517, 534, 535, 547, 548] |
| :large_blue_circle: | com/starrocks/qe/SessionVariable.java | 8 | 8 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/feature/ProductFeature.java | 1 | 1 | 100.00% | [] |
[BE Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)