[Feature] URL Function Implementation
Issue: #66227
Related PR: #66206
Status: SSRF protection in development (Named Parameter support done)
Authors: @EdwardArchive, with review from @alvin-celerdata
What I'm doing:
Executive Summary
This PR adds an http_request() scalar function to StarRocks that enables executing HTTP/HTTPS requests directly from SQL queries. The function allows users to send webhook notifications, integrate with external REST APIs for data enrichment, and enable real-time event-driven workflows from within SQL.
Key Design Decisions:
| Decision | Outcome |
|---|---|
| Function Name | http_request (clearer than url) |
| Parameter Style | Named arguments (requires enhancement for non-table functions) |
| Security Model | Allowlist-based with private network blocking by default |
| Future Enhancement | CONNECTION object for credential management and reusability |
Background & Motivation
Modern data platforms increasingly require integration with external services for alerting, enrichment, and workflow automation. Currently, StarRocks users must implement these integrations outside the database layer, adding complexity and latency to their data pipelines.
By providing native HTTP request capability within SQL, StarRocks can:
- Reduce architectural complexity by eliminating external integration layers
- Enable real-time responses to data events without leaving the SQL context
- Simplify webhook integrations for alerting and notification systems
- Support data enrichment workflows that require external API calls
Use Cases
1. Alerting & Notifications
Send Slack/Discord notifications when anomaly detection queries identify issues:
SELECT http_request(
url => 'https://hooks.slack.com/services/T00/B00/XXX',
method => 'POST',
headers => '{"Content-Type": "application/json"}',
body => CONCAT('{"text": "Alert: ', anomaly_description, '"}')
)
FROM anomaly_detection_results
WHERE severity = 'CRITICAL';
2. Data Enrichment
Call external APIs to augment query results with additional context:
SELECT
customer_id,
order_total,
JSON_QUERY(
http_request(
url => CONCAT('https://api.enrichment.com/customer/', customer_id)
),
'$.credit_score'
) AS credit_score
FROM orders;
3. Webhook Integration
Trigger external workflows based on data changes:
-- Trigger inventory replenishment workflow
SELECT http_request(
url => 'https://inventory.internal/api/reorder',
method => 'POST',
body => JSON_OBJECT('sku', sku, 'quantity', reorder_qty)
)
FROM inventory
WHERE current_stock < reorder_threshold;
Function Specification
Function Name
Decision: http_request
The name http_request was chosen over url for the following reasons:
- Clarity: Explicitly indicates the function performs HTTP requests
- Consistency: Aligns with naming conventions in other systems (e.g., http_get, http_post)
- Future Compatibility: Avoids naming conflicts if url is needed for URL parsing/manipulation functions
Function Signature
http_request(
url VARCHAR,
method VARCHAR DEFAULT 'GET',
body VARCHAR DEFAULT '',
headers VARCHAR DEFAULT '{}',
timeout_ms INT DEFAULT 30000,
ssl_verify BOOLEAN DEFAULT true,
username VARCHAR DEFAULT '',
password VARCHAR DEFAULT ''
) -> VARCHAR
Note: This function uses Named Arguments support for scalar functions, implemented as part of this feature.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | VARCHAR | required | Target URL for the HTTP request |
| method | VARCHAR | 'GET' | HTTP method: GET, POST, PUT, DELETE, PATCH |
| body | VARCHAR | '' | Request body (for POST/PUT/PATCH) |
| headers | VARCHAR | '{}' | JSON object containing HTTP headers |
| timeout_ms | INT | 30000 | Request timeout in milliseconds |
| ssl_verify | BOOLEAN | true | Enable/disable SSL certificate verification |
| username | VARCHAR | '' | Username for HTTP Basic Authentication |
| password | VARCHAR | '' | Password for HTTP Basic Authentication |
Return Value
Returns VARCHAR containing a JSON object with response information:
{
"status": 200,
"body": "{\"result\": \"success\"}"
}
- status: HTTP response status code
- body: Response body as string
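Because body is itself a string inside the JSON envelope, a client typically has to decode twice. The following is a minimal client-side sketch in Python, not part of the PR; the helper name parse_http_request_result is hypothetical, and it assumes the envelope always carries the status and body keys shown above.

```python
import json

def parse_http_request_result(raw):
    """Split the JSON envelope returned by http_request() into (status, body).

    Hypothetical helper: assumes the envelope has "status" and "body" keys.
    """
    envelope = json.loads(raw)
    status = int(envelope["status"])
    body = envelope["body"]
    try:
        # The body is a string; decode it a second time when it holds JSON.
        body = json.loads(body)
    except (TypeError, ValueError):
        pass  # Non-JSON bodies are returned unchanged as text.
    return status, body

example = '{"status": 200, "body": "{\\"result\\": \\"success\\"}"}'
status, body = parse_http_request_result(example)
```

In SQL, the same two-step decoding corresponds to applying a JSON function to the result, as in the Data Enrichment use case.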
Security Design
SSRF Risk Analysis
The http_request function introduces potential Server-Side Request Forgery (SSRF) risks:
- Internal Network Access: Attackers could probe internal services
- Port Scanning: Using the function to scan internal network ports
- Data Exfiltration: Sending sensitive data to attacker-controlled endpoints
Industry Comparison
| Database | Security Approach | Complexity |
|---|---|---|
| ClickHouse | remote_url_allow_hosts configuration | Low |
| Snowflake | Network Rules + External Access Integration | High |
| Databricks | Network Policies with Allowed Domains (FQDN) + CONNECTION objects | High |
| DuckDB | enable_external_access variable | Low |
| PostgreSQL (pgsql-http) | Function-level permission control | Medium |
Proposed Security Controls
The security implementation follows a phased approach balancing immediate protection with future extensibility:
┌─────────────────────────────────────────────────────────────────┐
│ Security Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Phase 1 (Current) Phase 2 (Future) │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Configuration- │ │ CONNECTION │ │
│ │ Based Controls │ ──► │ Objects │ │
│ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ • Allowlist │ │ • Encrypted │ │
│ │ • Regex Match │ │ Credentials │ │
│ │ • Private Net │ │ • Reusable │ │
│ │ Blocking │ │ Endpoints │ │
│ │ • SSL Required │ │ • Single SQL │ │
│ └─────────────────┘ │ Management │ │
│ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Phase 1: Configuration-Based Security
Security Parameters
| Configuration | Type | Default | Description |
|---|---|---|---|
| http_request_host_allowlist | STRING | "" | Comma-separated list of allowed hosts |
| http_request_host_allowlist_regexp | STRING | "" | Comma-separated list of allowed host regex patterns |
| http_request_block_private_networks | INT | 1~4 | Block private IP ranges and localhost |
| http_request_ssl_verification_required | BOOL | true | Enforce HTTPS with certificate validation |
Private Network Blocking
When http_request_block_private_networks = true, the following ranges are blocked:
10.0.0.0/8 (Class A private)
172.16.0.0/12 (Class B private)
192.168.0.0/16 (Class C private)
127.0.0.0/8 (Loopback)
169.254.0.0/16 (Link-local, includes cloud metadata)
::1/128 (IPv6 loopback)
fc00::/7 (IPv6 private)
fe80::/10 (IPv6 link-local)
Default Behavior (Secure by Default)
Critical: When both http_request_host_allowlist and http_request_host_allowlist_regexp are empty strings (default), no HTTP requests are allowed.
This ensures:
- Users must explicitly configure allowed endpoints
- Accidental exposure is prevented
- Security-first deployment model
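The deny-by-default rule can be sketched as follows. This is an illustrative Python model, not the PR's code: the function names are hypothetical, case-insensitive host comparison is an assumption, and regex patterns are matched against the full host string. One consequence of the comma-separated format is that a pattern containing a comma, such as a {1,3} quantifier, would be split in this sketch.

```python
import re

def parse_list(value):
    """Split a comma-separated config value, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]

def is_host_allowed(host, allowlist="", allowlist_regexp=""):
    hosts = {h.lower() for h in parse_list(allowlist)}
    patterns = parse_list(allowlist_regexp)
    if not hosts and not patterns:
        return False  # Both settings empty (the default): deny everything.
    host = host.lower()  # Assumption: hosts compare case-insensitively.
    if host in hosts:
        return True
    # Full-string match, so "good.com.evil.com" cannot satisfy "good\.com".
    return any(re.fullmatch(p, host) for p in patterns)
```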
Configuration Management
Configurations are managed as FE Dynamic Config via ADMIN SET FRONTEND CONFIG:
-- Enable specific hosts
ADMIN SET FRONTEND CONFIG (
"http_request_host_allowlist" = "api.slack.com,hooks.slack.com,api.example.com"
);
-- Enable hosts matching pattern
ADMIN SET FRONTEND CONFIG (
"http_request_host_allowlist_regexp" = ".*\\.internal\\.company\\.com"
);
-- Disable private network blocking (not recommended)
ADMIN SET FRONTEND CONFIG (
"http_request_block_private_networks" = "false"
);
-- Require SSL verification (default: true)
ADMIN SET FRONTEND CONFIG (
"http_request_ssl_verification_required" = "true"
);
Configuration Reference
Request Limits (Future Consideration)
| Configuration | Type | Default | Description |
|---|---|---|---|
| http_request_max_response_size | INT | 1048576 | Maximum response size in bytes (1MB) |
| http_request_default_timeout_ms | INT | 30000 | Default timeout in milliseconds |
Examples
Basic GET Request
-- Simple GET (requires host in allowlist)
SELECT http_request(url => 'https://api.example.com/status');
POST with JSON Body
SELECT http_request(
url => 'https://hooks.slack.com/services/XXX',
method => 'POST',
headers => '{"Content-Type": "application/json"}',
body => '{"text": "Database alert triggered"}'
);
Dynamic Webhook from Query Results
SELECT
metric_name,
metric_value,
http_request(
url => 'https://monitoring.internal/api/alert',
method => 'POST',
headers => '{"Content-Type": "application/json", "X-API-Key": "xxx"}',
body => JSON_OBJECT(
'metric', metric_name,
'value', metric_value,
'threshold', threshold,
'timestamp', NOW()
)
) AS alert_response
FROM metrics
WHERE metric_value > threshold;
Using with CONNECTION (Phase 2)
-- Create connection
CREATE CONNECTION pagerduty (
type = 'HTTP',
url = 'https://events.pagerduty.com/v2/enqueue',
method = 'POST',
headers = '{"Content-Type": "application/json"}',
password = 'integration_key_xxx'
);
-- Use connection
SELECT http_request(
connection => 'pagerduty',
body => JSON_OBJECT(
'routing_key', GET_CONNECTION_SECRET('pagerduty', 'password'),
'event_action', 'trigger',
'payload', JSON_OBJECT(
'summary', error_message,
'severity', 'critical',
'source', 'starrocks'
)
)
)
FROM error_logs
WHERE severity = 'CRITICAL' AND created_at > NOW() - INTERVAL 5 MINUTE;
Named Arguments for Scalar Functions
Overview
StarRocks now supports Named Arguments for scalar functions, enabling more readable and flexible function calls. This feature was implemented as a prerequisite for http_request() which has 8 parameters with defaults.
Syntax
-- Named Arguments syntax
function_name(param1 => value1, param2 => value2, ...)
-- Example
SELECT http_request(url => 'https://api.example.com', method => 'POST');
Features
| Feature | Description | Example |
|---|---|---|
| Named Parameters | Specify arguments by name | http_request(url => '...', method => 'GET') |
| Default Values | Omit optional parameters | http_request(url => '...') — uses default method='GET' |
| Positional Arguments | Traditional positional call still works | http_request('https://...') |
| Mixed Mode | Not supported (Named-only or Positional-only) | - |
User-Friendly Error Messages
| Scenario | Error Message |
|---|---|
| Missing required parameter | http_request() required parameter 'url' is missing |
| Unknown parameter (with hint) | http_request() unknown parameter 'URL'. Did you mean 'url'? |
| Duplicate parameter | http_request() duplicate parameter 'url' |
| NULL for required parameter | http_request() required parameter 'url' cannot be NULL |
| Empty arguments | http_request() requires at least 1 argument(s). Missing required parameter(s): 'url' |
Implementation Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Named Arguments Flow │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. Grammar (StarRocks.g4) │
│ └─ Parse `param => value` syntax │
│ │
│ 2. AST Builder (AstBuilder.java) │
│ └─ Create FunctionParams with named arguments │
│ │
│ 3. Function Registry (functions.py + gen_functions.py) │
│ └─ Define function metadata with named_args: │
│ {'name': 'url'}, │
│ {'name': 'method', 'default': 'GET'}, │
│ │
│ 4. Code Generation (VectorizedBuiltinFunctions.java) │
│ └─ setArgNames(), setDefaultNamedArgs() │
│ │
│ 5. Analyzer (ExpressionAnalyzer.java, FunctionAnalyzer.java) │
│ └─ Validate, reorder, and fill defaults │
│ │
└─────────────────────────────────────────────────────────────────┘
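To make step 5 (validate, reorder, fill defaults) concrete, here is a Python model of the analyzer behavior. It is a sketch only: the real logic is in the Java FE, resolve_named_args is a hypothetical name, and the case-insensitive "Did you mean" hint is an assumption about how the hint is chosen. The error strings follow the table in the previous section.

```python
NAMED_ARGS = [
    {"name": "url"},  # required: no default
    {"name": "method", "default": "GET"},
    {"name": "body", "default": ""},
    {"name": "headers", "default": "{}"},
    {"name": "timeout_ms", "default": 30000},
    {"name": "ssl_verify", "default": True},
    {"name": "username", "default": ""},
    {"name": "password", "default": ""},
]

def resolve_named_args(fn, supplied, spec=NAMED_ARGS):
    """Turn (name, value) pairs in call order into a full positional list."""
    known = [p["name"] for p in spec]
    seen = {}
    for name, value in supplied:
        if name not in known:
            # Assumed hint rule: suggest a name that differs only by case.
            hint = next((k for k in known if k.lower() == name.lower()), None)
            suffix = f" Did you mean '{hint}'?" if hint else ""
            raise ValueError(f"{fn}() unknown parameter '{name}'.{suffix}")
        if name in seen:
            raise ValueError(f"{fn}() duplicate parameter '{name}'")
        seen[name] = value
    ordered = []
    for p in spec:
        name = p["name"]
        if name in seen:
            if "default" not in p and seen[name] is None:
                raise ValueError(f"{fn}() required parameter '{name}' cannot be NULL")
            ordered.append(seen[name])
        elif "default" in p:
            ordered.append(p["default"])  # Fill the declared default.
        else:
            raise ValueError(f"{fn}() required parameter '{name}' is missing")
    return ordered
```

For example, calling it with method before url still yields the declaration order, with the remaining six defaults appended.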
Defining a Named Arguments Function
In gensrc/script/functions.py:
[30470, 'http_request', True, False, 'VARCHAR',
['VARCHAR', 'VARCHAR', 'VARCHAR', 'VARCHAR', 'INT', 'BOOLEAN', 'VARCHAR', 'VARCHAR'],
'HttpRequestFunctions::http_request',
'HttpRequestFunctions::http_request_prepare', 'HttpRequestFunctions::http_request_close',
{
'named_args': [
{'name': 'url'}, # Required (no default)
{'name': 'method', 'default': 'GET'}, # Optional with default
{'name': 'body', 'default': ''},
{'name': 'headers', 'default': '{}'},
{'name': 'timeout_ms', 'default': 30000},
{'name': 'ssl_verify', 'default': True},
{'name': 'username', 'default': ''},
{'name': 'password', 'default': ''}
]
}],
Key Files Modified
| File | Purpose |
|---|---|
| fe/fe-grammar/.../StarRocks.g4 | Grammar rule for => syntax |
| fe/fe-core/.../AstBuilder.java | Parse named arguments to AST |
| fe/fe-core/.../FunctionParams.java | Store and reorder named arguments |
| fe/fe-core/.../Function.java | Store arg names and defaults |
| fe/fe-core/.../FunctionAnalyzer.java | Validate named arguments |
| fe/fe-core/.../ExpressionAnalyzer.java | Resolve function with defaults |
| gensrc/script/functions.py | Function definitions with named_args |
| gensrc/script/gen_functions.py | Generate Java registration code |
Phase 2: CONNECTION Object (Summary)
Status: Future Enhancement
Phase 2 introduces CONNECTION objects for centralized credential management and reusable endpoint definitions.
Quick Reference
-- Create a reusable connection
CREATE CONNECTION slack_webhook (
type = 'HTTP',
url = 'https://hooks.slack.com/services/XXX',
method = 'POST',
headers = '{"Content-Type": "application/json"}'
);
-- Use the connection
SELECT http_request(connection => 'slack_webhook', body => '{"text": "Hello"}');
Key Benefits
- Centralized Credentials: Passwords stored securely with encryption
- Reusability: Define once, use across multiple queries
- Audit Trail: Track connection usage and modifications
Backward Compatibility
Compatibility Strategy
The implementation supports Option B: Support Both approaches.
Migration Path
Phase 1 Phase 2 Phase 3
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ URL-based │ │ Both URL & │ │ CONNECTION- │
│ with Config │ ──► │ CONNECTION │ ──► │ preferred │
│ Controls │ │ supported │ │ │
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
│ │ │
▼ ▼ ▼
Allowlist Add CONNECTION Deprecation
validation object support warnings for
only direct URL
References
External Documentation
| System | Documentation |
|---|---|
| ClickHouse | url() Table Function |
| PostgreSQL | pgsql-http Extension |
| Snowflake | External Functions |
| Databricks | CREATE CONNECTION |
Appendix: Discussion Summary
Key Discussion Points with @alvin-celerdata
- Function Naming: Agreed on http_request over url for clarity
- Named Arguments: Identified need to enhance non-table functions to support named arguments
- SSRF Protection:
  - Initial proposal: URL allowlist + private network blocking
  - Alvin's suggestion: Consider CONNECTION objects for stricter control
  - Resolution: Implement both approaches (Option B)
- Configuration Management:
  - Initial: Static fe.conf
  - Improved: FE Dynamic Config via ADMIN SET FRONTEND CONFIG
  - Future: Single SQL statement via CONNECTION objects
- Backward Compatibility: Agreed on supporting both URL argument and CONNECTION argument with configuration toggle
Last Updated: Based on GitHub Issue #66227 discussion
[!NOTE] Introduces http_request() with named parameters, SSRF safeguards (allowlists, security levels, DNS pinning), and adds named-argument support across parser/analyzer, configs, networking utils, HttpClient, and tests.
- Functionality:
  - New scalar function http_request() (nondeterministic) with named parameters, JSON responses, size limit, timeout, SSL options, headers/body/auth.
- Security/SSRF: security levels, host/IP allowlists (regex/IP), private/link-local IP blocking, admin-enforced SSL verification, and DNS pinning.
- Networking/HTTP:
  - HttpClient: disable redirects, add set_resolve_host() (CURLOPT_RESOLVE) and set_fail_on_error(), cleanup resolve list.
  - network_util: add is_private_ip, is_link_local_ip, resolve_hostname_all_ips, extract_host_from_url, extract_port_from_url.
- Config/Runtime/Thrift:
  - FE Config and SessionVariable expose http_request_* settings; Thrift TQueryOptions fields; BE RuntimeState getters.
- Parser/Analyzer/Registry:
  - Add named-arguments support for scalar functions: grammar (param => value), AST handling, validation (missing/duplicate/unknown/NULL), reordering and default filling.
  - Function registry/codegen updated to register named args/defaults for http_request.
- BE Integration:
  - Register function (FID 30470), mark as returning random in function_call_expr, include in build.
- Tests/Docs:
  - Extensive unit and SQL tests for http_request, security, network utils, HttpClient, and named-arg handling.
  - Add documentation for HTTP URL function usage.
Written by Cursor Bugbot for commit b1a2faf873385f3a5aa1497c9eb376fcdc85afb9.
What type of PR is this:
- [ ] BugFix
- [x] Feature
- [ ] Enhancement
- [ ] Refactor
- [ ] UT
- [ ] Doc
- [ ] Tool
Does this PR entail a change in behavior?
- [ ] Yes, this PR will result in a change in behavior.
- [x] No, this PR will not result in a change in behavior.
If yes, please specify the type of change:
- [x] Interface/UI changes: syntax, type conversion, expression evaluation, display information
- [ ] Parameter changes: default values, similar parameters but with different default values
- [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
- [ ] Feature removed
- [ ] Miscellaneous: upgrade & downgrade compatibility, etc.
Checklist:
- [x] I have added test cases for my bug fix or my new feature
- [x] This pr needs user documentation (for new or modified features or behaviors)
- [x] I have added documentation for my new feature or new function
- [ ] This is a backport pr
Bugfix cherry-pick branch check:
- [ ] I have checked the version labels which the pr will be auto-backported to the target branch
- [x] 4.0
- [ ] 3.5
- [ ] 3.4
- [ ] 3.3
What about the security considerations? Should we implement limitations on the domains?
@copilot how do you think about it
Hi @murphyatwork
Thank you very much for your great feedback.
I considered this approach as well, but concluded that, from a StarRocks perspective, relying on infrastructure-level defenses (Security Groups, Network Firewall, etc.) would reduce management complexity.
For now, I plan to add a variable that lets ADMIN accounts enforce SSL certificate validation, preventing other users from calling URLs without valid SSL certificates.
For reference, here is how other databases handle this.
PostgreSQL and DuckDB:
- PostgreSQL provides only function-level controls
- DuckDB allows control through configuration variables, including variables that deny external access
Enterprise-grade databases (Databricks, Snowflake, ClickHouse):
- These maintain their own network rule lists internally
- This appears to be because SaaS providers control both the application and the infrastructure, which lets them leverage this unified control
Thanks!
I just fixed almost all of the cases raised by Copilot.
gensrc/thrift/InternalService.thrift:
- Added url_ssl_verification_required option (field 201) to TQueryOptions
fe/fe-core/src/main/java/com/starrocks/qe/SessionVariable.java:
- Added Config.url_ssl_verification_required import
- Added tResult.setUrl_ssl_verification_required() in toThrift() method
be/src/runtime/runtime_state.h:
- Added url_ssl_verification_required() accessor method
be/src/exprs/url_functions.cpp:
- Added #include "runtime/runtime_state.h"
- Fixed simdjson error checks: !obj[...].get() → obj[...].get() == simdjson::SUCCESS
- Added HttpClient reuse across rows for better performance
- Added constant config caching with std::optional<UrlConfig>
- Moved ColumnViewer creation outside loop
- Added DELETE method body support
- Changed url_prepare() to read SSL config from RuntimeState
- Changed invalid config to return JSON error instead of NULL
be/src/exprs/url_functions.cpp:
- Added is_valid_utf8() - Validates UTF-8 byte sequences, returns false for invalid encoding
- Added is_valid_json() - Uses simdjson for proper JSON validation instead of simple string check
- Modified build_json_response() - Returns JSON error response when body contains invalid UTF-8
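The UTF-8 guard described above can be modeled in a few lines. The sketch below is Python rather than the PR's C++, and the shape of the error object (an "error" key next to "status") is an assumption, since the comment does not specify it.

```python
import json

def is_valid_utf8(data):
    """Return False when the byte string is not well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def build_json_response(status, body):
    """Wrap a raw response body in the JSON envelope."""
    if not is_valid_utf8(body):
        # Assumed error shape: the exact fields are not given in the PR text.
        return json.dumps({"status": status,
                           "error": "response body is not valid UTF-8"})
    return json.dumps({"status": status, "body": body.decode("utf-8")})
```

Rejecting invalid bytes up front keeps the returned VARCHAR parseable as JSON.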
be/test/exprs/url_functions_test.cpp:
- Fixed prepareCloseTest - Only checks ssl_verify_required field (removed deprecated fields)
docs/en/sql-reference/sql-functions/scalar-functions/url.md:
- Updated documentation with JSON config parameters and examples
@cursor review
Thanks
- Fix response size check to use a streaming callback (with streaming, an oversized response returns "Response size exceeds limit (1048576 bytes). Received: 5131466 bytes")
- Fix timeout_ms integer overflow with bounds checking (values larger than 300,000 are clamped to 300,000)
- Update the url.md file
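A rough Python model of these two fixes follows. It is a sketch under assumptions: the class and function names are hypothetical, the lower timeout bound of 1 ms is invented for illustration, and the real code is a C++ write callback in the HTTP client.

```python
MAX_RESPONSE_BYTES = 1_048_576  # 1 MB limit quoted in the comment above
MAX_TIMEOUT_MS = 300_000        # upper bound quoted in the comment above

def clamp_timeout_ms(timeout_ms, lowest=1, highest=MAX_TIMEOUT_MS):
    """Clamp the timeout into [lowest, highest]; lowest is an assumption."""
    return max(lowest, min(timeout_ms, highest))

class BoundedBuffer:
    """Accumulates streamed chunks and refuses to grow past the limit."""

    def __init__(self, limit=MAX_RESPONSE_BYTES):
        self.limit = limit
        self.chunks = []
        self.size = 0

    def write(self, chunk):
        """Return False to signal that the transfer should be aborted."""
        if self.size + len(chunk) > self.limit:
            return False
        self.chunks.append(chunk)
        self.size += len(chunk)
        return True
```

Checking inside the callback means an oversized response is rejected while it streams instead of after it has been buffered in full.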
@EdwardArchive Thanks for the contribution. After roughly investigating other systems' implementations, I have some suggestions.
2. Please create a GitHub Issue for this new function; that is where we need to align on the function's interface.
3. Because this function will access an external HTTP link, it may introduce potential risks to the system. Maybe we need to introduce a concept like CONNECTION in Databricks.
4. Please explain the background for this function in the issue ticket.
Thanks! I just created an issue here: https://github.com/StarRocks/starrocks/issues/66227
Quality Gate passed
Issues
2 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
@cursor review
@cursor review
This will change, so I deleted the old request.
Skipping Bugbot: Unable to authenticate your request. Please make sure Bugbot is properly installed and configured for this repository.
@cursor review
Named Arguments Support for Scalar Functions
Summary
This PR implements Named Arguments support for scalar functions in StarRocks, using http_request() as the first function to leverage this feature.
Features
- Named Arguments syntax: function(param => value, ...)
- Default values: Optional parameters can be omitted
- Positional calls: Traditional positional syntax still works
- User-friendly error messages: Clear hints for common mistakes
Usage Examples
-- Named Arguments (any order, omit optional params)
SELECT http_request(url => 'https://api.example.com');
SELECT http_request(url => 'https://api.example.com', method => 'POST', body => '{}');
SELECT http_request(method => 'POST', url => 'https://api.example.com'); -- order doesn't matter
-- Positional Arguments (still supported)
SELECT http_request('https://api.example.com');
Error Messages
| Scenario | Error Message |
|---|---|
| Missing required param | http_request() required parameter 'url' is missing |
| Unknown param (with hint) | http_request() unknown parameter 'URL'. Did you mean 'url'? |
| Duplicate param | http_request() duplicate parameter 'url' |
| NULL for required param | http_request() required parameter 'url' cannot be NULL |
| No arguments | http_request() requires at least 1 argument(s). Missing required parameter(s): 'url' |
Implementation
How to Define a Named Arguments Function
In gensrc/script/functions.py:
[30470, 'http_request', True, False, 'VARCHAR',
['VARCHAR', 'VARCHAR', 'VARCHAR', 'VARCHAR', 'INT', 'BOOLEAN', 'VARCHAR', 'VARCHAR'],
'HttpRequestFunctions::http_request',
'HttpRequestFunctions::http_request_prepare', 'HttpRequestFunctions::http_request_close',
{
'named_args': [
{'name': 'url'}, # Required (no default)
{'name': 'method', 'default': 'GET'}, # Optional with default
{'name': 'body', 'default': ''},
{'name': 'headers', 'default': '{}'},
{'name': 'timeout_ms', 'default': 30000},
{'name': 'ssl_verify', 'default': True},
{'name': 'username', 'default': ''},
{'name': 'password', 'default': ''}
]
}],
Then run code generation:
cd gensrc/script && python3 gen_functions.py --java /path/to/fe/target/generated-sources/build
Files Changed
| File | Changes |
|---|---|
| gensrc/script/functions.py | Added named_args metadata for http_request |
| gensrc/script/gen_functions.py | Added template for Named Arguments code generation |
| fe/.../FunctionParams.java | Added reorderNamedArgAndAppendDefaults(), appendDefaultsForPositionalArgs() |
| fe/.../FunctionAnalyzer.java | Added Named Arguments validation with user-friendly errors |
| fe/.../ExpressionAnalyzer.java | Added branching logic for Named vs Positional arguments |
Architecture Flow
SQL: http_request(url => '...', method => 'POST')
│
▼
┌─────────────────────────────────────┐
│ 1. Parser (StarRocks.g4) │
│ Parse `param => value` syntax │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 2. AstBuilder.java │
│ Create NamedArgument AST nodes │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 3. FunctionParams.java │
│ Separate exprs[] + exprsNames[] │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 4. ExpressionAnalyzer.java │
│ Branch: Named vs Positional │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 5. FunctionAnalyzer.java │
│ Validate & lookup function │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 6. FunctionParams.java │
│ Reorder args & append defaults │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 7. Backend (C++) │
│ Receive all 8 columns │
└─────────────────────────────────────┘
Backward Compatibility
- No impact on existing functions: Only functions with named_args defined use this path
- Positional calls still work: http_request('url') works as expected
- Varargs functions excluded: Functions like concat() are not affected
Constraints
- No mixed mode: All arguments must be either named or positional (not both)
- Case-sensitive: Parameter names are case-sensitive (url ≠ URL)
- Varargs not supported: Functions with variable arguments cannot use Named Arguments
🧪 CI Insights
Here's what we observed from your CI run for 8cc37615.
🟢 All jobs passed!
But CI Insights is watching 👀
@EdwardArchive
This PR implements Named Arguments support for scalar functions in StarRocks, using http_request() as the first function to leverage this feature.
This is great, but I would like you to separate it into two PRs. Please implement Named Arguments in a separate PR.
@alvin-celerdata Thanks, I'll separate it into two PRs.
@cursor review
Here are the changes and more detail on the SSRF feature.
SSRF Protection Implementation Summary
Overview
This PR implements comprehensive Server-Side Request Forgery (SSRF) protection for the http_request() function with a 4-level security system and defense-in-depth architecture.
Security Levels
| Level | Mode | Behavior |
|---|---|---|
| 1 | TRUSTED | Allow all requests (development only) |
| 2 | PUBLIC | Block private IPs, allow public hosts |
| 3 | RESTRICTED | Default - Require allowlist for all hosts |
| 4 | PARANOID | Block all requests unconditionally |
Default: Level 3 (RESTRICTED) - Secure by default, requires explicit allowlist configuration.
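The level table can be read as a small decision function. The Python sketch below is an interpretation, not the PR's C++ enum: is_request_allowed is a hypothetical name, PUBLIC is assumed to block private IPs regardless of the allowlist, and the allow_private_in_allowlist flag models the http_request_allow_private_in_allowlist setting.

```python
from enum import IntEnum

class SecurityLevel(IntEnum):
    TRUSTED = 1
    PUBLIC = 2
    RESTRICTED = 3
    PARANOID = 4

def is_request_allowed(level, is_private_ip, in_allowlist,
                       allow_private_in_allowlist=False):
    """Decide whether a request may proceed at the given security level."""
    level = SecurityLevel(level)
    if level is SecurityLevel.PARANOID:
        return False              # Block everything unconditionally.
    if level is SecurityLevel.TRUSTED:
        return True               # Allow everything (development only).
    if level is SecurityLevel.PUBLIC:
        return not is_private_ip  # Assumption: the allowlist is not consulted.
    # RESTRICTED: the target must be allowlisted; private IPs need the opt-in.
    if not in_allowlist:
        return False
    return allow_private_in_allowlist or not is_private_ip
```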
Configuration Parameters
All parameters are runtime-mutable via ADMIN SET FRONTEND CONFIG:
-- Security level (1=TRUSTED, 2=PUBLIC, 3=RESTRICTED, 4=PARANOID)
ADMIN SET FRONTEND CONFIG ("http_request_security_level" = "3");
-- IP allowlist (exact match on IPv4)
ADMIN SET FRONTEND CONFIG ("http_request_ip_allowlist" = "192.168.1.1,10.0.0.1");
-- Host regex patterns (full string match)
ADMIN SET FRONTEND CONFIG ("http_request_host_allowlist_regexp" = "api\\.slack\\.com,.*\\.example\\.com");
-- Admin-enforced SSL verification (prevents user bypass)
ADMIN SET FRONTEND CONFIG ("http_request_ssl_verification_required" = "true");
-- Allow private IPs if in allowlist (NOT recommended for production)
ADMIN SET FRONTEND CONFIG ("http_request_allow_private_in_allowlist" = "false");
IP Blocking Implementation
IPv4 Private Ranges (6 ranges)
- 127.0.0.0/8 - Loopback
- 10.0.0.0/8 - Class A Private
- 172.16.0.0/12 - Class B Private
- 192.168.0.0/16 - Class C Private
- 169.254.0.0/16 - Link-local (Cloud Metadata)
- 0.0.0.0/8 - Current network
IPv6 Private Ranges (4 ranges)
- ::1/128 - IPv6 loopback
- fc00::/7 - Unique local addresses
- fe80::/10 - Link-local
- ::ffff:0:0/96 - IPv4-mapped addresses
Special: Cloud Metadata Protection
Dedicated detection for 169.254.169.254 (AWS/GCP/Azure metadata endpoints) with enhanced warning:
WARNING: Allowing this IP can expose cloud credentials and sensitive metadata.
Defense-in-Depth Architecture
8 Security Layers:
- Protocol Validation - Only http:// and https:// allowed
- Security Level Check - TRUSTED/PUBLIC/RESTRICTED/PARANOID
- URL Parsing - Extract hostname, handle IPv6 brackets
- DNS Resolution - Resolve ALL IPs (IPv4 + IPv6), not just first
- Private IP Detection - Check all resolved IPs against private ranges
- Allowlist Validation - IP exact match OR regex pattern match
- SSL/TLS Enforcement - Admin can enforce SSL, preventing user bypass
- Response Size Limiting - Max 1 MB response
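The first and third layers (protocol validation and URL parsing) might look like the following in Python, using the standard urllib.parse module. This is only a sketch: parse_target is a hypothetical name, and the PR does this work in C++ helpers such as extract_host_from_url and extract_port_from_url.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def parse_target(url):
    """Validate the scheme and return (scheme, host, port)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported protocol: {scheme or '(none)'}")
    host = parts.hostname  # Lower-cased, with IPv6 brackets removed.
    if not host:
        raise ValueError("URL has no host")
    port = parts.port or DEFAULT_PORTS[scheme]
    return scheme, host, port
```

Rejecting every scheme other than http and https rules out file://, ftp://, and gopher:// style targets before any DNS lookup happens.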
Key Implementation Details
DNS Resolution Security
// Resolves hostname to ALL IPs (both IPv4 and IPv6)
// Then validates EVERY resolved IP against security rules
Status resolve_hostname_all_ips(const std::string& hostname,
std::vector<std::string>& ips);
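Why: Validating only the first resolved address would let a hostname that also resolves to a private IP slip through. Checking every address, and then pinning the connection to a validated address (HttpClient set_resolve_host(), which uses CURLOPT_RESOLVE), prevents DNS rebinding attacks, in which an attacker-controlled DNS server returns a private IP after the initial public IP has passed validation.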
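A Python sketch of the resolve-all-then-pin idea is shown below. It is not the PR's implementation: the function names are hypothetical, the resolver is injectable so the logic can be exercised without network access, and Python's is_private covers more ranges than the lists in this document. The returned string uses libcurl's HOST:PORT:ADDRESS form for CURLOPT_RESOLVE; IPv6 bracket handling is left out.

```python
import ipaddress
import socket

def resolve_all_ips(hostname, port, getaddrinfo=socket.getaddrinfo):
    """Return every distinct address the resolver reports, IPv4 and IPv6."""
    ips = []
    for info in getaddrinfo(hostname, port, type=socket.SOCK_STREAM):
        ip = info[4][0].split("%")[0]  # Drop any IPv6 zone id.
        if ip not in ips:
            ips.append(ip)
    return ips

def pinned_resolve_entry(hostname, port, getaddrinfo=socket.getaddrinfo):
    """Validate all resolved IPs, then pin the first one for the request."""
    ips = resolve_all_ips(hostname, port, getaddrinfo)
    if not ips:
        raise ValueError(f"{hostname} did not resolve to any address")
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            raise PermissionError(f"{hostname} resolves to blocked address {ip}")
    # The HTTP client connects to this address without a second DNS lookup.
    return f"{hostname}:{port}:{ips[0]}"
```

Because the client connects to the pinned address, a later DNS answer cannot redirect the request after validation.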
SSL Verification Enforcement
// Two-level control:
// 1. User parameter: ssl_verify=>true/false
// 2. Admin enforcement: http_request_ssl_verification_required
if (admin_requires_ssl && user_requests_no_verify) {
return ERROR("SSL verification is enforced by administrator");
}
Allowlist Logic
// IP allowlist (exact string match) OR host regex match
bool check_allowlist(host, resolved_ips) {
for (auto& ip : resolved_ips) {
if (ip_allowlist.contains(ip)) return true; // IP match
}
return regex_match(host, host_patterns); // Regex match
}
Production Configuration Example
Recommended Secure Defaults:
-- 1. Use RESTRICTED mode (require allowlist)
ADMIN SET FRONTEND CONFIG ("http_request_security_level" = "3");
-- 2. Enforce SSL verification globally
ADMIN SET FRONTEND CONFIG ("http_request_ssl_verification_required" = "true");
-- 3. Configure allowed public APIs only
ADMIN SET FRONTEND CONFIG ("http_request_host_allowlist_regexp" =
"api\\.slack\\.com,hooks\\.slack\\.com,api\\.github\\.com");
-- 4. Keep private IP allowlist disabled (default)
-- "http_request_allow_private_in_allowlist" = "false"
Test Coverage
Comprehensive test suite with 40+ test scenarios in test/sql/test_http_request_function/T/test_http_request_security.sql:
Test Categories:
- All IPv4 private IP ranges (127.x, 10.x, 172.16-31.x, 192.168.x, 169.254.x, 0.x)
- All IPv6 private IP ranges (::1, fc00::/7, fe80::/10, IPv4-mapped)
- Cloud metadata detection (169.254.169.254)
- Security level transitions (1→2→3→4)
- Allowlist matching (IP exact + regex patterns)
- SSL verification enforcement
- DNS resolution security (all IPs checked)
- SSRF bypass attempts (decimal IPs, octal notation, etc.)
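To show why the bypass category matters, the Python sketch below normalizes the legacy IPv4 spellings (decimal, octal, hex, and shortened forms) that classic inet_aton-style parsers accept. It illustrates the attack surface only; normalize_legacy_ipv4 is a hypothetical helper and does not describe how the PR's tests or backend handle these inputs.

```python
import ipaddress
import re

# One component: hex (0x..), octal (leading 0), or plain decimal.
_PART = re.compile(r"0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*")

def _part_value(part):
    if part[:2].lower() == "0x":
        return int(part, 16)
    if len(part) > 1 and part[0] == "0":
        return int(part, 8)
    return int(part, 10)

def normalize_legacy_ipv4(host):
    """Return the dotted-quad form of a legacy IPv4 literal, else None."""
    parts = host.split(".")
    if not 1 <= len(parts) <= 4 or not all(_PART.fullmatch(p) for p in parts):
        return None
    *head, last = [_part_value(p) for p in parts]
    # Leading parts are single bytes; the last part fills the remaining bytes.
    if any(v > 255 for v in head) or last >= 256 ** (4 - len(head)):
        return None
    number = last
    for i, v in enumerate(head):
        number += v << (8 * (3 - i))
    return str(ipaddress.IPv4Address(number))
```

A validator that compares only the literal host string against blocked ranges would miss that 2130706433 and 017700000001 both mean 127.0.0.1.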
Industry Comparison
| Feature | StarRocks | ClickHouse | Snowflake | Databricks |
|---|---|---|---|---|
| Default Security | RESTRICTED | RESTRICTED | NETWORK RULES | NETWORK POLICIES |
| Allowlist Support | IP + Regex | Hosts only | FQDN rules | FQDN + IP |
| Private IP Blocking | Default ON | Config-based | Always | Default |
| SSL Enforcement | Admin-enforced | User-controlled | Always | Always |
| DNS Resolution | All IPs | First IP only | All IPs | All IPs |
| Cloud Metadata Protection | Special detection | Blocked | Blocked | Blocked |
StarRocks Advantages:
- 4-level security system (vs. binary ON/OFF)
- Regex pattern matching (more flexible than exact domains)
- Admin SSL enforcement (prevents user bypass)
- Comprehensive IPv6 support with all private ranges
- Special cloud metadata detection with enhanced warnings
Code Locations
| Component | File | Lines |
|---|---|---|
| FE Config Parameters | fe/fe-core/src/main/java/com/starrocks/common/Config.java | 4044-4071 |
| Security Levels Enum | be/src/exprs/http_request_functions.h | 30-35 |
| Main Validation Logic | be/src/exprs/http_request_functions.cpp | 344-457 |
| IPv4 Private IP Check | be/src/util/network_util.cpp | 198-228 |
| IPv6 Private IP Check | be/src/util/network_util.cpp | 230-259 |
| DNS Resolution | be/src/util/network_util.cpp | 291-333 |
| URL Parsing | be/src/util/network_util.cpp | 335-372 |
| SSL Handling | be/src/http/http_client.h | 93-96 |
Security Considerations
Why Default is RESTRICTED (Level 3)?
Secure by Default Principle:
- No requests allowed without explicit configuration
- Forces administrators to whitelist endpoints
- Prevents accidental SSRF exposure
- Follows industry best practices
Why Block Private IPs by Default?
SSRF Attack Vectors:
- Internal service enumeration (scan internal APIs)
- Cloud metadata access (steal IAM credentials)
- Localhost bypass (access local services)
- Data exfiltration (send to internal logging)
Why Special Link-Local Detection?
Cloud-Specific Risk:
- AWS: http://169.254.169.254/latest/meta-data/
- GCP: http://metadata.google.internal/
- Azure: http://169.254.169.254/metadata/instance
Exposure: IAM credentials, API keys, instance metadata
@alvin-celerdata Hi, are there any thoughts about the SSRF feature?
@cursor review
@cursor review
@cursor review
[FE Incremental Coverage Report]
:x: fail : 61 / 88 (69.32%)
file detail
| | path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|---|
| :large_blue_circle: | com/starrocks/common/ConfigBase.java | 16 | 40 | 40.00% | [256, 257, 258, 260, 261, 263, 264, 265, 267, 268, 272, 277, 278, 279, 281, 282, 284, 285, 286, 288, 289, 294, 306, 307] |
| :large_blue_circle: | com/starrocks/sql/analyzer/ExpressionAnalyzer.java | 41 | 44 | 93.18% | [1107, 1108, 1110] |
| :large_blue_circle: | com/starrocks/catalog/FunctionSet.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/analyzer/FunctionAnalyzer.java | 3 | 3 | 100.00% | [] |
[Java-Extensions Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)
[BE Incremental Coverage Report]
:x: fail : 9 / 27 (33.33%)
file detail
| | path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|---|
| :large_blue_circle: | src/exprs/function_call_expr.cpp | 0 | 2 | 00.00% | [124, 125] |
| :large_blue_circle: | src/util/network_util.cpp | 0 | 9 | 00.00% | [198, 201, 202, 203, 205, 206, 209, 210, 212] |
| :large_blue_circle: | src/http/http_client.h | 0 | 4 | 00.00% | [99, 100, 101, 102] |
| :large_blue_circle: | src/runtime/runtime_state.h | 2 | 5 | 40.00% | [381, 390, 391] |
| :large_blue_circle: | src/http/http_client.cpp | 7 | 7 | 100.00% | [] |
@cursor review
@cursor review
@cursor review