Issue: #66227
Related PR: #66206
Status: SSRF protection in development (named parameters done)
Authors: @EdwardArchive, with review from @alvin-celerdata

What I'm doing:

Executive Summary

This PR adds an http_request() scalar function to StarRocks that enables executing HTTP/HTTPS requests directly from SQL queries. The function allows users to send webhook notifications, integrate with external REST APIs for data enrichment, and enable real-time event-driven workflows from within SQL.

Key Design Decisions:

Decision	Outcome
Function Name	`http_request` (clearer than `url`)
Parameter Style	Named arguments (requires enhancement for non-table functions)
Security Model	Allowlist-based with private network blocking by default
Future Enhancement	CONNECTION object for credential management and reusability

Background & Motivation

Modern data platforms increasingly require integration with external services for alerting, enrichment, and workflow automation. Currently, StarRocks users must implement these integrations outside the database layer, adding complexity and latency to their data pipelines.

By providing native HTTP request capability within SQL, StarRocks can:

Reduce architectural complexity by eliminating external integration layers
Enable real-time responses to data events without leaving the SQL context
Simplify webhook integrations for alerting and notification systems
Support data enrichment workflows that require external API calls

Use Cases

1. Alerting & Notifications

Send Slack/Discord notifications when anomaly detection queries identify issues:

SELECT http_request(
    url => 'https://hooks.slack.com/services/T00/B00/XXX',
    method => 'POST',
    headers => '{"Content-Type": "application/json"}',
    body => CONCAT('{"text": "Alert: ', anomaly_description, '"}')
)
FROM anomaly_detection_results
WHERE severity = 'CRITICAL';

2. Data Enrichment

Call external APIs to augment query results with additional context:

SELECT 
    customer_id,
    order_total,
    JSON_QUERY(
        http_request(
            url => CONCAT('https://api.enrichment.com/customer/', customer_id)
        ),
        '$.credit_score'
    ) AS credit_score
FROM orders;

3. Webhook Integration

Trigger external workflows based on data changes:

-- Trigger inventory replenishment workflow
SELECT http_request(
    url => 'https://inventory.internal/api/reorder',
    method => 'POST',
    body => JSON_OBJECT('sku', sku, 'quantity', reorder_qty)
)
FROM inventory
WHERE current_stock < reorder_threshold;

Function Specification

Function Name

Decision: http_request

The name http_request was chosen over url for the following reasons:

Clarity: Explicitly indicates the function performs HTTP requests
Consistency: Aligns with naming conventions in other systems (e.g., http_get, http_post)
Future Compatibility: Avoids naming conflicts if url is needed for URL parsing/manipulation functions

Function Signature

http_request(
    url VARCHAR,
    method VARCHAR DEFAULT 'GET',
    body VARCHAR DEFAULT '',
    headers VARCHAR DEFAULT '{}',
    timeout_ms INT DEFAULT 30000,
    ssl_verify BOOLEAN DEFAULT true,
    username VARCHAR DEFAULT '',
    password VARCHAR DEFAULT ''
) -> VARCHAR

Note: This function uses Named Arguments support for scalar functions, implemented as part of this feature.

Parameters

Parameter	Type	Default	Description
`url`	VARCHAR	required	Target URL for the HTTP request
`method`	VARCHAR	`'GET'`	HTTP method: `GET`, `POST`, `PUT`, `DELETE`, `PATCH`
`body`	VARCHAR	`''`	Request body (for POST/PUT/PATCH)
`headers`	VARCHAR	`'{}'`	JSON object containing HTTP headers
`timeout_ms`	INT	`30000`	Request timeout in milliseconds
`ssl_verify`	BOOLEAN	`true`	Enable/disable SSL certificate verification
`username`	VARCHAR	`''`	Username for HTTP Basic Authentication
`password`	VARCHAR	`''`	Password for HTTP Basic Authentication

Return Value

Returns VARCHAR containing a JSON object with response information:

{
    "status": 200,
    "body": "{\"result\": \"success\"}"
}

status: HTTP response status code
body: Response body as string
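
Because `body` is itself a string, a client that consumes the result has to decode JSON twice when the remote service returns JSON. A minimal Python sketch, assuming the result shape shown above; `parse_http_request_result` is a hypothetical helper name and not part of StarRocks:

```python
import json

def parse_http_request_result(raw: str):
    """Split an http_request() result into (status, decoded body).

    Hypothetical helper: assumes the {"status": ..., "body": "..."} shape
    shown above. The body is decoded as JSON when possible and returned
    as a plain string otherwise.
    """
    outer = json.loads(raw)
    body = outer["body"]
    try:
        body = json.loads(body)
    except (TypeError, ValueError):
        pass  # body is not JSON, keep the raw string
    return outer["status"], body

raw = '{"status": 200, "body": "{\\"result\\": \\"success\\"}"}'
status, body = parse_http_request_result(raw)
print(status, body["result"])
```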

Security Design

SSRF Risk Analysis

The http_request function introduces potential Server-Side Request Forgery (SSRF) risks:

Internal Network Access: Attackers could probe internal services
Port Scanning: Using the function to scan internal network ports
Data Exfiltration: Sending sensitive data to attacker-controlled endpoints

Industry Comparison

Database	Security Approach	Complexity
ClickHouse	`remote_url_allow_hosts` configuration	Low
Snowflake	Network Rules + External Access Integration	High
Databricks	Network Policies with Allowed Domains (FQDN) + CONNECTION objects	High
DuckDB	`enable_external_access` variable	Low
PostgreSQL (pgsql-http)	Function-level permission control	Medium

Proposed Security Controls

The security implementation follows a phased approach balancing immediate protection with future extensibility:

┌─────────────────────────────────────────────────────────────────┐
│                     Security Architecture                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Phase 1 (Current)              Phase 2 (Future)                │
│  ┌─────────────────┐           ┌─────────────────┐              │
│  │ Configuration-  │           │   CONNECTION    │              │
│  │ Based Controls  │    ──►    │     Objects     │              │
│  └────────┬────────┘           └────────┬────────┘              │
│           │                             │                        │
│           ▼                             ▼                        │
│  ┌─────────────────┐           ┌─────────────────┐              │
│  │ • Allowlist     │           │ • Encrypted     │              │
│  │ • Regex Match   │           │   Credentials   │              │
│  │ • Private Net   │           │ • Reusable      │              │
│  │   Blocking      │           │   Endpoints     │              │
│  │ • SSL Required  │           │ • Single SQL    │              │
│  └─────────────────┘           │   Management    │              │
│                                └─────────────────┘              │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Phase 1: Configuration-Based Security

Security Parameters

Configuration	Type	Default	Description
`http_request_host_allowlist`	STRING	`""`	Comma-separated list of allowed hosts
`http_request_host_allowlist_regexp`	STRING	`""`	Comma-separated list of allowed host regex patterns
`http_request_block_private_networks`	BOOL	`true`	Block private IP ranges and localhost
`http_request_ssl_verification_required`	BOOL	`true`	Enforce HTTPS with certificate validation

Private Network Blocking

When http_request_block_private_networks = true, the following ranges are blocked:

10.0.0.0/8        (Class A private)
172.16.0.0/12     (Class B private)
192.168.0.0/16    (Class C private)
127.0.0.0/8       (Loopback)
169.254.0.0/16    (Link-local, includes cloud metadata)
::1/128           (IPv6 loopback)
fc00::/7          (IPv6 private)
fe80::/10         (IPv6 link-local)
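
The range check above can be sketched with Python's standard `ipaddress` module. This is only an illustration of the rule, not the backend implementation; `BLOCKED_NETWORKS` and `is_blocked_ip` are hypothetical names:

```python
import ipaddress

# Ranges copied from the list above.
BLOCKED_NETWORKS = [ipaddress.ip_network(cidr) for cidr in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8",
    "169.254.0.0/16", "::1/128", "fc00::/7", "fe80::/10",
)]

def is_blocked_ip(ip: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(
        addr.version == net.version and addr in net
        for net in BLOCKED_NETWORKS
    )

print(is_blocked_ip("169.254.169.254"))  # cloud metadata address
print(is_blocked_ip("8.8.8.8"))          # public address
```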

Default Behavior (Secure by Default)

Critical: When both http_request_host_allowlist and http_request_host_allowlist_regexp are empty strings (default), no HTTP requests are allowed.

This ensures:

Users must explicitly configure allowed endpoints
Accidental exposure is prevented
Security-first deployment model
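
A minimal Python sketch of this default-deny rule, assuming comma-separated settings, case-insensitive host comparison, and full-string regex matching. `is_host_allowed` is a hypothetical name, and splitting on commas means a pattern that itself contains a comma would not survive this simplified parsing:

```python
import re

def is_host_allowed(host: str, allowlist: str = "", allowlist_regexp: str = "") -> bool:
    """Default-deny host check against an exact list and a regex list."""
    hosts = [h.strip().lower() for h in allowlist.split(",") if h.strip()]
    patterns = [p.strip() for p in allowlist_regexp.split(",") if p.strip()]
    if not hosts and not patterns:
        return False  # both settings empty: no request is allowed
    host = host.lower()
    if host in hosts:
        return True
    # fullmatch: the pattern must cover the whole host name
    return any(re.fullmatch(p, host) for p in patterns)

print(is_host_allowed("api.example.com"))
print(is_host_allowed("hooks.slack.com", allowlist="api.slack.com,hooks.slack.com"))
```

Full-string matching matters here: with a partial match, a pattern such as `.*\.internal\.company\.com` would also accept `svc.internal.company.com.evil.com`.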

Configuration Management

Configurations are managed as FE Dynamic Config via ADMIN SET FRONTEND CONFIG:

-- Enable specific hosts
ADMIN SET FRONTEND CONFIG (
    "http_request_host_allowlist" = "api.slack.com,hooks.slack.com,api.example.com"
);

-- Enable hosts matching pattern
ADMIN SET FRONTEND CONFIG (
    "http_request_host_allowlist_regexp" = ".*\\.internal\\.company\\.com"
);

-- Disable private network blocking (not recommended)
ADMIN SET FRONTEND CONFIG (
    "http_request_block_private_networks" = "false"
);

-- Require SSL verification (default: true)
ADMIN SET FRONTEND CONFIG (
    "http_request_ssl_verification_required" = "true"
);

Configuration Reference

Request Limits (Future Consideration)

Configuration	Type	Default	Description
`http_request_max_response_size`	INT	`1048576`	Maximum response size in bytes (1MB)
`http_request_default_timeout_ms`	INT	`30000`	Default timeout in milliseconds
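
One way such a size limit could be enforced is to count bytes as chunks arrive and abort as soon as the limit is crossed, instead of buffering the whole response first. A Python sketch under that assumption; `read_limited` is a hypothetical name and the error text is illustrative only:

```python
def read_limited(chunks, max_bytes: int = 1048576) -> bytes:
    """Accumulate response chunks, aborting once max_bytes is exceeded."""
    received = bytearray()
    for chunk in chunks:
        received.extend(chunk)
        if len(received) > max_bytes:
            raise ValueError(
                f"Response size exceeds limit ({max_bytes} bytes). "
                f"Received: {len(received)} bytes"
            )
    return bytes(received)

print(read_limited([b"ab", b"cd"], max_bytes=4))
```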

Examples

Basic GET Request

-- Simple GET (requires host in allowlist)
SELECT http_request(url => 'https://api.example.com/status');

POST with JSON Body

SELECT http_request(
    url => 'https://hooks.slack.com/services/XXX',
    method => 'POST',
    headers => '{"Content-Type": "application/json"}',
    body => '{"text": "Database alert triggered"}'
);

Dynamic Webhook from Query Results

SELECT 
    metric_name,
    metric_value,
    http_request(
        url => 'https://monitoring.internal/api/alert',
        method => 'POST',
        headers => '{"Content-Type": "application/json", "X-API-Key": "xxx"}',
        body => JSON_OBJECT(
            'metric', metric_name,
            'value', metric_value,
            'threshold', threshold,
            'timestamp', NOW()
        )
    ) AS alert_response
FROM metrics
WHERE metric_value > threshold;

Using with CONNECTION (Phase 2)

-- Create connection
CREATE CONNECTION pagerduty (
    type = 'HTTP',
    url = 'https://events.pagerduty.com/v2/enqueue',
    method = 'POST',
    headers = '{"Content-Type": "application/json"}',
    password = 'integration_key_xxx'
);

-- Use connection
SELECT http_request(
    connection => 'pagerduty',
    body => JSON_OBJECT(
        'routing_key', GET_CONNECTION_SECRET('pagerduty', 'password'),
        'event_action', 'trigger',
        'payload', JSON_OBJECT(
            'summary', error_message,
            'severity', 'critical',
            'source', 'starrocks'
        )
    )
)
FROM error_logs
WHERE severity = 'CRITICAL' AND created_at > NOW() - INTERVAL 5 MINUTE;

Named Arguments for Scalar Functions

Overview

StarRocks now supports Named Arguments for scalar functions, enabling more readable and flexible function calls. This feature was implemented as a prerequisite for http_request() which has 8 parameters with defaults.

Syntax

-- Named Arguments syntax
function_name(param1 => value1, param2 => value2, ...)

-- Example
SELECT http_request(url => 'https://api.example.com', method => 'POST');

Features

Feature	Description	Example
Named Parameters	Specify arguments by name	`http_request(url => '...', method => 'GET')`
Default Values	Omit optional parameters	`http_request(url => '...')` — uses default `method='GET'`
Positional Arguments	Traditional positional call still works	`http_request('https://...')`
Mixed Mode	Not supported (Named-only or Positional-only)	-

User-Friendly Error Messages

Scenario	Error Message
Missing required parameter	`http_request() required parameter 'url' is missing`
Unknown parameter (with hint)	`http_request() unknown parameter 'URL'. Did you mean 'url'?`
Duplicate parameter	`http_request() duplicate parameter 'url'`
NULL for required parameter	`http_request() required parameter 'url' cannot be NULL`
Empty arguments	`http_request() requires at least 1 argument(s). Missing required parameter(s): 'url'`

Implementation Architecture

┌─────────────────────────────────────────────────────────────────┐
│                   Named Arguments Flow                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. Grammar (StarRocks.g4)                                      │
│     └─ Parse `param => value` syntax                            │
│                                                                  │
│  2. AST Builder (AstBuilder.java)                               │
│     └─ Create FunctionParams with named arguments               │
│                                                                  │
│  3. Function Registry (functions.py + gen_functions.py)         │
│     └─ Define function metadata with named_args:                │
│        {'name': 'url'},                                         │
│        {'name': 'method', 'default': 'GET'},                    │
│                                                                  │
│  4. Code Generation (VectorizedBuiltinFunctions.java)           │
│     └─ setArgNames(), setDefaultNamedArgs()                     │
│                                                                  │
│  5. Analyzer (ExpressionAnalyzer.java, FunctionAnalyzer.java)   │
│     └─ Validate, reorder, and fill defaults                     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
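
The analyzer step (validate, reorder, fill defaults) can be sketched in Python as follows. This is not the Java implementation; `resolve_named_args` is a hypothetical name, the error strings are taken from the table above, and the case-insensitive "Did you mean" hint is an assumption about how the suggestion is chosen:

```python
NAMED_ARGS = [
    {"name": "url"},  # required (no default)
    {"name": "method", "default": "GET"},
    {"name": "body", "default": ""},
    {"name": "headers", "default": "{}"},
    {"name": "timeout_ms", "default": 30000},
    {"name": "ssl_verify", "default": True},
    {"name": "username", "default": ""},
    {"name": "password", "default": ""},
]

def resolve_named_args(func_name, spec, named):
    """Turn [(name, value), ...] into a full positional argument list."""
    known = [p["name"] for p in spec]
    seen = {}
    for name, value in named:
        if name not in known:
            hint = next((k for k in known if k.lower() == name.lower()), None)
            msg = f"{func_name}() unknown parameter '{name}'."
            if hint:
                msg += f" Did you mean '{hint}'?"
            raise ValueError(msg)
        if name in seen:
            raise ValueError(f"{func_name}() duplicate parameter '{name}'")
        seen[name] = value
    resolved = []
    for p in spec:
        name, required = p["name"], "default" not in p
        if name in seen:
            if required and seen[name] is None:
                raise ValueError(
                    f"{func_name}() required parameter '{name}' cannot be NULL")
            resolved.append(seen[name])
        elif required:
            raise ValueError(
                f"{func_name}() required parameter '{name}' is missing")
        else:
            resolved.append(p["default"])  # fill the declared default
    return resolved

print(resolve_named_args("http_request", NAMED_ARGS,
                         [("method", "POST"), ("url", "https://api.example.com")]))
```

Whatever order the caller uses, the result is always the full eight-value list in declaration order.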

Defining a Named Arguments Function

In gensrc/script/functions.py:

[30470, 'http_request', True, False, 'VARCHAR',
 ['VARCHAR', 'VARCHAR', 'VARCHAR', 'VARCHAR', 'INT', 'BOOLEAN', 'VARCHAR', 'VARCHAR'],
 'HttpRequestFunctions::http_request',
 'HttpRequestFunctions::http_request_prepare', 'HttpRequestFunctions::http_request_close',
 {
     'named_args': [
         {'name': 'url'},                          # Required (no default)
         {'name': 'method', 'default': 'GET'},     # Optional with default
         {'name': 'body', 'default': ''},
         {'name': 'headers', 'default': '{}'},
         {'name': 'timeout_ms', 'default': 30000},
         {'name': 'ssl_verify', 'default': True},
         {'name': 'username', 'default': ''},
         {'name': 'password', 'default': ''}
     ]
 }],

Key Files Modified

File	Purpose
`fe/fe-grammar/.../StarRocks.g4`	Grammar rule for `=>` syntax
`fe/fe-core/.../AstBuilder.java`	Parse named arguments to AST
`fe/fe-core/.../FunctionParams.java`	Store and reorder named arguments
`fe/fe-core/.../Function.java`	Store arg names and defaults
`fe/fe-core/.../FunctionAnalyzer.java`	Validate named arguments
`fe/fe-core/.../ExpressionAnalyzer.java`	Resolve function with defaults
`gensrc/script/functions.py`	Function definitions with named_args
`gensrc/script/gen_functions.py`	Generate Java registration code

Phase 2: CONNECTION Object (Summary)

Status: Future Enhancement

Phase 2 introduces CONNECTION objects for centralized credential management and reusable endpoint definitions.

Quick Reference

-- Create a reusable connection
CREATE CONNECTION slack_webhook (
    type = 'HTTP',
    url = 'https://hooks.slack.com/services/XXX',
    method = 'POST',
    headers = '{"Content-Type": "application/json"}'
);

-- Use the connection
SELECT http_request(connection => 'slack_webhook', body => '{"text": "Hello"}');

Key Benefits

Centralized Credentials: Passwords stored securely with encryption
Reusability: Define once, use across multiple queries
Audit Trail: Track connection usage and modifications

Backward Compatibility

Compatibility Strategy

The implementation follows Option B: support both approaches, that is, direct URL arguments and CONNECTION objects.

Migration Path

Phase 1                    Phase 2                    Phase 3
┌─────────────┐           ┌─────────────┐           ┌─────────────┐
│ URL-based   │           │ Both URL &  │           │ CONNECTION- │
│ with Config │    ──►    │ CONNECTION  │    ──►    │ preferred   │
│ Controls    │           │ supported   │           │             │
└─────────────┘           └─────────────┘           └─────────────┘
     │                          │                         │
     │                          │                         │
     ▼                          ▼                         ▼
  Allowlist               Add CONNECTION            Deprecation
  validation              object support            warnings for
  only                                              direct URL

References

External Documentation

System	Documentation
ClickHouse	url() Table Function
PostgreSQL	pgsql-http Extension
Snowflake	External Functions
Databricks	CREATE CONNECTION

Appendix: Discussion Summary

Key Discussion Points with @alvin-celerdata

Function Naming: Agreed on http_request over url for clarity
Named Arguments: Identified need to enhance non-table functions to support named arguments
SSRF Protection:
- Initial proposal: URL allowlist + private network blocking
- Alvin's suggestion: Consider CONNECTION objects for stricter control
- Resolution: Implement both approaches (Option B)
Configuration Management:
- Initial: Static fe.conf
- Improved: FE Dynamic Config via ADMIN SET FRONTEND CONFIG
- Future: Single SQL statement via CONNECTION objects
Backward Compatibility: Agreed on supporting both URL argument and CONNECTION argument with configuration toggle

Last Updated: Based on GitHub Issue #66227 discussion

[!NOTE] Introduces http_request() with named parameters, SSRF safeguards (allowlists, security levels, DNS pinning), and adds named-argument support across parser/analyzer, configs, networking utils, HttpClient, and tests.

Functionality:

New scalar function http_request() (nondeterministic) with named parameters, JSON responses, size limit, timeout, SSL options, headers/body/auth.

Security/SSRF: security levels, host/IP allowlists (regex/IP), private/link-local IP blocking, admin-enforced SSL verification, and DNS pinning.

Networking/HTTP:

HttpClient: disable redirects, add set_resolve_host() (CURLOPT_RESOLVE) and set_fail_on_error(), cleanup resolve list.

network_util: add is_private_ip, is_link_local_ip, resolve_hostname_all_ips, extract_host_from_url, extract_port_from_url.

Config/Runtime/Thrift:

FE Config and SessionVariable expose http_request_* settings; Thrift TQueryOptions fields; BE RuntimeState getters.

Parser/Analyzer/Registry:

Add named-arguments support for scalar functions: grammar (param => value), AST handling, validation (missing/duplicate/unknown/NULL), reordering and default filling.

Function registry/codegen updated to register named args/defaults for http_request.

BE Integration:

Register function (FID 30470), mark as returning random in function_call_expr, include in build.

Tests/Docs:

Extensive unit and SQL tests for http_request, security, network utils, HttpClient, and named-arg handling.

Add documentation for HTTP URL function usage.

Written by Cursor Bugbot for commit b1a2faf873385f3a5aa1497c9eb376fcdc85afb9. This will update automatically on new commits.

What type of PR is this:

[ ] BugFix
[x] Feature
[ ] Enhancement
[ ] Refactor
[ ] UT
[ ] Doc
[ ] Tool

Does this PR entail a change in behavior?

[ ] Yes, this PR will result in a change in behavior.
[x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

[x] Interface/UI changes: syntax, type conversion, expression evaluation, display information
[ ] Parameter changes: default values, similar parameters but with different default values
[ ] Policy changes: use new policy to replace old one, functionality automatically enabled
[ ] Feature removed
[ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

[x] I have added test cases for my bug fix or my new feature
[x] This pr needs user documentation (for new or modified features or behaviors)
- [x] I have added documentation for my new feature or new function
[ ] This is a backport pr

Bugfix cherry-pick branch check:

[ ] I have checked the version labels which the pr will be auto-backported to the target branch
- [x] 4.0
- [ ] 3.5
- [ ] 3.4
- [ ] 3.3

Dec 02 '25 08:12 EdwardArchive

What about the security considerations? Should we implement limitations on the domains?

@copilot what do you think about it?

Dec 02 '25 10:12 murphyatwork

What about the security considerations? Should we implement limitations on the domains?

@copilot how do you think about it

Hi @murphyatwork

Thank you very much for your great feedback.

I also considered this approach, but concluded that relying on infrastructure-level defenses (security groups, network firewalls, etc.) would reduce management complexity on the StarRocks side.

For now, I plan to add a variable that allows ADMIN accounts to enforce SSL certificate validation, preventing other users from calling URLs without SSL certificates.

For reference, when considering the cases of other databases:

PostgreSQL and DuckDB approach:

PostgreSQL provides this only as function-level controls
DuckDB allows control through configuration variables + external access denied variables

Enterprise-grade databases (Databricks, Snowflake, ClickHouse):

These maintain their own Network Rule Lists internally
This appears to be because SaaS providers have control over both the application AND the infrastructure, allowing them to leverage this unified control

Thanks!

Dec 02 '25 11:12 EdwardArchive

I just fixed almost all of the cases raised by Copilot.

gensrc/thrift/InternalService.thrift:

Added url_ssl_verification_required option (field 201) to TQueryOptions

fe/fe-core/src/main/java/com/starrocks/qe/SessionVariable.java:

Added Config.url_ssl_verification_required import
Added tResult.setUrl_ssl_verification_required() in toThrift() method

be/src/runtime/runtime_state.h:

Added url_ssl_verification_required() accessor method

be/src/exprs/url_functions.cpp:

Added #include "runtime/runtime_state.h"
Fixed simdjson error checks: !obj[...].get() → obj[...].get() == simdjson::SUCCESS
Added HttpClient reuse across rows for better performance
Added constant config caching with std::optional<UrlConfig>
Moved ColumnViewer creation outside loop
Added DELETE method body support
Changed url_prepare() to read SSL config from RuntimeState
Changed invalid config to return JSON error instead of NULL

be/src/exprs/url_functions.cpp:

Added is_valid_utf8() - Validates UTF-8 byte sequences, returns false for invalid encoding
Added is_valid_json() - Uses simdjson for proper JSON validation instead of simple string check
Modified build_json_response() - Returns JSON error response when body contains invalid UTF-8

be/test/exprs/url_functions_test.cpp:

Fixed prepareCloseTest - Only checks ssl_verify_required field (removed deprecated fields)

docs/en/sql-reference/sql-functions/scalar-functions/url.md:

Updated documentation with JSON config parameters and examples

Dec 02 '25 15:12 EdwardArchive

@cursor review

Dec 02 '25 17:12 alvin-celerdata

Thanks

Fixed the response size check to use a streaming callback (an oversized response now returns "Response size exceeds limit (1048576 bytes). Received: 5131466 bytes")
Fixed timeout_ms integer overflow with bounds checking (values larger than 300,000 are clamped to 300,000)
Updated the url.md file

Dec 02 '25 17:12 EdwardArchive

@EdwardArchive Thanks for the contribution. After roughly investigating other systems' implementations, I have some suggestions:

2. Please create a GitHub Issue for this new function; that is where we need to align on the function's interface.
3. Because this function will access an external HTTP link, it may introduce potential risks to the system. Maybe we need to introduce a concept like CONNECTION in Databricks.
4. Please explain the background for this function in the issue ticket.

Dec 02 '25 18:12 alvin-celerdata

@EdwardArchive Thanks for the contribution, after roughly investigating other systems' implementation, I have some suggestions. 2. please create a GitHub Issue for this new function, and in that place we need to get aligned on the interface of this function. 3. Because this function will access an external HTTP link, it may introduce potential risks to the system. Maybe we need to introduce some concepts like connection in Databricks. 4. please explain the background for this function in the issue ticket.

Thanks! I just created an issue here: https://github.com/StarRocks/starrocks/issues/66227

Dec 02 '25 19:12 EdwardArchive

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Dec 03 '25 02:12 sonarqubecloud[bot]

@cursor review

Dec 03 '25 04:12 alvin-celerdata

@cursor review

Dec 03 '25 17:12 alvin-celerdata

This will change, so I'm deleting the old request.

Dec 03 '25 18:12 EdwardArchive

Skipping Bugbot: Unable to authenticate your request. Please make sure Bugbot is properly installed and configured for this repository.

Dec 03 '25 18:12 cursor[bot]

@cursor review

Dec 04 '25 03:12 alvin-celerdata

Named Arguments Support for Scalar Functions

Summary

This PR implements Named Arguments support for scalar functions in StarRocks, using http_request() as the first function to leverage this feature.

Features

Named Arguments syntax: function(param => value, ...)
Default values: Optional parameters can be omitted
Positional calls: Traditional positional syntax still works
User-friendly error messages: Clear hints for common mistakes

Usage Examples

-- Named Arguments (any order, omit optional params)
SELECT http_request(url => 'https://api.example.com');
SELECT http_request(url => 'https://api.example.com', method => 'POST', body => '{}');
SELECT http_request(method => 'POST', url => 'https://api.example.com');  -- order doesn't matter

-- Positional Arguments (still supported)
SELECT http_request('https://api.example.com');

Error Messages

Scenario	Error Message
Missing required param	`http_request() required parameter 'url' is missing`
Unknown param (with hint)	`http_request() unknown parameter 'URL'. Did you mean 'url'?`
Duplicate param	`http_request() duplicate parameter 'url'`
NULL for required param	`http_request() required parameter 'url' cannot be NULL`
No arguments	`http_request() requires at least 1 argument(s). Missing required parameter(s): 'url'`

Implementation

How to Define a Named Arguments Function

In gensrc/script/functions.py:

[30470, 'http_request', True, False, 'VARCHAR',
 ['VARCHAR', 'VARCHAR', 'VARCHAR', 'VARCHAR', 'INT', 'BOOLEAN', 'VARCHAR', 'VARCHAR'],
 'HttpRequestFunctions::http_request',
 'HttpRequestFunctions::http_request_prepare', 'HttpRequestFunctions::http_request_close',
 {
     'named_args': [
         {'name': 'url'},                          # Required (no default)
         {'name': 'method', 'default': 'GET'},     # Optional with default
         {'name': 'body', 'default': ''},
         {'name': 'headers', 'default': '{}'},
         {'name': 'timeout_ms', 'default': 30000},
         {'name': 'ssl_verify', 'default': True},
         {'name': 'username', 'default': ''},
         {'name': 'password', 'default': ''}
     ]
 }],

Then run code generation:

cd gensrc/script && python3 gen_functions.py --java /path/to/fe/target/generated-sources/build

Files Changed

File	Changes
`gensrc/script/functions.py`	Added `named_args` metadata for `http_request`
`gensrc/script/gen_functions.py`	Added template for Named Arguments code generation
`fe/.../FunctionParams.java`	Added `reorderNamedArgAndAppendDefaults()`, `appendDefaultsForPositionalArgs()`
`fe/.../FunctionAnalyzer.java`	Added Named Arguments validation with user-friendly errors
`fe/.../ExpressionAnalyzer.java`	Added branching logic for Named vs Positional arguments

Architecture Flow

SQL: http_request(url => '...', method => 'POST')
         │
         ▼
┌─────────────────────────────────────┐
│ 1. Parser (StarRocks.g4)            │
│    Parse `param => value` syntax    │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│ 2. AstBuilder.java                  │
│    Create NamedArgument AST nodes   │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│ 3. FunctionParams.java              │
│    Separate exprs[] + exprsNames[]  │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│ 4. ExpressionAnalyzer.java          │
│    Branch: Named vs Positional      │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│ 5. FunctionAnalyzer.java            │
│    Validate & lookup function       │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│ 6. FunctionParams.java              │
│    Reorder args & append defaults   │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│ 7. Backend (C++)                    │
│    Receive all 8 columns            │
└─────────────────────────────────────┘

Backward Compatibility

No impact on existing functions: Only functions with named_args defined use this path
Positional calls still work: http_request('url') works as expected
Varargs functions excluded: Functions like concat() are not affected

Constraints

No mixed mode: All arguments must be either named or positional (not both)
Case-sensitive: Parameter names are case-sensitive (url ≠ URL)
Varargs not supported: Functions with variable arguments cannot use Named Arguments

Dec 04 '25 17:12 EdwardArchive

🧪 CI Insights

Here's what we observed from your CI run for 8cc37615.

🟢 All jobs passed!

But CI Insights is watching 👀

Dec 04 '25 18:12 mergify[bot]

@EdwardArchive

This PR implements Named Arguments support for scalar functions in StarRocks, using http_request() as the first function to leverage this feature.

This is great, but I would like you to separate it into two PRs. Please implement Named Arguments in a separate PR.

Dec 04 '25 18:12 alvin-celerdata

@alvin-celerdata Thanks, I'll separate it into two PRs.

Dec 04 '25 18:12 EdwardArchive

@cursor review

Dec 05 '25 17:12 alvin-celerdata

There are changes and more details on the SSRF feature:

SSRF Protection Implementation Summary

Overview

This PR implements comprehensive Server-Side Request Forgery (SSRF) protection for the http_request() function with a 4-level security system and defense-in-depth architecture.

Security Levels

Level	Mode	Behavior
1	TRUSTED	Allow all requests (development only)
2	PUBLIC	Block private IPs, allow public hosts
3	RESTRICTED	Default - Require allowlist for all hosts
4	PARANOID	Block all requests unconditionally

Default: Level 3 (RESTRICTED) - Secure by default, requires explicit allowlist configuration.
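
The four levels can be read as a small decision function. A Python sketch, assuming that private addresses in RESTRICTED mode are accepted only when the separate `http_request_allow_private_in_allowlist` option is on; `SecurityLevel` and `is_request_allowed` are hypothetical names:

```python
from enum import IntEnum

class SecurityLevel(IntEnum):
    TRUSTED = 1
    PUBLIC = 2
    RESTRICTED = 3
    PARANOID = 4

def is_request_allowed(level, is_private_ip, in_allowlist,
                       allow_private_in_allowlist=False):
    """Decide whether a request may proceed under the given level."""
    if level == SecurityLevel.TRUSTED:
        return True               # allow everything (development only)
    if level == SecurityLevel.PARANOID:
        return False              # block everything
    if level == SecurityLevel.PUBLIC:
        return not is_private_ip  # public hosts only, no allowlist needed
    # RESTRICTED: an allowlist match is mandatory
    if not in_allowlist:
        return False
    if is_private_ip:
        return allow_private_in_allowlist  # assumption, see lead-in
    return True

print(is_request_allowed(SecurityLevel.RESTRICTED, False, True))
```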

Configuration Parameters

All parameters are runtime-mutable via ADMIN SET FRONTEND CONFIG:

-- Security level (1=TRUSTED, 2=PUBLIC, 3=RESTRICTED, 4=PARANOID)
ADMIN SET FRONTEND CONFIG ("http_request_security_level" = "3");

-- IP allowlist (exact match on IPv4)
ADMIN SET FRONTEND CONFIG ("http_request_ip_allowlist" = "192.168.1.1,10.0.0.1");

-- Host regex patterns (full string match)
ADMIN SET FRONTEND CONFIG ("http_request_host_allowlist_regexp" = "api\\.slack\\.com,.*\\.example\\.com");

-- Admin-enforced SSL verification (prevents user bypass)
ADMIN SET FRONTEND CONFIG ("http_request_ssl_verification_required" = "true");

-- Allow private IPs if in allowlist (NOT recommended for production)
ADMIN SET FRONTEND CONFIG ("http_request_allow_private_in_allowlist" = "false");

IP Blocking Implementation

IPv4 Private Ranges (6 ranges)

127.0.0.0/8 - Loopback
10.0.0.0/8 - Class A Private
172.16.0.0/12 - Class B Private
192.168.0.0/16 - Class C Private
169.254.0.0/16 - Link-local (Cloud Metadata)
0.0.0.0/8 - Current network

IPv6 Private Ranges (4 ranges)

::1/128 - IPv6 loopback
fc00::/7 - Unique local addresses
fe80::/10 - Link-local
::ffff:0:0/96 - IPv4-mapped addresses
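
The IPv4-mapped range is on the list because such an address carries an IPv4 address inside it, which could otherwise slip past IPv4-only checks. A short Python illustration using the standard `ipaddress` module; `embedded_ipv4` is a hypothetical helper:

```python
import ipaddress

IPV4_MAPPED = ipaddress.ip_network("::ffff:0:0/96")

def embedded_ipv4(ip: str):
    """Return the IPv4 address inside an IPv4-mapped IPv6 address, else None."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 6 and addr in IPV4_MAPPED:
        return str(addr.ipv4_mapped)
    return None

print(embedded_ipv4("::ffff:127.0.0.1"))
print(embedded_ipv4("2001:db8::1"))
```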

Special: Cloud Metadata Protection

Dedicated detection for 169.254.169.254 (AWS/GCP/Azure metadata endpoints) with enhanced warning:

WARNING: Allowing this IP can expose cloud credentials and sensitive metadata.

Defense-in-Depth Architecture

8 Security Layers:

Protocol Validation - Only http:// and https:// allowed
Security Level Check - TRUSTED/PUBLIC/RESTRICTED/PARANOID
URL Parsing - Extract hostname, handle IPv6 brackets
DNS Resolution - Resolve ALL IPs (IPv4 + IPv6), not just first
Private IP Detection - Check all resolved IPs against private ranges
Allowlist Validation - IP exact match OR regex pattern match
SSL/TLS Enforcement - Admin can enforce SSL, preventing user bypass
Response Size Limiting - Max 1 MB response
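Layers 1 and 3 (protocol validation and URL parsing) can be sketched together as one helper. The function name is hypothetical and the parsing is intentionally simplified: it is case-sensitive on the scheme and does not decode percent-escapes, so it should be read as an illustration of the steps rather than the PR's parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: accepts only http:// and https:// URLs and extracts
// the host, stripping userinfo, port, and IPv6 brackets.
bool extract_host(const std::string& url, std::string& host) {
    const std::string http = "http://";
    const std::string https = "https://";
    std::string rest;
    if (url.compare(0, http.size(), http) == 0) {
        rest = url.substr(http.size());
    } else if (url.compare(0, https.size(), https) == 0) {
        rest = url.substr(https.size());
    } else {
        return false;  // layer 1: any other scheme is rejected
    }

    // The authority ends at the first '/', '?', or '#'.
    rest = rest.substr(0, rest.find_first_of("/?#"));

    // Drop optional userinfo ("user:pass@").
    std::size_t at = rest.rfind('@');
    if (at != std::string::npos) rest = rest.substr(at + 1);

    if (!rest.empty() && rest[0] == '[') {
        // IPv6 literal: host is the text between the brackets.
        std::size_t close = rest.find(']');
        if (close == std::string::npos) return false;
        host = rest.substr(1, close - 1);
    } else {
        // Hostname or IPv4 literal: cut off an optional ":port".
        host = rest.substr(0, rest.find(':'));
    }
    return !host.empty();
}
```

Stripping userinfo matters for SSRF checks because a URL such as `http://allowed.example@10.0.0.1/` connects to `10.0.0.1`, not to the name before the `@`.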

Key Implementation Details

DNS Resolution Security

// Resolves hostname to ALL IPs (both IPv4 and IPv6)
// Then validates EVERY resolved IP against security rules
Status resolve_hostname_all_ips(const std::string& hostname,
                                std::vector<std::string>& ips);

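Why: A hostname can resolve to several addresses, so an attacker who controls DNS can mix public and private records. Validating every resolved IP, not just the first, closes that gap and narrows the window for DNS rebinding attacks, where DNS returns a private IP after an initial public IP has passed validation.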
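As an illustration of resolving every address for a hostname, the sketch below uses POSIX `getaddrinfo` with `AF_UNSPEC`. It returns `bool` instead of the `Status` type in the declaration above, and the function name is changed to make clear that it is a stand-in, not the PR's implementation.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for resolve_hostname_all_ips: appends the textual
// form of every IPv4 and IPv6 address the resolver returns for `hostname`.
bool resolve_all_ips_sketch(const std::string& hostname,
                            std::vector<std::string>& ips) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // ask for both IPv4 and IPv6
    hints.ai_socktype = SOCK_STREAM;  // avoid duplicate entries per socket type

    addrinfo* results = nullptr;
    if (getaddrinfo(hostname.c_str(), nullptr, &hints, &results) != 0) {
        return false;
    }
    for (addrinfo* p = results; p != nullptr; p = p->ai_next) {
        char buf[INET6_ADDRSTRLEN];
        const void* src = nullptr;
        if (p->ai_family == AF_INET) {
            src = &reinterpret_cast<sockaddr_in*>(p->ai_addr)->sin_addr;
        } else if (p->ai_family == AF_INET6) {
            src = &reinterpret_cast<sockaddr_in6*>(p->ai_addr)->sin6_addr;
        } else {
            continue;  // ignore other address families
        }
        if (inet_ntop(p->ai_family, src, buf, sizeof(buf)) != nullptr) {
            ips.emplace_back(buf);
        }
    }
    freeaddrinfo(results);
    return !ips.empty();
}
```

A caller would then run each entry of `ips` through the private-range checks and reject the request if any of them fails. To keep the validated result meaningful, the HTTP client should connect to one of these addresses rather than resolve the hostname a second time.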

SSL Verification Enforcement

// Two-level control:
// 1. User parameter: ssl_verify=>true/false
// 2. Admin enforcement: http_request_ssl_verification_required
if (admin_requires_ssl && user_requests_no_verify) {
    return ERROR("SSL verification is enforced by administrator");
}

Allowlist Logic

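// Returns true if any resolved IP is in the IP allowlist (exact string match)
// OR the hostname matches one of the host regex patterns (full string match).
bool check_allowlist(const std::string& host,
                     const std::vector<std::string>& resolved_ips) {
    for (const auto& ip : resolved_ips) {
        if (ip_allowlist.contains(ip)) return true;  // IP match
    }
    return regex_match(host, host_patterns);  // Regex match
}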
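A self-contained version of the same logic, taking the two comma-separated config strings directly, could look like the sketch below. The helper names are hypothetical, and `std::regex` is used only for illustration; the PR may use a different regex engine.

```cpp
#include <algorithm>
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits "a,b,c" into {"a","b","c"}, skipping empty items.
std::vector<std::string> split_csv(const std::string& value) {
    std::vector<std::string> items;
    std::stringstream ss(value);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) items.push_back(item);
    }
    return items;
}

// Hypothetical helper: IP exact match OR full-string host regex match.
// Note: std::regex throws std::regex_error on an invalid pattern.
bool matches_allowlist(const std::string& host,
                       const std::vector<std::string>& resolved_ips,
                       const std::string& ip_allowlist_csv,
                       const std::string& host_regexp_csv) {
    const std::vector<std::string> allowed_ips = split_csv(ip_allowlist_csv);
    for (const std::string& ip : resolved_ips) {
        if (std::find(allowed_ips.begin(), allowed_ips.end(), ip) != allowed_ips.end()) {
            return true;  // IP match
        }
    }
    for (const std::string& pattern : split_csv(host_regexp_csv)) {
        // regex_match requires the whole host to match, not a substring.
        if (std::regex_match(host, std::regex(pattern))) return true;
    }
    return false;
}
```

Full-string matching is what stops `api.slack.com.evil.net` from passing a pattern written for `api.slack.com`. One consequence of comma-separated patterns is that a pattern cannot itself contain a comma, such as the quantifier `{1,3}`.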

Production Configuration Example

Recommended Secure Defaults:

-- 1. Use RESTRICTED mode (require allowlist)
ADMIN SET FRONTEND CONFIG ("http_request_security_level" = "3");

-- 2. Enforce SSL verification globally
ADMIN SET FRONTEND CONFIG ("http_request_ssl_verification_required" = "true");

-- 3. Configure allowed public APIs only
ADMIN SET FRONTEND CONFIG ("http_request_host_allowlist_regexp" =
    "api\\.slack\\.com,hooks\\.slack\\.com,api\\.github\\.com");

-- 4. Keep private IP allowlist disabled (default)
-- "http_request_allow_private_in_allowlist" = "false"

Test Coverage

Comprehensive test suite with 40+ test scenarios in test/sql/test_http_request_function/T/test_http_request_security.sql:

Test Categories:

All IPv4 private IP ranges (127.x, 10.x, 172.16-31.x, 192.168.x, 169.254.x, 0.x)
All IPv6 private IP ranges (::1, fc00::/7, fe80::/10, IPv4-mapped)
Cloud metadata detection (169.254.169.254)
Security level transitions (1→2→3→4)
Allowlist matching (IP exact + regex patterns)
SSL verification enforcement
DNS resolution security (all IPs checked)
SSRF bypass attempts (decimal IPs, octal notation, etc.)
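To show why the bypass category matters, the sketch below uses the legacy `inet_aton` parser (available on Linux and BSD-derived systems), which accepts decimal, octal, hexadecimal, and shortened forms. The helper name is hypothetical and the code is not part of the test suite; it only demonstrates that several different spellings decode to the loopback range.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: true if the legacy parser decodes `host` to an
// address in 127.0.0.0/8. Hostnames and invalid input return false.
bool decodes_to_loopback(const std::string& host) {
    in_addr addr;
    if (inet_aton(host.c_str(), &addr) == 0) return false;
    uint32_t value = ntohl(addr.s_addr);  // convert to host byte order
    return (value >> 24) == 127;
}
```

Any component that hands the raw host string to a lenient parser (the HTTP client or the system resolver) can therefore reach `127.0.0.1` through `2130706433`, `0177.0.0.1`, `0x7f.0.0.1`, or `127.1`, which is why validation has to run on the resolved address rather than on the text of the URL.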

Industry Comparison

Feature	StarRocks	ClickHouse	Snowflake	Databricks
Default Security	RESTRICTED	RESTRICTED	NETWORK RULES	NETWORK POLICIES
Allowlist Support	IP + Regex	Hosts only	FQDN rules	FQDN + IP
Private IP Blocking	Default ON	Config-based	Always	Default
SSL Enforcement	Admin-enforced	User-controlled	Always	Always
DNS Resolution	All IPs	First IP only	All IPs	All IPs
Cloud Metadata Protection	Special detection	Blocked	Blocked	Blocked

StarRocks Advantages:

4-level security system (vs. binary ON/OFF)
Regex pattern matching (more flexible than exact domains)
Admin SSL enforcement (prevents user bypass)
Comprehensive IPv6 support with all private ranges
Special cloud metadata detection with enhanced warnings

Code Locations

Component	File	Lines
FE Config Parameters	`fe/fe-core/src/main/java/com/starrocks/common/Config.java`	4044-4071
Security Levels Enum	`be/src/exprs/http_request_functions.h`	30-35
Main Validation Logic	`be/src/exprs/http_request_functions.cpp`	344-457
IPv4 Private IP Check	`be/src/util/network_util.cpp`	198-228
IPv6 Private IP Check	`be/src/util/network_util.cpp`	230-259
DNS Resolution	`be/src/util/network_util.cpp`	291-333
URL Parsing	`be/src/util/network_util.cpp`	335-372
SSL Handling	`be/src/http/http_client.h`	93-96

Security Considerations

Why Is the Default RESTRICTED (Level 3)?

Secure by Default Principle:

No requests allowed without explicit configuration
Forces administrators to allowlist endpoints explicitly
Prevents accidental SSRF exposure
Follows industry best practices

Why Block Private IPs by Default?

SSRF Attack Vectors:

Internal service enumeration (scan internal APIs)
Cloud metadata access (steal IAM credentials)
Localhost bypass (access local services)
Data exfiltration (send to internal logging)

Why Special Link-Local Detection?

Cloud-Specific Risk:

AWS: http://169.254.169.254/latest/meta-data/
GCP: http://metadata.google.internal/
Azure: http://169.254.169.254/metadata/instance

Exposure: IAM credentials, API keys, instance metadata

Dec 06 '25 17:12 EdwardArchive

@alvin-celerdata Hi, do you have any thoughts on the SSRF feature?

Dec 07 '25 13:12 EdwardArchive

@cursor review

Dec 07 '25 18:12 alvin-celerdata

@cursor review

Dec 08 '25 03:12 alvin-celerdata

@cursor review

Dec 10 '25 17:12 alvin-celerdata

[FE Incremental Coverage Report]

:x: fail : 61 / 88 (69.32%)

file detail

	path	covered_line	new_line	coverage	not_covered_line_detail
:large_blue_circle:	com/starrocks/common/ConfigBase.java	16	40	40.00%	[256, 257, 258, 260, 261, 263, 264, 265, 267, 268, 272, 277, 278, 279, 281, 282, 284, 285, 286, 288, 289, 294, 306, 307]
:large_blue_circle:	com/starrocks/sql/analyzer/ExpressionAnalyzer.java	41	44	93.18%	[1107, 1108, 1110]
:large_blue_circle:	com/starrocks/catalog/FunctionSet.java	1	1	100.00%	[]
:large_blue_circle:	com/starrocks/sql/analyzer/FunctionAnalyzer.java	3	3	100.00%	[]

Dec 12 '25 03:12 github-actions[bot]

[Java-Extensions Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

Dec 12 '25 03:12 github-actions[bot]

[BE Incremental Coverage Report]

:x: fail : 9 / 27 (33.33%)

file detail

	path	covered_line	new_line	coverage	not_covered_line_detail
:large_blue_circle:	src/exprs/function_call_expr.cpp	0	2	00.00%	[124, 125]
:large_blue_circle:	src/util/network_util.cpp	0	9	00.00%	[198, 201, 202, 203, 205, 206, 209, 210, 212]
:large_blue_circle:	src/http/http_client.h	0	4	00.00%	[99, 100, 101, 102]
:large_blue_circle:	src/runtime/runtime_state.h	2	5	40.00%	[381, 390, 391]
:large_blue_circle:	src/http/http_client.cpp	7	7	100.00%	[]

Dec 12 '25 03:12 github-actions[bot]

@cursor review

Dec 12 '25 04:12 alvin-celerdata

@cursor review

Dec 12 '25 07:12 alvin-celerdata

All committers have signed the CLA.

Dec 12 '25 21:12 CLAassistant

@cursor review

Dec 12 '25 21:12 alvin-celerdata
