matrixone
matrixone copied to clipboard
[Feature Request]: Stage READ/WRITE support and restrict URL format
Is there an existing issue for the same feature request?
- [X] I have checked the existing issues.
Is your feature request related to a problem?
- Current stage implementation only supports SELECT INTO OUTFILE to write to a local file system but not S3. (see #17748)
- READ is not supported by Stage yet.
- the Stage URL should be restricted to URL format only, i.e. file:///, s3:// or stage://.
- Current supported URL format such as stagename:/path or simply a path should not be supported.
Describe the feature you'd like
- Able to import data from SQL "LOAD DATA INFILE" and External table and read from local file or S3.
- SELECT INTO OUTFILE accept stage URL as input filepath and write to local file or S3.
- stage URL can specify to a local file, s3 file and another stage with subpath.
- restrict the stage URL format by only supporting file:///, s3:// or stage://
Describe implementation you've considered
Stage URL format
The possible format of URL:
1. s3://<bucket>/<path> e.g. s3://bucket/path/to
2. file:///<path> e.g. file///path/to
3. stage://<stagename>/<path>
CREATE/DROP/SHOW Stage
CREATE STAGE [ IF NOT EXISTS ] { stage_name }
{ StageParams }
[ COMMENT = '<string_literal>' ]
DROP STAGE [IF EXISTS] { stage_name };
SHOW STAGES;
StageParams (for Amazon S3) :
URL = "s3://<bucket>[/<path>/]" CREDENTIALS = {"AWS_KEY_ID"='<string>', "AWS_SECRET_KEY"='<string>', "AWS_ROLE"='<string>', "AWS_TOKEN"='<string>', "AWS_REGION"='<string>', "COMPRESSION"='<string>', 'PROVIDER'='<string>', 'ENDPOINT'='<string>'}
StageParams (for File System) :
URL= 'file:///[/path/]'
StageParams (for sub-stage):
URL= "stage://<stagename>[/path/]"
External stages
Amazon S3
URL = s3://<bucket>[/path/]
Specifies the URL for the external location (existing bucket accessed using an S3 API endpoint) used to store data files, where:
-
is the name of the bucket -
is an optional case-sensitive path (or prefix in S3 terminogloy) for files in the cloud storage location (i.e. files with names that begin with a common string)
CREDENTIALS = {'key1'='value1', 'key2'='value2',...} Specifies the credentials for the external cloud storage
Mandatory keys are
- AWS_KEY_ID specifies the key ID provided by the storage
- AWS_SECRET_KEY specifies the secret key provided by the storage
- AWS_REGION specifies the region of the storage
- PROVIDER specifies the provider of the storage. Possible values are "amazon" and "minio".
- ENDPOINT specifies the host of the storage. For Amazon S3, set endpoint as empty string.
Examples
Amazon S3
CREATE STAGE my_stage URL='s3://bucket/data/files/'
CREDENTIALS={'AWS_KEY_ID'='1a2b3c', 'AWS_SECRET_KEY='4x6y6z', 'AWS_REGION'='en_US', 'PROVIDER'='amazon'}
Amazon S3-compatible storage (Minio)
CREATE STAGE my_stage URL='s3://bucket/data/files/'
CREDENTIALS={'AWS_KEY_ID'='1a2b3c', 'AWS_SECRET_KEY'='4x6y6z', 'AWS_REGION'='en_US', 'PROVIDER'='minio', 'ENDPOINT'='endpoint-host'}
Sub-Stage
Credential of the sub-stage will be inherited from the parent stage. Path conversion and credential look up from parent stage will be executed in execution time.
CREATE STAGE my_substage URL='stage://stagename/sub/path/';
File based stage
CREATE STAGE my_filestage URL='file:///path/to/somewhere/'
READ/WRITE
Loading from stage
LOAD DATA INFILE 'stage://dbname/my_stage/abc.csv' INTO TABLE my_table;
External Table
create external table t(...) INFILE '<string>' FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n';
Example,
create external table ext_table INFILE 'stage://stagename/path/to/somewhere' fields terminated by ',' enclosed by '\"' lines terminated by '\n';
For Datalink support,
select load_file(" stage://stagename/path/abc.dat?offset=0&size=23");
Writing to the stage
SELECT * FROM my_table INTO OUTFILE 'stage://my_stage/data.csv';
Documentation, Adoption, Use Case, Migration Strategy
No response
Additional information
No response
test done