greptimedb
Limiting total rows copied in `COPY TABLE FROM` statement
What problem does the new feature solve?
As pointed out by this article, a 42 KiB Parquet file may contain hundreds of trillions of values, much like a zip bomb. If GreptimeDB does not limit the total rows copied by a single `COPY TABLE FROM` statement, such Parquet bombs may immediately overload the backend storage.
What does the feature do?
Limit the total number of rows imported by a single `COPY TABLE FROM` statement.
Executing a `COPY TABLE FROM` statement builds a record batch stream over the file and inserts the batches it yields into the mito engine:
https://github.com/GreptimeTeam/greptimedb/blob/3acd5bfad0ab5f0786ce9206e9cb8dd1a7cc5890/src/operator/src/statement/copy_table_from.rs#L402-L437
We can add a config option that limits the total rows read from the record batch stream and terminates the execution once the total exceeds the threshold.
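A rough sketch of the idea, not the actual GreptimeDB code: `Batch`, `CopyError`, and `copy_with_limit` are hypothetical names, and a plain iterator stands in for the async record batch stream. It only shows the row counting and early termination.

```rust
/// Hypothetical stand-in for a record batch; only the row count matters here.
struct Batch {
    num_rows: usize,
}

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum CopyError {
    RowLimitExceeded { limit: usize, seen: usize },
}

/// Feeds each batch to `insert` and stops as soon as the running row total
/// would exceed `limit`. Returns the number of rows inserted on success.
fn copy_with_limit<I, F>(batches: I, limit: usize, mut insert: F) -> Result<usize, CopyError>
where
    I: IntoIterator<Item = Batch>,
    F: FnMut(&Batch),
{
    let mut total = 0usize;
    for batch in batches {
        // saturating_add avoids overflow if a malicious file reports huge counts.
        let seen = total.saturating_add(batch.num_rows);
        if seen > limit {
            // The offending batch is not inserted; earlier batches already were.
            return Err(CopyError::RowLimitExceeded { limit, seen });
        }
        insert(&batch);
        total = seen;
    }
    Ok(total)
}
```

Because the check runs per batch, the stream is abandoned before the remaining batches are decoded. Batches inserted before the limit trips stay inserted in this sketch; whether the statement should instead fail atomically or truncate at the limit is a separate design decision.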
I'd like to take this. Could you please assign it to me?
Sure. Are you going to add a new runtime option to limit the rows?
Sure. Should we add a configuration option to the `OptionMap` in `CopyTableArgument` to limit it?
Maybe we can set a default limit, for example 1000 rows, that applies unless the user explicitly specifies a limit in the `COPY` statement.
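One way the default-plus-override could look, as a sketch only: a plain `HashMap<String, String>` stands in for the statement's option map, and the key name `limit`, the constant names, and `resolve_row_limit` are all hypothetical. The 1000 is just the example value mentioned above.

```rust
use std::collections::HashMap;

/// Example default taken from the suggestion above; not a decided value.
const DEFAULT_ROW_LIMIT: usize = 1000;

/// Hypothetical option key; the discussion does not settle on a name.
const ROW_LIMIT_KEY: &str = "limit";

/// Returns the user-supplied limit if present, otherwise the default.
/// A value that does not parse as a non-negative integer is an error,
/// rather than silently falling back to the default.
fn resolve_row_limit(options: &HashMap<String, String>) -> Result<usize, String> {
    match options.get(ROW_LIMIT_KEY) {
        Some(raw) => raw
            .trim()
            .parse::<usize>()
            .map_err(|e| format!("invalid `{}` value {:?}: {}", ROW_LIMIT_KEY, raw, e)),
        None => Ok(DEFAULT_ROW_LIMIT),
    }
}
```

Rejecting an unparsable value keeps a typo in the statement from quietly applying the default limit instead of the one the user intended.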