
Limiting total rows copied in `COPY TABLE FROM` statement

Open v0y4g3r opened this issue 10 months ago • 4 comments

What problem does the new feature solve?

As pointed out by this article, a 42 KiB parquet file may contain hundreds of trillions of values, much like a zip bomb. If GreptimeDB does not limit the total rows copied in a single COPY TABLE FROM statement, such parquet bombs may immediately overload the backend storage.

What does the feature do?

Limit the total number of rows imported by a single COPY TABLE FROM statement.

Executing a copy statement builds a record batch stream over the file and inserts the yielded batches into the mito engine.

https://github.com/GreptimeTeam/greptimedb/blob/3acd5bfad0ab5f0786ce9206e9cb8dd1a7cc5890/src/operator/src/statement/copy_table_from.rs#L402-L437

We can add a config option that limits the total rows read from the record batch stream and terminates the execution once the row count exceeds the threshold.
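As a rough illustration of the idea (not taken from the GreptimeDB codebase), the check could be a running row counter around the batch loop. In the sketch below, `Batch`, `RowLimitExceeded`, and `copy_with_row_limit` are hypothetical names, a plain iterator stands in for the async record batch stream, and the `insert` closure stands in for the write to the engine.

```rust
use std::fmt;

/// Hypothetical stand-in for a record batch; only the row count matters here.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    num_rows: usize,
}

/// Hypothetical error returned when the copy would exceed the configured limit.
#[derive(Debug, PartialEq)]
struct RowLimitExceeded {
    limit: usize,
    seen: usize,
}

impl fmt::Display for RowLimitExceeded {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "COPY aborted: {} rows read, limit is {}", self.seen, self.limit)
    }
}

/// Drains `batches`, handing each one to `insert`, and returns an error
/// before inserting any batch that would push the total past `limit`.
fn copy_with_row_limit<I, F>(
    batches: I,
    limit: usize,
    mut insert: F,
) -> Result<usize, RowLimitExceeded>
where
    I: IntoIterator<Item = Batch>,
    F: FnMut(&Batch),
{
    let mut total = 0usize;
    for batch in batches {
        // saturating_add avoids overflow on absurdly large declared row counts.
        let seen = total.saturating_add(batch.num_rows);
        if seen > limit {
            return Err(RowLimitExceeded { limit, seen });
        }
        insert(&batch);
        total = seen;
    }
    Ok(total)
}
```

Checking before the insert means a batch that crosses the threshold is never written, though batches inserted earlier are not rolled back in this sketch.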

v0y4g3r avatar Apr 16 '24 03:04 v0y4g3r

I'd like to take it. Could you please assign it to me?

irenjj avatar Apr 19 '24 03:04 irenjj

> I'd like to take it. Could you please assign it to me?

Sure. Are you going to add a new runtime option to limit the rows?

v0y4g3r avatar Apr 19 '24 07:04 v0y4g3r

> Sure. Are you going to add a new runtime option to limit the rows?

Sure, should we add a configuration option in the OptionMap in CopyTableArgument to limit it?

irenjj avatar Apr 20 '24 02:04 irenjj

> > Sure. Are you going to add a new runtime option to limit the rows?
>
> Sure, should we add a configuration option in the OptionMap in CopyTableArgument to limit it?

Maybe we can set a default limit, for example 1000 rows, unless the user explicitly specifies a limit in the COPY statement.

v0y4g3r avatar Apr 22 '24 02:04 v0y4g3r
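The "default unless specified" behaviour suggested in the last comment could be resolved from the statement's options roughly as follows. This is only a sketch under assumptions: a plain `HashMap<String, String>` stands in for the OptionMap, and the option key `limit`, the constant name, and `resolve_row_limit` are hypothetical. The thread does not settle on any of them, and 1000 is just the example value mentioned.

```rust
use std::collections::HashMap;

/// Example default from the discussion; the actual default is undecided.
const DEFAULT_COPY_ROW_LIMIT: usize = 1000;

/// Hypothetical option key; the thread does not fix a name.
const LIMIT_KEY: &str = "limit";

/// Returns the user-supplied limit if present, otherwise the default.
/// A value that is not a non-negative integer is reported as an error
/// instead of silently falling back to the default.
fn resolve_row_limit(options: &HashMap<String, String>) -> Result<usize, String> {
    match options.get(LIMIT_KEY) {
        None => Ok(DEFAULT_COPY_ROW_LIMIT),
        Some(raw) => raw
            .trim()
            .parse::<usize>()
            .map_err(|e| format!("invalid value {:?} for option '{}': {}", raw, LIMIT_KEY, e)),
    }
}
```

Rejecting a malformed value rather than ignoring it keeps a typo in the COPY statement from quietly applying the default limit.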