
bug: error on high ingest rate

Open webfrank opened this issue 3 years ago • 6 comments

Search before asking

  • [X] I have searched in the issues and found no similar issues.

Version

0.8.177

What's Wrong?

Hi, I have a Kafka Connect process which ingests around 1,000 records/s on a 6-node cluster. Data is not written to the DB and I see these logs on the nodes:

ERROR common_meta_api::schema_api_impl: error: TableVersionMismatched: 24590 expect == 45807 but 45810 while update_table_meta

with the version progressing on every row/node

How to Reproduce?

Kafka Connect receives NetFlow data at a consistent rate of 1,000 records/s, which needs to be written to Databend.

I'm using the MySQL interface to ingest data.

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

webfrank avatar Jan 08 '23 16:01 webfrank

This error is by design: inserts into the same table from the cluster need to race for the snapshot lock from the metaservice; finally one will win and the others will retry, but the insert will work fine. We plan to change the log level from error to warning. cc @dantengsky
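
Purely as an illustration of this race-and-retry behavior, here is a minimal client-side sketch (the table t1 and its columns are hypothetical, and Databend's MySQL-compatible port is assumed to be the default 3307; per the comment above, the server normally performs this retry itself):

# Hypothetical defensive retry: re-run a batched INSERT a few times if a
# transient commit conflict (e.g. TableVersionMismatched) surfaces to the client.
for attempt in 1 2 3; do
  mysql -h127.0.0.1 -P3307 -uroot \
    -e "INSERT INTO t1(a,b,c) VALUES (1,2,3),(11,22,33);" && break
  echo "attempt $attempt failed, retrying..." >&2
  sleep 1
done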

BohuTANG avatar Jan 09 '23 00:01 BohuTANG

Hi, but at the end the data is not written into the DB, so the insert is not working. Probably the ingest rate is too high. I also found that Databend does not support prepared statements, and batching inserts does not work.

webfrank avatar Jan 10 '23 07:01 webfrank

I also found databend does not support prepared statements and batching insert does not work

Hi, did you insert the data one row per SQL statement? It's recommended to insert the data in batches rather than a single row per SQL statement:

  1. If you are using MySQL, you can concatenate a large SQL statement like insert into table(a,b,c) values (1,2,3), (11,22,33) ... to insert the data (see the sketch after this list).
  2. If you are using HTTP, you can use the streaming load API to load CSV/JSON/Parquet data into Databend. See https://databend.rs/doc/load-data/local
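
A minimal sketch of option 1 (the table name t1 and columns a,b,c are hypothetical; Databend's MySQL-compatible port is assumed to be the default 3307):

# Send one multi-row INSERT instead of many single-row statements.
mysql -h127.0.0.1 -P3307 -uroot \
  -e "INSERT INTO t1(a,b,c) VALUES (1,2,3), (11,22,33), (111,222,333);"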

sundy-li avatar Jan 10 '23 07:01 sundy-li

I think Databend needs a real streaming connector like Kafka.

The streaming load API is good if you have "files". Concatenated MySQL inserts are not practical inside an integration platform.

What is really missing is a fast ingest endpoint/connector, like Kafka Connect or the InfluxDB wire protocol.

webfrank avatar Jan 10 '23 13:01 webfrank

Hi, can you give more details about your ingest case? Let's improve it.

BohuTANG avatar Jan 10 '23 14:01 BohuTANG

Thanks.

The streaming load API is good if you have "files".

We support the ClickHouse HTTP API (see https://databend.rs/doc/integrations/api/clickhouse-handler). You can put the data inside the HTTP body in a supported format, like:

JSON:

echo -e '{"a": 1}\n{"a": 2}' | curl 'root:@127.0.0.1:8124/?query=INSERT%20INTO%20t1%20FORMAT%20JSONEachRow' --data-binary @-

CSV:

echo -e '1\n2\n3' | curl 'root:@127.0.0.1:8124/?query=INSERT%20INTO%20t1%20FORMAT%20CSV' --data-binary @-

Streaming load does not require "files"; you can put any data in the HTTP body, as in the examples above.
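
For instance, a minimal sketch that pipes generated rows straight into the same ClickHouse handler with no intermediate file (it assumes a table t1 with a single integer column, as in the CSV example above):

# Generate 1000 CSV rows and stream them directly into Databend.
seq 1 1000 | curl 'root:@127.0.0.1:8124/?query=INSERT%20INTO%20t1%20FORMAT%20CSV' --data-binary @-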

sundy-li avatar Jan 10 '23 14:01 sundy-li