doris-spark-connector
[Improvement] no need to wait after a successful retry
Problem Summary:
When using Utils.retry, even if the call succeeds, we still have to wait for doris.sink.batch.interval.ms milliseconds.
This PR fixes that.
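To make the intended behavior concrete, here is a minimal sketch of a retry helper that sleeps only between failed attempts and returns immediately on success. It is not the connector's actual Utils.retry; the class name RetrySketch, the method signature, and the sleeps counter are all hypothetical and exist only for illustration.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not the connector's Utils.retry. */
final class RetrySketch {
    /** Counts how many times we slept, so the behavior can be observed. */
    static int sleeps = 0;

    /**
     * Runs the task, retrying up to maxRetries times after a failure.
     * The interval is applied only between a failed attempt and the next one.
     */
    static <T> T retry(int maxRetries, long intervalMs, Callable<T> task) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                // On success we return at once: no trailing wait.
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    // Wait only when another attempt will follow.
                    sleeps++;
                    Thread.sleep(intervalMs);
                }
            }
        }
        throw last;
    }
}
```

With this shape, a task that succeeds on the first attempt never sleeps, and a task that fails twice before succeeding sleeps exactly twice.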
Checklist(Required)
- Does it affect the original behavior: (No)
- Has unit tests been added: (No)
- Has document been added or modified: (No Need)
- Does it need to update dependencies: (No)
- Are there any changes that cannot be rolled back: (No)
This configuration is to prevent exceptions caused by too frequent imports. What problems will the pause between batches cause to your job?
Got it, thx. @gnehil If we increase the retry interval, we will wait a long time after each insertion. Maybe it would be better to split it into two parameters?
You can reduce the batch loading interval by setting the doris.sink.batch.interval.ms option. The default value of this option is 50 (ms). Or you can set it to 0, so there will be no interval between batches. And can you briefly describe the idea of "split into two parameters"?
@gnehil I mean providing two parameters:
- doris.sink.batch.interval.ms: controls the batch flush interval
- doris.sink.retry.interval.ms: controls the retry interval
That way, increasing the retry interval will not affect the batch flush interval.
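A rough sketch of how the two options could be read independently is below. Note that doris.sink.retry.interval.ms is only the name proposed in this thread and may not exist in the connector; the SinkIntervals class and the fallback to the batch interval when the retry option is unset are assumptions made for illustration. The default of 50 ms for the batch interval is the value mentioned above.

```java
import java.util.Map;

/** Hypothetical holder for the two intervals; not part of the connector. */
final class SinkIntervals {
    static final String BATCH_INTERVAL_KEY = "doris.sink.batch.interval.ms";
    // Proposed in this thread; not confirmed to exist in the connector.
    static final String RETRY_INTERVAL_KEY = "doris.sink.retry.interval.ms";

    final long batchIntervalMs;
    final long retryIntervalMs;

    SinkIntervals(Map<String, String> options) {
        // Pause between batch flushes; 50 ms is the default stated in the thread.
        batchIntervalMs = Long.parseLong(options.getOrDefault(BATCH_INTERVAL_KEY, "50"));
        // Assumption: fall back to the batch interval when the retry option is unset,
        // so jobs that never set it keep a single shared interval.
        retryIntervalMs = options.containsKey(RETRY_INTERVAL_KEY)
                ? Long.parseLong(options.get(RETRY_INTERVAL_KEY))
                : batchIntervalMs;
    }
}
```

For example, setting only the retry option to 10000 would leave the batch flush interval at 50 ms while retries wait 10 seconds.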
Good idea, you can submit a PR for this.
cc @gnehil
Retrying over an iterator will lose data; see PR https://github.com/apache/doris-spark-connector/pull/145
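For readers unfamiliar with the problem, here is a small self-contained sketch of why retrying over an iterator can drop rows: the first attempt consumes the iterator, so the second attempt sees nothing. Buffering the batch before the first attempt is one common way around it. This is only an illustration; the class IteratorRetrySketch and the simulated flakyLoad are hypothetical, and the sketch does not claim to reproduce what the referenced PR actually does.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical demo; flakyLoad stands in for a load that fails once. */
final class IteratorRetrySketch {
    private static int calls = 0;

    /** Drains the rows, fails on the first call, and returns the rows sent otherwise. */
    static int flakyLoad(Iterator<String> rows) {
        int sent = 0;
        while (rows.hasNext()) {
            rows.next();
            sent++;
        }
        if (calls++ == 0) {
            throw new IllegalStateException("simulated load failure");
        }
        return sent;
    }

    /** Retries with the same iterator: it is already drained, so the retry sends nothing. */
    static int naiveRetry(Iterator<String> rows) {
        calls = 0;
        try {
            return flakyLoad(rows);
        } catch (IllegalStateException e) {
            return flakyLoad(rows);
        }
    }

    /** Copies the rows into a list first, so each attempt gets a fresh iterator. */
    static int bufferedRetry(Iterator<String> rows) {
        calls = 0;
        List<String> batch = new ArrayList<>();
        rows.forEachRemaining(batch::add);
        try {
            return flakyLoad(batch.iterator());
        } catch (IllegalStateException e) {
            return flakyLoad(batch.iterator());
        }
    }
}
```

The trade-off of buffering is memory: the whole batch must fit in the list, which is why it is usually done per batch rather than per partition.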