
Allow configuration to batch sync progress log entries

Open ppf2 opened this issue 1 year ago • 1 comments

In environments where a large number of documents are synced quickly (e.g. 150M+ documents), we currently write 1.5M+ log entries like the following at the INFO level in the connectors logs:

Sync progress -- created: 153529100 | updated: 0 | deleted: 0

That is one log entry for every 100 documents.

It would be helpful to provide a configuration option that lets users "batch up" these log messages. For example, a user could set the sync progress log batch size to 10000 so that a sync progress log entry is written only every 10K documents instead of every 100. This new option could be made available at the per-connector level.
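To illustrate the request, here is a minimal Python sketch of batched progress logging. It is not the connectors framework's actual code: the `SyncProgress` class, its `record` method, and the `count_log_entries` helper are hypothetical names used only to show how a configurable batch size changes the number of log entries.

```python
import logging

logger = logging.getLogger("sync_progress_sketch")  # hypothetical logger name


class SyncProgress:
    """Hypothetical counter that logs a summary every `display_every` operations."""

    def __init__(self, display_every=100):
        if display_every <= 0:
            raise ValueError("display_every must be a positive integer")
        self.display_every = display_every
        self.counts = {"created": 0, "updated": 0, "deleted": 0}
        self.entries_logged = 0

    def record(self, operation):
        """Count one document operation and log when a batch boundary is reached."""
        self.counts[operation] += 1
        total = sum(self.counts.values())
        if total % self.display_every == 0:
            logger.info(
                "Sync progress -- created: %s | updated: %s | deleted: %s",
                self.counts["created"],
                self.counts["updated"],
                self.counts["deleted"],
            )
            self.entries_logged += 1


def count_log_entries(num_docs, display_every):
    """Simulate creating `num_docs` documents and return how many entries were logged."""
    progress = SyncProgress(display_every)
    for _ in range(num_docs):
        progress.record("created")
    return progress.entries_logged
```

Under these assumptions, syncing 150M documents at one entry per 100 documents yields 1.5M entries, while a batch size of 10000 yields 15,000.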

ppf2, Mar 19 '24 19:03

Thanks to @wangch079, it looks like we already have a configuration at the Elasticsearch sink level: elasticsearch.bulk.display_every 🎉 It would be a nice enhancement to make this configurable at the per-connector level, so that users can set a different display_every based on the volume of data each connector syncs.
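One way the proposed per-connector override could resolve is sketched below in Python. This is only an assumption about how such a feature might behave: the per-connector `display_every` key and the `resolve_display_every` function are hypothetical, and the fallback default of 100 is inferred from the "1 log entry for every 100 documents" behavior described in this issue. Only the `elasticsearch.bulk.display_every` setting name comes from the thread.

```python
# Assumed default, inferred from the behavior described in this issue.
DEFAULT_DISPLAY_EVERY = 100


def resolve_display_every(service_config, connector_config=None):
    """Return the per-connector value if set, else the sink-level value, else the default.

    `service_config` mirrors the nested elasticsearch.bulk.display_every setting.
    `connector_config` is a hypothetical per-connector mapping with an optional
    "display_every" key; no such key exists in the source thread.
    """
    connector_config = connector_config or {}
    override = connector_config.get("display_every")
    if override is not None:
        return int(override)
    bulk = service_config.get("elasticsearch", {}).get("bulk", {})
    return int(bulk.get("display_every", DEFAULT_DISPLAY_EVERY))
```

With this shape, a high-volume connector could set a large value such as 10000 while other connectors keep the sink-level setting.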

ppf2, Mar 20 '24 18:03