azure-cosmosdb-spark

Adding BulkSink for streaming writes

Open FabianMeiswinkel opened this issue 4 years ago • 0 comments

A customer is facing latency issues because their streaming workload is small at steady state (about 4 documents/second) but can grow to tens of thousands of documents during some periods of the day, at which point the AsyncConnection-based write stream implementation is neither fast nor robust enough. An initial attempt to use readStream with foreachBatch and write each micro-batch to Cosmos via a batch write works, but shows higher latency for the small steady-state workload. In my own tests, the BulkSink improves that latency by 200-300 ms (still about 200 ms slower than point writes via AsyncConnection, but that is about the expected ballpark).
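To make the trade-off concrete, here is a minimal Python sketch of one way a per-micro-batch handler could pick between the two write paths by batch size. It is not the connector's BulkSink implementation: `write_micro_batch`, `point_write`, `bulk_write`, and `bulk_threshold` are hypothetical names, and the in-memory lists stand in for the real Cosmos write calls.

```python
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, object]


def write_micro_batch(
    docs: Iterable[Doc],
    point_write: Callable[[Doc], None],
    bulk_write: Callable[[List[Doc]], None],
    bulk_threshold: int = 100,  # hypothetical cut-over point, not a connector setting
) -> str:
    """Write one micro-batch and return the path taken: 'empty', 'point' or 'bulk'."""
    batch = list(docs)
    if not batch:
        return "empty"
    if len(batch) < bulk_threshold:
        # Small steady-state batch: individual writes avoid the bulk path's fixed overhead.
        for doc in batch:
            point_write(doc)
        return "point"
    # Burst: hand the whole micro-batch to the bulk path in one call.
    bulk_write(batch)
    return "bulk"


# In-memory stand-ins for the real write calls (assumptions for illustration only).
point_log: List[Doc] = []
bulk_log: List[List[Doc]] = []

steady = [{"id": str(i)} for i in range(4)]       # steady-state sized batch
burst = [{"id": str(i)} for i in range(20_000)]   # burst sized batch

steady_path = write_micro_batch(steady, point_log.append, bulk_log.append)
burst_path = write_micro_batch(burst, point_log.append, bulk_log.append)
```

With these stand-ins, the 4-document batch goes through the point-write callback and the 20,000-document batch goes through the bulk callback in a single call. A real threshold would have to be tuned against measured latency.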

FabianMeiswinkel, Feb 04 '21 10:02