azure-cosmosdb-spark

Spark Streaming should checkpoint to a Cosmos DB collection instead of HDFS

Open nomiero opened this issue 5 years ago • 0 comments

Currently, the Spark connector saves its checkpoint information to HDFS. This causes race conditions in which the checkpoint location can become corrupted, and the job then has to be restarted. The connector should move checkpointing to a Cosmos DB collection for state management.
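
To illustrate the kind of race a collection-backed checkpoint could guard against, here is a minimal in-memory sketch of a conditional (version-checked) write. This is only an assumption about one possible approach, not the connector's implementation: `InMemoryCheckpointStore`, `StaleCheckpointError`, and the `partition-0` key are hypothetical names, and the dictionary merely stands in for a collection.

```python
import threading


class StaleCheckpointError(Exception):
    """Raised when a writer's expected version no longer matches the store."""


class InMemoryCheckpointStore:
    """Hypothetical stand-in for a checkpoint collection; not a Cosmos DB API."""

    def __init__(self):
        self._lock = threading.Lock()
        self._docs = {}  # partition id -> (version, offset)

    def read(self, partition):
        # Returns (version, offset); (0, None) when no checkpoint exists yet.
        with self._lock:
            return self._docs.get(partition, (0, None))

    def write(self, partition, offset, expected_version):
        # Conditional write: succeeds only if nobody else wrote in between.
        with self._lock:
            current_version, _ = self._docs.get(partition, (0, None))
            if current_version != expected_version:
                raise StaleCheckpointError(partition)
            self._docs[partition] = (current_version + 1, offset)
            return current_version + 1


store = InMemoryCheckpointStore()
version, _ = store.read("partition-0")
store.write("partition-0", offset=100, expected_version=version)  # first writer wins

try:
    # A second writer still holding the old version is rejected, not merged.
    store.write("partition-0", offset=90, expected_version=version)
    second_write_rejected = False
except StaleCheckpointError:
    second_write_rejected = True
```

With a check like this, a concurrent writer that read an older checkpoint fails loudly and can re-read the state, rather than silently overwriting it.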

nomiero, Feb 15 '19 20:02