azure-cosmosdb-spark
Support readchangefeed from a certain point in time
We currently support reading the change feed either from the beginning or from the current time. We also need to support starting from an arbitrary point in time.
I feel the checkpointLocation feature can be used for this. I am not clear on a few things though. Where and in what format is the checkpoint location stored? Can we use the last saved checkpoint to restart from that point?
The checkpoint is currently saved in HDFS, though this will change soon. And yes, if you stop the job and start it again with the same checkpoint location, it will continue from the same point. The problem right now is that if you want to read documents from, say, the last 3 days, you would have to populate the checkpoint location yourself, which is not trivial. We are looking into making this experience better.
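For anyone following along, here is a rough sketch of the kind of configuration this restart behavior relies on. The option names are the ones I recall from the connector's change feed documentation, so treat them as assumptions and check them against the version you are running. The endpoint, key, database, collection, query name, and checkpoint path are all hypothetical placeholders.

```scala
// Minimal configuration sketch for a change feed read that checkpoints its progress.
// All values below are placeholders; the key names are assumed from the connector
// docs and may differ between connector versions.
val changeFeedConfig: Map[String, String] = Map(
  "Endpoint"   -> "https://<account>.documents.azure.com:443/", // placeholder
  "Masterkey"  -> "<key>",                                      // placeholder
  "Database"   -> "<database>",                                 // placeholder
  "Collection" -> "<collection>",                               // placeholder
  // Ask the connector to read the change feed instead of running a query.
  "ReadChangeFeed" -> "true",
  // A name identifying this change feed reader (hypothetical value).
  "ChangeFeedQueryName" -> "example-query",
  // false = start from the current time when no checkpoint exists yet.
  "ChangeFeedStartFromTheBeginning" -> "false",
  // Reusing the same location across restarts is what lets the job resume
  // from where it stopped (hypothetical HDFS path).
  "ChangeFeedCheckpointLocation" -> "/tmp/changefeed-checkpoint"
)
```

Restarting the job with the same `ChangeFeedCheckpointLocation` value should resume from the saved position. Nothing in this sketch selects an arbitrary start time; that is still the missing piece this issue asks for.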