CosmoStore

Azure Table Storage Change Feed?

Open · deyanp opened this issue 5 years ago · 13 comments

Hi,

I saw the comment that the author moved from Cosmos DB to Azure Table Storage due to high costs with the former. How do you push the data to the Read Model though? I couldn't find a Change Feed or similar for Table Storage ... and polling doesn't sound workable ...

Best regards, Deyan

deyanp avatar Apr 17 '19 06:04 deyanp

Hi @deyanp,

there is an IObservable of appended events exposed on the CosmoStore instance - see https://github.com/Dzoukr/CosmoStore/blob/master/src/CosmoStore/CosmoStore.fs#L59
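A minimal sketch of wiring a projection to that observable, assuming the v2-era API where the store value exposes `EventAppended : IObservable<EventRead>` (`updateReadModel` is a hypothetical placeholder for the real read-model writer):

```fsharp
open System

// Subscribe a projection writer to the store's EventAppended observable.
// The IDisposable is returned so the subscription can be torn down on shutdown.
let wireProjections (store: CosmoStore.EventStore) (updateReadModel: CosmoStore.EventRead -> unit) : IDisposable =
    store.EventAppended
    |> Observable.subscribe (fun appended ->
        // Side effect: apply the freshly appended event to the read model.
        updateReadModel appended)
```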

Another approach (also widely used by us) is to compose a function Cmd -> Event list with a function Event list -> Event list (doing the side-effect writing to the projection database), as sketched below. It is a matter of taste - some people don't like the reactive approach, some do.
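A minimal sketch of that composition, with hypothetical Cmd/Event types and a print statement standing in for the real projection write:

```fsharp
// Hypothetical command and event types, just for illustration.
type Cmd = Deposit of account: string * amount: decimal
type Event = Deposited of account: string * amount: decimal

// Pure decision step: turn a command into the events it implies.
let handle : Cmd -> Event list = function
    | Deposit (account, amount) -> [ Deposited (account, amount) ]

// Side-effect step: write each event to the projection database, then pass
// the list through unchanged so further stages can be composed after it.
let project : Event list -> Event list = fun events ->
    events |> List.iter (fun e -> printfn "projecting %A" e) // stand-in for the real DB write
    events

// The composed pipeline described above: Cmd -> Event list.
let pipeline : Cmd -> Event list = handle >> project
```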

Dzoukr avatar Apr 17 '19 06:04 Dzoukr

Hi,

Does this mean that the projections get built "in-process", without any guarantee in case the process crashes after writing to the event stream?

Best regards, Deyan

deyanp avatar Apr 17 '19 06:04 deyanp

Yes, it would have to happen just between writing to the event store and writing to the projection database, but it can theoretically happen, and you would have to replay the missing events in such a case (see the sketch below). Or you can plug a queue in between and write projections in a separate process/application. AFAIK there is no Change Feed for Table Storage, so it is up to you how to lower the risks of eventual consistency.
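A sketch of that replay fallback, under assumed helper names (`getEventsFromVersion`, `applyToProjection`, and a persisted `lastProjectedVersion`):

```fsharp
// Re-read everything the projection missed and re-apply it. Provided the
// projection writes are idempotent, this is safe to run after any crash.
let replayMissing (getEventsFromVersion: int64 -> Async<'event list>)
                  (applyToProjection: 'event -> Async<unit>)
                  (lastProjectedVersion: int64) = async {
    let! missed = getEventsFromVersion (lastProjectedVersion + 1L)
    for e in missed do
        do! applyToProjection e }
```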

Dzoukr avatar Apr 17 '19 07:04 Dzoukr

Yep, this is the problem I am facing ... writing to a queue is not a solution, as I cannot (and don't want to) open a distributed transaction between Azure Table Storage and, for example, Azure Event Hub ..

What issues with the costs of Cosmos DB did you face exactly (if I may ask), and do you think there is a solution to them?

deyanp avatar Apr 17 '19 07:04 deyanp

Well, the pricing of Cosmos DB scales differently. If you need to start "low" (imagine a weekend project) with few events stored and few aggregates, you still need to provision 400 RU/s as the current minimum. And that minimum is still expensive as hell compared to Azure Table Storage, where you pay mostly for storage, which is negligible.

To make it clear, I still love Cosmos DB - an amazing product - but until MS changes the pricing to be more friendly to low-cost/weekend projects, it will remain a product chosen mainly by bigger companies.

Dzoukr avatar Apr 17 '19 08:04 Dzoukr

Thank you for sharing your concerns; now I understand better. I am thinking of using:

  1. Cosmos DB for the write side (taking advantage of the Change Feed)
  2. Azure Table Storage for:
     a) the read side (duplicate denormalized projections)
     b) duplicating all events from Cosmos DB to Azure Table Storage for replay purposes (sketched below), assuming reading all events directly from Cosmos DB would incur a lot of RUs/costs
     c) aggregate snapshots (the last state of each aggregate, to avoid having to read and replay all old events)

Alternatively, I was thinking about Azure PostgreSQL for 2a), as Azure SQL Database seems to be much more expensive ...
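A hedged sketch of what 2b could look like with the .NET Cosmos SDK's change feed processor; the `EventDoc` shape, container names, and `copyToTables` writer are all assumptions:

```fsharp
open System.Collections.Generic
open System.Threading
open System.Threading.Tasks
open Microsoft.Azure.Cosmos

// Assumed shape of an event document on the write side.
type EventDoc = { id: string; streamId: string; payload: string }

// Build a change feed processor that forwards every new event document to a
// Table Storage writer; call StartAsync() on the result at application startup.
let buildDuplicator (events: Container) (leases: Container) (copyToTables: EventDoc -> Task) =
    events
        .GetChangeFeedProcessorBuilder<EventDoc>(
            "duplicate-to-table-storage",
            fun (changes: IReadOnlyCollection<EventDoc>) (_ct: CancellationToken) ->
                task {
                    for doc in changes do
                        do! copyToTables doc
                } :> Task)
        .WithInstanceName("duplicator-1")
        .WithLeaseContainer(leases)
        .Build()
```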

What do you think about the above approach?

deyanp avatar Apr 17 '19 08:04 deyanp

reading all events directly from Cosmos DB would incur a lot of RUs/costs

That is the funny part. If your Cosmos DB collection has 400 RU/s provisioned, you just pay for it. Constantly. No matter if you use it or not.

Otherwise it looks ok - let me know how it works.

Dzoukr avatar Apr 17 '19 08:04 Dzoukr

@deyanp I independently arrived at the same architecture you described (namely CosmosDB for writes and Azure Table Storage for denormalized views, changefeed duplication, and snapshots). I arrived here after googling "Azure Table Storage change feed" :) I haven't implemented anything yet, just theorycrafting my own pet project.

How did your project turn out?

dharmaturtle avatar Oct 11 '20 15:10 dharmaturtle

Slight tangent, but... I'd be interested to see how you represent the events and/or manage efficient idempotent writing to Azure Tables (the thing termed 'changefeed duplication' above).
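One common shape for such idempotent writes, sketched here with the Azure.Data.Tables SDK (the key layout is an assumption): key each entity by stream and event number, so a replayed event overwrites itself instead of double-writing.

```fsharp
open Azure.Data.Tables

// PartitionKey = stream id, RowKey = zero-padded event number, so writing the
// same event twice (e.g. during a replay) replaces it rather than duplicating it.
let writeEventIdempotently (table: TableClient) (streamId: string) (eventNumber: int64) (payload: string) =
    let entity = TableEntity(streamId, eventNumber.ToString("d19"))
    entity.["Payload"] <- box payload
    table.UpsertEntity(entity, TableUpdateMode.Replace) |> ignore
```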

I suspect that forking Propulsion.Cosmos.Sink might be a good way to scale the archival process. In the proArchiver template (complete, but unmerged in https://github.com/jet/dotnet-templates/pull/79), I duplicate events from the primary out to CosmosDB (see in-depth discussion of my rationale).

bartelink avatar Oct 11 '20 16:10 bartelink

How did your project turn out?

@dharmaturtle , as with many things in life, this one also went in a different direction: MongoDB for writes and some reads, and Azure Data Explorer (ADX) for DWH/reporting/more complicated reads.

Cosmos DB surprised me a bit negatively - everything must be partitioned, storage is bloated (200 bytes somehow turn into 900 bytes, and you pay for uncompressed storage), and what I need very much - atomic updates - is missing ...

ADX is something I recommend a lot; MongoDB has its quirks ..

deyanp avatar Oct 11 '20 21:10 deyanp

and what I need very much - atomic updates - is missing ...

What about the batch APIs? Can stored procs do the job? (In general you should be able to get it done with the bulk APIs, unless you have specific things that really benefit from the efficiency of reduced roundtrips.)
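For context, a sketch of the kind of batch facility being referred to, using the .NET SDK's TransactionalBatch (the Doc shape is hypothetical). Note that it makes several writes to one partition key atomic in a single roundtrip - it does not read-modify-write, which is why it doesn't solve the increment case below.

```fsharp
open Microsoft.Azure.Cosmos

// Hypothetical document shape; pk is the partition key value.
type Doc = { id: string; pk: string; value: decimal }

// Group several upserts on the same partition key into one atomic roundtrip.
let writeBatch (container: Container) (pk: string) (docs: Doc list) = task {
    let batch =
        docs
        |> List.fold
            (fun (b: TransactionalBatch) doc -> b.UpsertItem doc)
            (container.CreateTransactionalBatch(PartitionKey pk))
    let! response = batch.ExecuteAsync()
    if not response.IsSuccessStatusCode then
        failwith $"batch failed: {response.StatusCode}" }
```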

Re that per-doc overhead, I can definitely concur (which is why Equinox packs events into docs; it seems that ~30k is the sweet spot, though there are lots of factors to consider).

bartelink avatar Oct 11 '20 22:10 bartelink

@bartelink , neither sprocs nor anything else helps, I am afraid. I need to update a shared account balance multiple times per second in parallel (e.g. 20x), and I cannot afford any optimistic concurrency exceptions at all. I have looked at stored procedures, and under the hood they also do optimistic locking .. so no way that I found, unfortunately :(

They say they support MongoDB's API (though only 3.2/3.6, which is outdated), and findOneAndModify/Update in particular (which is atomic, with $set, $inc, etc. commands), but even though I asked (see https://feedback.azure.com/forums/263030-azure-cosmos-db/suggestions/38110195-support-for-atomic-updates-in-sql-api) they did not confirm, and I am afraid that there too, under the hood, some optimistic concurrency is going on ...
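For reference, the kind of single-document atomic update being described, sketched with the .NET MongoDB driver (the Account shape and field names are hypothetical):

```fsharp
open MongoDB.Driver

// Hypothetical account document; the driver maps Id to _id by convention.
type Account = { Id: string; Balance: decimal }

// FindOneAndUpdate with $inc: the server applies the increment atomically,
// so parallel callers never trip optimistic-concurrency errors.
let incrementBalance (accounts: IMongoCollection<Account>) (accountId: string) (delta: decimal) =
    let filter = Builders<Account>.Filter.Eq((fun a -> a.Id), accountId)
    let update = Builders<Account>.Update.Inc((fun a -> a.Balance), delta)
    accounts.FindOneAndUpdate(filter, update) |> ignore
```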

deyanp avatar Oct 15 '20 07:10 deyanp

@deyanp I'd be surprised if the Cosmos MongoDB interface offers any increment on native functionality. I agree the bulk facility is covering a very different use case.

Not sure if it's remotely useful, but in Equinox.Cosmos we solved a similar problem via:

  • the Equinox Sync stored proc yields the conflicting state if there is a conflict (which does not turn into an exception or necessitate another roundtrip to sync with the state)
  • In the app layer, use AsyncBatchingGate to gather concurrent requests into a single roundtrip - i.e. if 5 inc operations need to happen concurrently, send them via the batching gate, aggregate them into a single request, and then have each caller share that fate (see the sketch after this list).
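A hand-rolled illustration of that batching idea (not Equinox's actual AsyncBatchingGate API): a MailboxProcessor drains whatever increment requests have queued up and applies their sum in one store roundtrip.

```fsharp
// One message per caller: the delta to apply plus a channel to signal completion.
type IncMsg = Inc of delta: decimal * reply: AsyncReplyChannel<unit>

// applyTotal performs the single store roundtrip for a whole batch.
let startBatchingGate (applyTotal: decimal -> Async<unit>) =
    MailboxProcessor.Start(fun inbox ->
        let rec loop () = async {
            // Wait for one request, then drain everything else already queued.
            let! first = inbox.Receive()
            let mutable batch = [ first ]
            while inbox.CurrentQueueLength > 0 do
                let! next = inbox.Receive()
                batch <- next :: batch
            let total = batch |> List.sumBy (fun (Inc (d, _)) -> d)
            do! applyTotal total                                        // one roundtrip for N callers
            batch |> List.iter (fun (Inc (_, reply)) -> reply.Reply())  // every caller shares that fate
            return! loop () }
        loop ())

// Callers post through the gate and await the shared outcome:
// do! gate.PostAndAsyncReply(fun reply -> Inc (2.50m, reply))
```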

In some cases, you can stack the requests up in some form of queue or bus (which also achieves the same batching effect).

If you're literally only looking to do an inc operation, the bottom line is that at the CosmosDB level there simply has to be a read followed by an etag-contingent update - you can rig it such that in the failure case you recurse within the stored proc.
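A sketch of that read-then-etag-contingent-update loop, done client-side with the .NET Cosmos SDK rather than inside a stored proc (the Account shape is an assumption). A 412 means another writer won the race, so we re-read and retry:

```fsharp
open System.Net
open Microsoft.Azure.Cosmos

// Hypothetical account document; id doubles as the partition key here.
type Account = { id: string; Balance: decimal }

let rec incBalance (container: Container) (accountId: string) (delta: decimal) = task {
    let pk = PartitionKey accountId
    let! current = container.ReadItemAsync<Account>(accountId, pk)
    let updated = { current.Resource with Balance = current.Resource.Balance + delta }
    try
        // Replace succeeds only if nobody wrote since our read (If-Match on the etag).
        let opts = ItemRequestOptions(IfMatchEtag = current.ETag)
        let! _ = container.ReplaceItemAsync(updated, accountId, pk, opts)
        return ()
    with :? CosmosException as e when e.StatusCode = HttpStatusCode.PreconditionFailed ->
        // Lost the race: re-read and try again.
        return! incBalance container accountId delta }
```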

bartelink avatar Oct 15 '20 08:10 bartelink