
Storage API for persistence

vwkd opened this issue on Oct 13 '21 · 5 comments

Currently, there is no way to persist data in a Worker (pending name #105). Instead, one has to leave the edge and use a third party like Fauna. Wanting to persist data in the Worker itself is a common use case [^common]. For example, Cloudflare offers a KV database with Workers KV [^whatiskv] and, more recently, a SQL database with D1 [^whatisd1]. It would make sense to add a Storage API, and IndexedDB seems to be already under consideration [^indexeddb]. However, there are still open questions about the implementation. It would be great if Deno Deploy could get this right from the start.

Synchronization

Using a Storage API, a Worker instance can persist data between runs. But since a Worker has multiple instances, one instance could now have different state than the others, so users would get different results depending on their location. This is useful as a cache but not as storage (see also #74). Writes to the storage of one instance need to be synchronized with the storage of the other instances. Reads, however, can use the instance's storage directly. [^kvsync] [^kvsync2]

The synchronization can’t just send data directly between the instances (e.g. via BroadcastChannel), because simultaneous writes on different instances could make their states diverge, as there is no consensus on order. There needs to be a single point of coordination that serializes the writes: the instances send their writes to it first, and it broadcasts them back to all instances, which guarantees that the writes are applied in the same order everywhere. The central point can also store the writes itself, which makes it a backup database, or even the main database if locations store only frequently used values, like in Workers KV [^kvarch].
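As a sketch of this serialization (all names hypothetical, in-memory only): a single coordinator assigns sequence numbers to incoming writes and broadcasts them back, so every instance applies them in the same order:

```typescript
// Hypothetical sketch: a coordinator serializes writes across instances.
type Write = { key: string; value: string };

class Coordinator {
  private seq = 0;
  private instances: WorkerInstance[] = [];

  register(instance: WorkerInstance) {
    this.instances.push(instance);
  }

  // Serialize: assign a global sequence number, then broadcast to everyone.
  submit(write: Write) {
    const ordered = { seq: this.seq++, ...write };
    for (const instance of this.instances) instance.apply(ordered);
  }
}

class WorkerInstance {
  readonly store = new Map<string, string>();
  constructor(private coordinator: Coordinator) {
    coordinator.register(this);
  }
  // Reads are local; writes go through the coordinator first.
  read(key: string) {
    return this.store.get(key);
  }
  write(key: string, value: string) {
    this.coordinator.submit({ key, value });
  }
  apply(write: Write & { seq: number }) {
    this.store.set(write.key, write.value);
  }
}

const coordinator = new Coordinator();
const a = new WorkerInstance(coordinator);
const b = new WorkerInstance(coordinator);

a.write("greeting", "hello");
b.write("greeting", "world");

// Both instances converge on the same final value, because the
// coordinator put the two writes in one agreed order.
console.log(a.read("greeting") === b.read("greeting")); // true
```

Because every instance applies the same serialized stream, their stores can never diverge, regardless of which instance a write originated from.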

One could build this synchronization from scratch, which is what CF did with Workers KV. But sooner or later you'll realize that storage in a Worker has limited use cases, since it's only eventually consistent due to the synchronization. Eventual consistency is good for reads but not for writes. Writes can lead to inconsistencies and data loss, for example when writing to the same key at the same time from different locations [^kvec1] [^kvec2] [^kvec3] [^kvec4] [^kvec5]. Writes can only be safely issued from a single location on the outside via CLI / UI [^kvexternal].
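A minimal in-memory simulation of the last-write-wins data loss the footnotes describe: two locations concurrently read-modify-write the same key, and one update is silently lost:

```typescript
// Hypothetical sketch of eventually consistent, last-write-wins storage.
const central = new Map<string, number>([["counter", 0]]);

// Each location reads its (possibly stale) copy of the value at the
// same time…
const locationA = central.get("counter")!; // reads 0
const locationB = central.get("counter")!; // also reads 0

// …increments it, and writes back. The later write simply "wins".
central.set("counter", locationA + 1); // writes 1
central.set("counter", locationB + 1); // also writes 1, overwriting

// Two increments happened, but only one survived.
console.log(central.get("counter")); // 1, not 2
```

This is exactly the overwriting behavior described in [^kvec1] and [^kvec4]: without a single serialization point, concurrent writers cannot see each other's updates.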

For strongly consistent storage in a Worker, there needs to be a single point of coordination. You might then invent a single-instance private FaaS to give users exactly that (pending name “Coordinator” #88). Note that the storage in a Coordinator itself doesn’t need synchronization, since a Coordinator only ever runs as a single instance. The Storage API of a Coordinator can be identical to that of a Worker; the only difference is the under-the-hood synchronization of a Worker’s storage between its instances. There is no reason to have two different Storage APIs depending on the type of instance - Worker or Coordinator - like CF did (see also #88). Unfortunately, even with D1, Cloudflare still doesn’t seem to realize it should provide the same API for Durable Objects too, such that there would be an identical Storage API across all FaaS.

But with Coordinators you have now invented a single point of coordination, exactly what you needed for the Worker storage synchronization, except that it's also a first-class user product. Naturally you'll want to dogfood it and build the synchronization of the Worker storage on top of a Coordinator. This is what CF now does with Workers KV after having invented Durable Objects [^kvdo] [^kvdo2]. With D1 it built on Durable Objects [^d1do] from the start.

The takeaway here is that you'll save yourself time and money if you first invent Coordinators (where storage doesn't need synchronization between instances) and then use them to implement the synchronization of the Worker storage.

Concurrency

If not designed well, an asynchronous Storage API can lead to subtle bugs and race conditions when there are multiple concurrent requests to a Worker / Coordinator instance. CF has a write-up that goes into great detail: Durable Objects: Easy, Fast, Correct — Choose three.

This is equally true for both the Worker and the Coordinator Storage API, since it doesn’t matter where the concurrent requests to an instance come from - from users on the open Internet (Worker instance) or from other Worker / Coordinator instances in the Deno Deploy network (Coordinator instance). CF seems to have realized the issue only after having already built Workers KV, hence unfortunately creating a new Storage API in Durable Objects. If you compare the Storage API in Durable Objects [^doapi] to the Workers KV API in Workers [^kvapi], you can see the difference.
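To illustrate the class of bug (a generic in-memory sketch, not how Durable Objects actually solve it - their runtime uses input/output gates): a naive async read-modify-write loses updates under concurrency, while serializing the critical section behind a promise chain restores correctness:

```typescript
// Hypothetical async storage, backed by an in-memory map.
const store = new Map<string, number>([["counter", 0]]);
const get = async (k: string) => store.get(k)!;
const put = async (k: string, v: number) => {
  store.set(k, v);
};

// Buggy: two concurrent requests can both read 0 and both write 1.
async function incrementNaive() {
  const n = await get("counter"); // another request may run between
  await put("counter", n + 1); //   this read and this write
}

// Fixed: chain each increment behind the previous one (a tiny mutex).
let chain: Promise<void> = Promise.resolve();
function incrementSerialized(): Promise<void> {
  chain = chain.then(() => incrementNaive());
  return chain;
}

await Promise.all([incrementNaive(), incrementNaive()]);
console.log(store.get("counter")); // 1 - one increment was lost

store.set("counter", 0);
await Promise.all([incrementSerialized(), incrementSerialized()]);
console.log(store.get("counter")); // 2 - both increments applied
```

The fix here is explicit in application code; the argument of the CF write-up is that the runtime and the Storage API should make such races impossible by default.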

Encryption

When storing data one likely wants to think about encryption as well [^kvenc] [^kvenc2].

[^common]: “we quickly realized — most real world applications are stateful.”. Source: https://blog.cloudflare.com/introducing-d1/

[^indexeddb]: Source: https://discord.com/channels/684898665143206084/826085979344470037/886971068345122846

[^whatiskv]: Actually, Workers KV is storage local to a location instead of local to a Worker which is why there need to be IDs, bindings, and config files. But from the point of a single Worker it works the same. Source: https://github.com/denoland/deploy_feedback/issues/76#issuecomment-939579492

[^whatisd1]: “D1: our first SQL database.”. Source: https://blog.cloudflare.com/introducing-d1/

[^d1do]: “we're building [D1] on the redundant storage of Durable Objects.”. Source: https://blog.cloudflare.com/introducing-d1/

[^kvsync]: "Changes are immediately visible in the edge location at which they're made, but may take up to 60 seconds to propagate to all other edge locations.". Source: https://developers.cloudflare.com/workers/learning/how-kv-works

[^kvsync2]: “Writes are immediately visible to other requests in the same edge location, but can take up to 60 seconds to be visible in other parts of the world.”. Source: https://developers.cloudflare.com/workers/runtime-apis/kv#parameters

[^kvarch]: “Very infrequently read values are stored centrally, while more popular values are maintained in all of our data centers around the world.”. Source: https://developers.cloudflare.com/workers/learning/how-kv-works

[^kvec1]: “If two clients write different values to the same key at the same time, the last client to write eventually "wins" and its value becomes globally consistent.”. Source: https://blog.cloudflare.com/workers-kv-is-ga/

[^kvec2]: “[..] if a client writes to a key and that same client reads that same key, the values may be inconsistent for a short amount of time.”. Source: https://blog.cloudflare.com/workers-kv-is-ga/

[^kvec3]: “Due to the eventually consistent nature of Workers KV, concurrent writes from different edge locations can end up overwriting one another.”. Source: https://developers.cloudflare.com/workers/runtime-apis/kv#parameters

[^kvec4]: “[..] it implements "last write wins" semantics, which means that if a single key is being modified from multiple locations in the world at once, [..] those writes [..] overwrite each other.”. Source: https://blog.cloudflare.com/introducing-workers-durable-objects/

[^kvec5]: “Workers KV isn’t ideal for situations where you need support for atomic operations or where values must be read and written in a single transaction.”. Source: https://developers.cloudflare.com/workers/learning/how-kv-works

[^kvexternal]: “It’s a common pattern to write data via [..] the API but read the data from within a worker, avoiding this issue by issuing all writes from the same location.”. Source: https://developers.cloudflare.com/workers/runtime-apis/kv#parameters

[^kvdo]: “[..] you could build KV on top of Durable Objects, by implementing your own caching and replication in application logic running in Durable Objects.”. Source: https://news.ycombinator.com/item?id=25087987

[^kvdo2]: “Going forward, we plan to utilize Durable Objects in the implementation of Workers KV itself, in order to deliver even better performance.”. Source: https://blog.cloudflare.com/introducing-workers-durable-objects/

[^kvenc]: “All values are encrypted at rest with 256-bit AES-GCM, and only decrypted by the process executing your Worker scripts or responding to your API requests.”. Source: https://developers.cloudflare.com/workers/learning/how-kv-works

[^kvenc2]: “The values written are encrypted while at rest, in transit, and on local disk; they are only decrypted as needed.”. Source: https://blog.cloudflare.com/introducing-workers-kv/

vwkd commented on Oct 13 '21

Persistent storage is something we want to add eventually (it's on my personal todo list) but there's no ETA, it's a Hard Problem.

bnoordhuis commented on Oct 14 '21

Is there a specific reason that the Web Storage APIs are not supported in Deno Deploy? Deno itself has supported them since 1.10.

A no-dependency persistent key-value store would be really nice in a serverless environment.

Honestly, I’ve hit several problems using localStorage in Deno: first in compiled executables and now Deno Deploy.

andrew-pyle commented on Oct 18 '21

@andrew-pyle Yes, they are intentionally not supported: a global KV store would be cool, but localStorage specifically would not work because it is a synchronous API. We have plans for APIs to do storage, but no ETA.

lucacasonato commented on Oct 18 '21

@lucacasonato I'm not sure if this is the right place to bring it up, but I could see localStorage working well in Deno Deploy. Each region would have its own storage, and just like in a browser it would be synchronous: under the hood the runtime persists it to disk, but what is in memory is consistent and encapsulates that fact. I think the Cache API could be implemented the same way (a separate instance per region). If a multi-region app needs to sync data across regions, there are various strategies it could use, e.g. have the regions pick a main region to write to via BroadcastChannel. Making region isolates work like workers in separate browsers that can communicate with one another could keep things simple yet powerful. My 2 cents.
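To sketch that strategy with an in-memory stand-in for BroadcastChannel (all names hypothetical): each region reads synchronously from its own copy, while writes are forwarded to a designated main region that broadcasts them back to all regions:

```typescript
// Hypothetical in-memory simulation of "pick a main region" write routing.
type Message = { key: string; value: string };

class Region {
  readonly local = new Map<string, string>();
  constructor(
    readonly name: string,
    private network: Region[],
    private mainName: string,
  ) {
    network.push(this);
  }
  // Reads are synchronous against the region's own copy, like localStorage.
  read(key: string) {
    return this.local.get(key);
  }
  // Writes are never applied locally; they are forwarded to the main region.
  write(key: string, value: string) {
    const main = this.network.find((r) => r.name === this.mainName)!;
    main.accept({ key, value });
  }
  // The main region applies the write and broadcasts it to every region
  // (itself included), so all local copies stay in sync.
  accept(msg: Message) {
    for (const region of this.network) region.local.set(msg.key, msg.value);
  }
}

const network: Region[] = [];
const useast = new Region("us-east", network, "us-east"); // main region
const euwest = new Region("eu-west", network, "us-east");

euwest.write("user:1", "alice"); // forwarded to us-east, broadcast back
console.log(useast.read("user:1"), euwest.read("user:1")); // alice alice
```

A real version would replace the in-memory `network` array with BroadcastChannel messages and handle the main region being unreachable, which is where the hard consistency questions from the original post come back in.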

mfulton26 commented on Jan 29 '22

I used to think I wanted IndexedDB in Deno. I'm not crazy about its API, but WebSQL is no more and there haven't been other options.

One newer option, though, is the File System API, specifically the origin private file system, which gives an origin an isolated space to create files. Implementing it wouldn't even require the entire File System API. Arbitrary disk access isn't desirable in Deno Deploy, but I think the origin private file system would be.

iOS 15 ships it, and I think it will come to Android before too long (not the full File System API, just the origin private file system; see https://bugs.chromium.org/p/chromium/issues/detail?id=1352738#c2).

Chrome and Safari have shipped the File System API on desktop for a while now.

I think this might make a great addition to Deno and Deno Deploy and would be much less complex than implementing IndexedDB.

mfulton26 commented on Aug 19 '22

Any update here?

pierredewilde commented on Oct 31 '22

Is there any news here?

thomas3577 commented on Dec 05 '22

Isn't this solved now with Deno KV?

bobmoff commented on May 14 '23

Deno KV offers a well-designed Storage API for persistence, which also answers the questions from the initial post. It's currently still in closed beta, but it seems we can close this, hoping it will become generally available soon.

vwkd commented on Jun 04 '23