StorageEstimate.quota + Storage Pressure

Open jarryd999 opened this issue 4 years ago • 8 comments

We have been exploring the behavior around quota while under storage pressure: What value should we return for quota when the user has less available disk space than the quota we'd ordinarily return?

The obvious approach is to return a shrunken value which is less than or equal to the available disk space. The issue with allowing script to identify the remaining disk space is that a bad actor could query the value before and after caching an opaque response and determine its size. (See https://github.com/whatwg/storage/issues/31) Since we don't want to enable this, the spec should have a recommendation.

One solution is to return the same quota, regardless of storage pressure. The tradeoff is that apps will lose insight into their own remaining space. This can be addressed by providing a storage pressure API that would let apps know whether there is more/less than X MB/GB.
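To make the tradeoff concrete, here is a minimal sketch of the two reporting policies. The policy names, function names, and byte values are hypothetical and only illustrate the reasoning; this is not a claim about how any browser computes quota.

```typescript
// Hypothetical model of the two policies discussed above.
type Policy = "shrink-to-free-space" | "constant";

function reportedQuota(policy: Policy, nominalQuota: number, freeDiskBytes: number): number {
  // "shrink-to-free-space" never reports more than the disk can hold.
  // "constant" ignores storage pressure entirely.
  return policy === "shrink-to-free-space"
    ? Math.min(nominalQuota, freeDiskBytes)
    : nominalQuota;
}

// What a script learns by reading quota before and after caching an opaque response.
function leakedSize(
  policy: Policy,
  nominalQuota: number,
  freeBefore: number,
  opaqueBytes: number,
): number {
  const before = reportedQuota(policy, nominalQuota, freeBefore);
  const after = reportedQuota(policy, nominalQuota, freeBefore - opaqueBytes);
  return before - after;
}
```

Under the shrinking policy the difference equals the size of the opaque response whenever free space is below the nominal quota; under the constant policy the difference is always zero, at the cost of the app losing insight into remaining space.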

Thoughts?

jarryd999 avatar Jul 30 '19 17:07 jarryd999

The current Firefox/Gecko thinking here is:

  • (Versus storage pressure events) We strongly prefer exposing a mechanism to create multiple storage buckets to content where content can expose some eviction precedence.
  • Multiple buckets enables smaller quotas per bucket, including the default bucket. (Currently Gecko's default quota grant is overly generous.)
  • Multiple smaller buckets enables more granular eviction which should make it easier to ensure that we have the space available to fulfill the (smaller) quota that was promised to the origin.

So most directly addressing the issue, in such a hypothetical future:

  • Gecko would claim there was quota space available even if there wasn't sufficient free disk space to back existing promises.
  • Gecko would evict not-currently-in-use buckets from other origins (and potentially the current origin) in order to make good on the quota. This might cause some (detectable) asynchronous write delays as eviction progresses in the pathological case, but ideally Gecko would maintain a working margin that avoids this.
    • Depending on how the multiple buckets spec shakes out, any live windows with live buckets open might be notified if their buckets were evicted out from under them, but otherwise origins would not be notified about their eviction.
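A rough sketch of the eviction step described above, assuming least-recently-accessed order as the eviction precedence. The `Bucket` shape and the ordering are assumptions for illustration; the multiple buckets proposal had not settled how precedence would be expressed.

```typescript
// Hypothetical bucket record; field names are illustrative only.
interface Bucket {
  origin: string;
  name: string;
  bytes: number;
  inUse: boolean;
  lastAccess: number;
}

// Pick not-currently-in-use buckets, least recently accessed first,
// until enough bytes are freed to make good on the promised quota.
function pickBucketsToEvict(buckets: Bucket[], bytesNeeded: number): Bucket[] {
  const candidates = buckets
    .filter((b) => !b.inUse)
    .sort((a, b) => a.lastAccess - b.lastAccess);
  const chosen: Bucket[] = [];
  let freed = 0;
  for (const b of candidates) {
    if (freed >= bytesNeeded) break;
    chosen.push(b);
    freed += b.bytes;
  }
  // If evicting every idle bucket is still not enough, evict nothing and
  // let the caller fail the write instead.
  return freed >= bytesNeeded ? chosen : [];
}
```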

asutherland avatar Jul 30 '19 17:07 asutherland

Here is some common ground we found at TPAC 2019.

Use Case

A compelling use case is supporting the scenario where the user enables offline, the application starts syncing gigabytes of contents to the user's device and, during the sync, the browser notices that the disk is running low on free space.

On devices that run multiple applications at the same time, the browser cannot easily reserve space for the application at the beginning of the sync. The browser can tell the application to stop syncing early, and can give this signal early enough that the application can stop gracefully. The alternative is that writes start to fail with quota exceeded errors.

Mechanism

The signal will be a quotachange event on the StorageManager interface. This works today, and will work in a world where storage buckets inherit from StorageManager.

Still TBD is how available quota would change to reflect the fact that free disk space is running low. We can't expose the exact amount of free disk space to Web applications, because that would allow malicious apps to learn the size of cross-origin resources by writing them to Cache Storage.
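As a sketch of how an application might consume the signal: `quotachange` is only the event name proposed above and is not a shipped API, and `QuotaChangeTarget` is a hypothetical stand-in for the part of StorageManager the example needs.

```typescript
// Hypothetical: minimal stand-in for an object that fires the proposed `quotachange` event.
interface QuotaChangeTarget {
  addEventListener(type: "quotachange", listener: () => void): void;
}

// Write chunks until done, or until the browser signals that quota changed.
// Returns the number of chunks written.
function syncUntilPressure(
  target: QuotaChangeTarget,
  chunks: string[],
  write: (chunk: string) => void,
): number {
  const state = { stop: false };
  target.addEventListener("quotachange", () => {
    state.stop = true;
  });
  let written = 0;
  for (const chunk of chunks) {
    if (state.stop) break; // stop gracefully instead of waiting for a quota exceeded error
    write(chunk);
    written++;
  }
  return written;
}
```

A real sync would await each write, which gives the event loop a chance to deliver the event; the synchronous loop here only keeps the sketch short.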

@asutherland Please correct/flag anything I'm misremembering.

pwnall avatar Sep 26 '19 14:09 pwnall

That's consistent with my understanding.

I think we also discussed that it might make sense to generate courtesy events to update origins on when their usage crosses certain thresholds so they don't need to poll. These would want to be de-bounced space-wise so that oscillating around a certain size doesn't generate a large number of events and time-wise so that timing side-channels aren't accidentally created. For example a random delay before hooking the event up to the idle timeout or something. The goal would be to avoid accidentally exposing implementation details, including things like GC which is made evident by the collection of the last in-memory handle to a disk-backed Blob/File, etc.
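One way to picture the two kinds of de-bouncing, as a sketch only: a margin around each threshold for the space-wise part, and a randomized delay for the time-wise part. Both helpers and their parameters are hypothetical; nothing here was agreed in the discussion.

```typescript
// Space-wise: a threshold counts as crossed only once usage has moved at least
// `margin` bytes past it, so oscillating around a threshold stays quiet.
// The returned function yields the new number of crossed thresholds, or null if unchanged.
function makeThresholdNotifier(
  thresholds: number[],
  margin: number,
): (usage: number) => number | null {
  const sorted = thresholds.slice().sort((a, b) => a - b);
  let level = 0; // how many thresholds are currently considered crossed
  return (usage: number): number | null => {
    const before = level;
    while (level < sorted.length && usage >= sorted[level] + margin) level++;
    while (level > 0 && usage <= sorted[level - 1] - margin) level--;
    return level === before ? null : level;
  };
}

// Time-wise: add a random delay before dispatching, so event timing does not
// track the exact moment the underlying change happened.
function jitteredDelayMs(
  baseMs: number,
  maxJitterMs: number,
  random: () => number = Math.random,
): number {
  return baseMs + Math.floor(random() * maxJitterMs);
}
```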

To explicitly state the primary problematic scenario I remember from the discussion:

Problematic Scenario

  • Assume the browser maintains and persists some type of browser-local/user-profile-local (which may be synchronized with the user's other browser profiles via opt-in sync mechanism) site engagement/frecency metric that tracks the user's interaction with a site over time and this informs quota-related decisions.
  • The user browses to a site for which the browser has no site engagement data for that user, or only limited data. This could be due to a new browser profile, because of privacy settings/data-clearing, or other.
  • The site wants to synchronize a large amount of data and the desired UX is that the user can make the decision to synchronize this amount of data up-front. For example, in an offline mail webapp, the user might choose to synchronize N weeks of data and be told the expected size. Alternately, the user might be using a video streaming site that allows saving videos for offline usage to be used from within the site, so not full videos.
  • The user is also synchronizing a large amount of data outside of the browser's quota management system and this cuts into space the browser thought it could use and allocate storage to.
  • The site can potentially extract some amount of entropy from either hitting a QuotaExceededError or from its quota being reduced.
  • Because we're dealing with a scenario that assumes low engagement/newly visited sites, even if the quotachange reduction has low entropy, an attacker could potentially perform this attack across multiple distinct origins in parallel or in serial in order to attempt to extract and aggregate additional entropy.
  • A bad actor can easily make its behavior more directly resemble that of a legitimate site if necessary to defeat simple heuristics. For example, if we require there be network usage commensurate with quota requests, an attacker is probably just as happy to use up the user's data. (That said, there are likely cases where some would-be fingerprinters like ad-tech would be dissuaded from stepping over certain lines that other bad actors would not.)

My hand-wavey proposals

Just for my own future reference, my general proposals for this area had been:

  • Favor a strategy of having sites make small incremental quota requests as space is used and they need more space. For example, in 50/100 MiB chunks. While there's entropy in when the browser starts saying no, this can be more directly tied to user engagement and rate-limiting.
    • This could allow the browser to surface an ambient indication that the page is doing a lot of work related to I/O and let the user stop the growth.
    • This makes it easier to evolve an understanding of user engagement as the synchronization happens. While a user may not want to watch the progress bar on the mail webapp or video streaming site, we would expect them to leave the tab open. This would differ from a random site the user lands on for a few seconds that installs a ServiceWorker, uses an in-page prompt to ask the user if it's okay to bother them with push notifications, and then treats the click on the "No!" button or the close-ad button as user interaction. It then requests a very large quota grant and fires an event at the ServiceWorker, which the ServiceWorker handles with a never-resolving waitUntil() to get as much runtime as possible to listen for storage quotachange events. Presumably the site might also attempt to redirect to another top-level site where it repeats the process so it can have multiple ServiceWorkers each trying to gather some number of bits of entropy.
  • Strongly limit third-party origins' quota grants.
  • Use explicit prompting UI for requests for large quota grants when there isn't already a strong site engagement score. The expectation is that the site would have primed the user for the prompt with their UX, similar to how native apps and web sites requesting push notifications first use in-app/in-page explanations before prompting. This prompt would follow standard browser guidelines of not letting the site provide any information besides the quota grant request (which would also be rounded, provided in human-understandable units, and contextualized in terms of total storage space or free space).
    • This is important because the larger the grant, the more potential entropy in any decrease in quota. So we want to limit the number of large, speculative grants.
    • The downside to this is that browser UX is frequently reluctant to prompt. However, this can be mitigated by only prompting when there is insufficient site engagement score or not prompting for Installed Web Apps.
    • Additionally, I think at this point all the major browser vendors provide and attempt to onboard new profiles with account sync which includes the data required to calculate site engagement, so in many cases site engagement data would already be available. And in the cases of users who explicitly do not use account sync or explicitly use the browser in configurations that purge such data periodically, it seems likely these users would indeed prefer prompting.
  • Allow (implicit?) quota grants related to APIs like background-fetch which potentially provide a good UX for users already by ambiently surfacing the fact that a download is happening and how much is being downloaded and allowing the user to cancel the download (and thereby revoke the tentative storage grant).
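The incremental-request strategy from the first proposal above could look roughly like the following. The `grant` callback is a hypothetical stand-in for whatever mechanism a browser would expose to approve or refuse an increase; no such API is specified in this thread.

```typescript
const MiB = 1024 * 1024;

// Store items one by one, asking for another fixed-size chunk of quota only
// when the next item would not fit in what has been granted so far.
function syncWithIncrementalGrants(
  itemSizes: number[],
  chunkBytes: number,
  grant: (bytes: number) => boolean, // hypothetical: browser says yes or no
): { stored: number; granted: number } {
  let granted = 0;
  let used = 0;
  let stored = 0;
  for (const size of itemSizes) {
    while (used + size > granted) {
      // The browser said no: stop growing and keep what is already stored.
      if (!grant(chunkBytes)) return { stored, granted };
      granted += chunkBytes;
    }
    used += size;
    stored++;
  }
  return { stored, granted };
}
```

Because each request is small, a refusal reveals little, and the browser gets repeated chances to factor in engagement or rate limits before saying yes again.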

asutherland avatar Sep 26 '19 19:09 asutherland

FWIW, I'm not a 100% on making quota per-bucket rather than keeping it per-origin. The primary use case I see for buckets is aiding eviction, be it through priorities/importance or by making things more granular for end users. It's not clear to me how useful it is that a bucket has a particular limit, although I suppose there are some use cases where this might help.

annevk avatar Sep 27 '19 12:09 annevk

FWIW, I'm not a 100% on making quota per-bucket rather than keeping it per-origin.

One use case I'm aware of is sites that manage an opportunistic cache of resources in cache_storage. They implement their own eviction algorithm to keep the cache within the desired max size. Having some kind of bucket with a quota might help with this use case.

However, these sites also generally don't want to just fail writes when they hit the limit or cause the entire bucket to get evicted. They want LRU eviction of some resources to reduce the size. Unless we add that kind of policy to buckets (which it seems we're unsure of) then maybe this use case is not really helped.
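For reference, the kind of site-managed LRU eviction described here can be sketched as below. The `Entry` bookkeeping (sizes and last-used times) is assumed to be tracked by the site itself, since Cache Storage does not report either; the names are illustrative.

```typescript
// Hypothetical bookkeeping record a site keeps alongside its cached resources.
interface Entry {
  url: string;
  bytes: number;
  lastUsed: number;
}

// Returns the URLs to delete, least recently used first, so the cache fits maxBytes.
function lruEvictions(entries: Entry[], maxBytes: number): string[] {
  let total = entries.reduce((sum, e) => sum + e.bytes, 0);
  const evict: string[] = [];
  const oldestFirst = entries.slice().sort((a, b) => a.lastUsed - b.lastUsed);
  for (const e of oldestFirst) {
    if (total <= maxBytes) break;
    evict.push(e.url);
    total -= e.bytes;
  }
  return evict;
}
```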

wanderview avatar Sep 27 '19 13:09 wanderview

@annevk A recurring use-case in the Service Workers/Storage space is that a single origin may be divided up into sub-sites each handled by largely independent teams. If each team defines their own buckets and those buckets have their own quota, this helps the team reason about and control their storage usage. It also helps the browser apportion quota increases based on the sub-sites the user actually uses.

For example, imagine a site with a news feed that aggressively (pre)caches which the user barely uses, plus a photo album that opportunistically caches and which the user uses all the time. They each use their own bucket. If they share the same quota, the news feed might see there is spare quota and use up all the quota for its own purposes the user doesn't care about, while the photo caching bucket remains effectively the same size. Should the two sub-sites have to coordinate between themselves on how to allocate their quota, or should the browser be doing it for them via bucket quotas?
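As a toy illustration of the browser doing the apportioning: the sketch below splits a quota increase across buckets in proportion to an engagement score. Proportional splitting and the score itself are assumptions made for the example, not something proposed in this thread.

```typescript
// Split a quota increase across buckets in proportion to how much the user
// engages with each one. Scores are hypothetical, unitless weights.
function apportionIncrease(
  increaseBytes: number,
  engagement: Record<string, number>,
): Record<string, number> {
  const names = Object.keys(engagement);
  const total = names.reduce((sum, n) => sum + engagement[n], 0);
  const shares: Record<string, number> = {};
  for (const n of names) {
    // With no engagement data at all, no bucket receives an increase.
    shares[n] = total === 0 ? 0 : Math.floor((increaseBytes * engagement[n]) / total);
  }
  return shares;
}
```

With per-bucket quotas, the barely used news feed cannot absorb the share that this split assigns to the photo album.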

asutherland avatar Sep 29 '19 08:09 asutherland

I don't think the browser has a good track record with managing quota so I guess I'd rather not expand our scope on that front. And also, the site teams will have to play by some rules anyway as otherwise the news feed folks would just relay that their bucket is vital infrastructure and cannot be wiped without wiping all.

annevk avatar Sep 30 '19 07:09 annevk

Has there been any progress on this? With AccessHandles (https://web.dev/file-system-access/#accessing-files-optimized-for-performance-from-the-origin-private-file-system) integration coming in Chrome 99, and people using storage more heavily for larger and larger files... we really need some good indications of how much data we can store on the device, not just quota.

For instance: Replicating a Notes database locally to your client, it would be nice to show them how much free space they have so they know if it will fit. If we need an installed PWA for that particular use case, then fine... but the behavior when it is not installed should not leave the user experience broken.
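For context, the check available today looks roughly like this sketch. It compares against quota only, so per the discussion above it can say "fits" even when the disk itself does not have that much free space; `fitsInQuota` and `replicaBytes` are hypothetical names.

```typescript
// Shape of the object that navigator.storage.estimate() resolves to.
interface Estimate {
  usage?: number;
  quota?: number;
}

// Returns true/false when both values are known, or null when they are not.
function fitsInQuota(estimate: Estimate, bytesNeeded: number): boolean | null {
  if (estimate.usage === undefined || estimate.quota === undefined) return null;
  return estimate.quota - estimate.usage >= bytesNeeded;
}

// In a browser this would be called roughly as:
//   const ok = fitsInQuota(await navigator.storage.estimate(), replicaBytes);
```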

hcldan avatar Jan 17 '22 16:01 hcldan