fhir-bulk-data-docs

Add parameter to control the file size

Open vlad-ignatov opened this issue 6 years ago • 0 comments

This PR is not final! I hope to clarify the text after some discussion.

Notes:

  1. Servers are expected to have their own default limit. They should validate this parameter and only apply it if it is within reasonable boundaries. For example:
    • A server might not be able to generate files bigger than 10G, and in that case _pageSize=1T should be considered invalid.
    • Setting the limit to a very low value might make the status endpoint return a huge list of files, which is also undesirable.
  2. The server should decide whether to support _pageSize=0 based on the amount of data that it has. If that is not supported, the server should reject such requests instead of silently ignoring the parameter.
  3. When using a file-size-based limit, clients should be aware that the result might be approximate. Because servers will stream the data in chunks, they will not know that they have reached the size limit until they actually exceed it. That is why _pageSize=100M will probably produce a file with a size equal to 100M plus the size of some portion of the last resource.
  4. About count-based limiting:
    1. It will obviously produce files of variable size, but each file will contain the same number of resources. In some cases, the client might be part of a specific pipeline for which the resource count is more important than the file size.
    2. So far, there are two types of bulk-data server implementations:
      1. Most will probably generate files that clients will then download.
      2. Some will not create files but will generate and stream them on the fly from the download endpoint. Such servers compile a list of file links based on the count limit (either _pageSize=number or an internal limit) and then return that list from the status endpoint. These servers will not be able to generate the download links based on a file-size limit.
  5. The _pageSize parameter is optional. Servers that don't support it should silently ignore it. However, those that do support it must return an error if the passed _pageSize value is not acceptable.
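
To illustrate notes 3 and 4, here is a minimal Python sketch of the two splitting strategies. It is not taken from any existing server. The function names (`split_by_size`, `split_by_count`, `to_ndjson_line`) are hypothetical, and it assumes the server only checks the size limit after a whole resource has been written, which is what makes the size-based result approximate.

```python
import json
from typing import Dict, Iterable, Iterator, List


def to_ndjson_line(resource: Dict) -> bytes:
    """Serialize one resource as a single NDJSON line."""
    return (json.dumps(resource) + "\n").encode("utf-8")


def split_by_size(resources: Iterable[Dict], limit_bytes: int) -> Iterator[List[bytes]]:
    """Group resources into files, closing a file once it reaches limit_bytes.

    The limit is checked only after a full resource has been appended, so a
    file can exceed limit_bytes by up to the size of its last resource.
    """
    current: List[bytes] = []
    size = 0
    for resource in resources:
        line = to_ndjson_line(resource)
        current.append(line)
        size += len(line)
        if size >= limit_bytes:
            yield current
            current, size = [], 0
    if current:
        yield current  # the last file may be smaller than the limit


def split_by_count(resources: Iterable[Dict], limit_count: int) -> Iterator[List[bytes]]:
    """Group resources into files of exactly limit_count lines (last may be shorter)."""
    current: List[bytes] = []
    for resource in resources:
        current.append(to_ndjson_line(resource))
        if len(current) == limit_count:
            yield current
            current = []
    if current:
        yield current
```

With ten 12-byte resources and a 30-byte limit, `split_by_size` closes each file at 36 bytes, which is the overshoot described in note 3. `split_by_count` knows the number of files up front (the total count divided by the limit, rounded up), which is why a server that streams files on the fly can build its link list from a count limit but not from a size limit.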

Questions

  1. We need a name that is generic enough to fit both the count-based and the size-based limit. I came up with _pageSize, but I am not sure that it is the best one.
  2. Do you think the _pageSize=number syntax is confusing (looking like size in bytes)?
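
To make question 2 concrete, here is a rough Python sketch of how a server might tell the two forms apart and apply the validation from notes 1, 2 and 5. Everything in it is an assumption for the sake of discussion: the suffix set (K, M, G, T), the use of 1024-based units, the default limits, the reading of _pageSize=0 as "no limit", and the names `parse_page_size`, `PageSize` and `InvalidPageSize`.

```python
import re
from dataclasses import dataclass

# Assumption: binary units. The proposal does not say whether 1M is 1000^2 or 1024^2.
UNIT_BYTES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


class InvalidPageSize(ValueError):
    """Raised when a _pageSize value is malformed or outside the server's limits."""


@dataclass(frozen=True)
class PageSize:
    kind: str   # "count", "bytes" or "unlimited"
    value: int  # number of resources, number of bytes, or 0 for unlimited


def parse_page_size(
    raw: str,
    *,
    min_count: int = 100,              # hypothetical server limits
    max_count: int = 1_000_000,
    min_bytes: int = 1024 ** 2,        # 1M
    max_bytes: int = 10 * 1024 ** 3,   # 10G
    allow_unlimited: bool = False,     # whether this server accepts _pageSize=0
) -> PageSize:
    match = re.fullmatch(r"([0-9]+)([KMGT]?)", raw)
    if match is None:
        raise InvalidPageSize(f"unrecognized _pageSize value: {raw!r}")
    number, unit = int(match.group(1)), match.group(2)

    if number == 0:
        # Assumption: a bare 0 means "no limit"; reject it unless the server opts in.
        if unit == "" and allow_unlimited:
            return PageSize("unlimited", 0)
        raise InvalidPageSize(f"_pageSize={raw} is not supported by this server")

    if unit == "":
        # A bare number is a resource count, not a size in bytes.
        if not min_count <= number <= max_count:
            raise InvalidPageSize(f"resource count {number} is out of range")
        return PageSize("count", number)

    size = number * UNIT_BYTES[unit]
    if not min_bytes <= size <= max_bytes:
        raise InvalidPageSize(f"file size {raw} is out of range")
    return PageSize("bytes", size)
```

Under these assumptions `parse_page_size("500")` is a count limit and `parse_page_size("100M")` is a size limit, while `"1T"` and `"0"` are rejected with an error rather than silently ignored. The only thing separating 500 resources from 500 bytes is the absence of a suffix, which is the possible source of confusion raised in the question.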

vlad-ignatov, Jan 16 '19 19:01