fhir-bulk-data-docs
Add parameter to control the file size
This PR is not final! I hope to clarify the text after some discussion.
Notes:
- Servers are expected to have their own default limit. They should validate this parameter and apply it only if it is within reasonable boundaries. For example:
  - A server might not be able to generate files bigger than 10G, in which case `_pageSize=1T` should be considered invalid.
  - Setting the limit to a very low value might result in a huge file list generated by the status endpoint, which is also not desirable.
- The server should decide whether to support `_pageSize=0` based on the amount of data that it has. If that is not supported, the server should reject such requests instead of silently ignoring the parameter.
- When using a file-size-based limit, clients should be aware that the result might be approximate. Because servers will stream the data in chunks, they will not know that they have reached the size limit until they actually exceed it. That is why `_pageSize=100M` will probably produce a file with size equal to 100M plus the size of some portion of the last resource.
- About the count-based limiting:
  - It will obviously produce files of variable size but with a consistent number of resources. In some cases, the client might be part of a specific pipeline for which the resource count is more important than the file size.
- So far, there are two types of bulk-data server implementations:
  - Most will probably generate files that clients will then download.
  - Some will not create files but will generate and stream them on the fly from the download endpoint. Such servers compile a list of file links based on the count limit (either `_pageSize=number` or an internal limit) and then return that list from the status endpoint. These servers will not be able to generate the download links based on a file-size limit.
- The `_pageSize` parameter is optional. Servers that don't support it should silently ignore it. However, those that do must return an error if the passed `_pageSize` value is not acceptable.
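
To make the notes above more concrete, here is a rough sketch of how a server that supports the parameter might parse and validate it, and how the two kinds of limit behave when splitting output. Everything named here is an assumption for illustration only: the `PageLimit` type, the `parse_page_size` and `split_lines` helpers, the `K` suffix, the binary (1024-based) units, and the minimum and maximum bounds are not part of this proposal. The sketch also assumes a server that chooses not to support `_pageSize=0`.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, List

# Assumed suffix multipliers; the proposal does not define the units precisely.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Hypothetical server-specific bounds; each server would pick its own.
MIN_COUNT, MAX_COUNT = 100, 1_000_000
MIN_BYTES, MAX_BYTES = 1 * _UNITS["M"], 10 * _UNITS["G"]


@dataclass(frozen=True)
class PageLimit:
    kind: str   # "count" (resources per file) or "bytes" (file size)
    value: int


def parse_page_size(raw: str) -> PageLimit:
    """Parse a _pageSize value; raise ValueError if it is not acceptable."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", raw.strip())
    if match is None:
        raise ValueError(f"malformed _pageSize: {raw!r}")
    number, unit = int(match.group(1)), match.group(2)
    if unit:
        limit = PageLimit("bytes", number * _UNITS[unit])
        low, high = MIN_BYTES, MAX_BYTES
    else:
        limit = PageLimit("count", number)
        low, high = MIN_COUNT, MAX_COUNT
    # Reject out-of-range values (including 0 here) instead of ignoring them.
    if not low <= limit.value <= high:
        raise ValueError(f"_pageSize {raw!r} is outside the supported range")
    return limit


def split_lines(lines: Iterable[bytes], limit: PageLimit) -> Iterator[List[bytes]]:
    """Group NDJSON lines into files according to the given limit.

    The limit is checked only after a line has been written, so a
    size-limited file can exceed the limit by part of its last resource.
    """
    current: List[bytes] = []
    size = 0
    for line in lines:
        current.append(line)
        size += len(line)
        if limit.kind == "count":
            reached = len(current) >= limit.value
        else:
            reached = size >= limit.value
        if reached:
            yield current
            current, size = [], 0
    if current:
        yield current  # last, possibly shorter, file
```

With bounds like these, `parse_page_size("1T")` and `parse_page_size("0")` would both raise, and the request handler would turn that into an error response rather than silently falling back to the default limit. A server that streams files on the fly would only be able to honor the `"count"` kind, since it cannot know file sizes when it builds the link list.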
Questions
- We need a name generic enough to fit both the count-based and the size-based limit. I came up with `_pageSize`, but I am not sure that it is the best one.
- Do you think the `_pageSize=number` syntax is confusing (looking like a size in bytes)?