Support for reading and writing data as simple "bytes"

Open padraic-shafer opened this issue 1 year ago • 1 comments

Background

From Issue #434: Tiled's data model constrains everything to be one of its recognized structure families (array, dataframe, sparse, ~node~ container) or JSON-encodable metadata sitting alongside one of those types. There will be cases where binary (not JSON-encodable) information is relevant and some client programs will know what to do with it.

Proposed Changes

Tiled should support serving assets as a stream of bytes, so that clients can download files (or other data streams) that do not readily fit into Tiled's other structure families. The metadata may include additional hints (such as MIME type) that help the client interpret the payload.

Below are several suggested changes that arose from the discussion in Issue #434 and additional offline discussions between @danielballan, @jmaruland, and me (@padraic-shafer).

  • Add a new StructureFamily enum value, named "bytes".
  • Add the corresponding
    • Server-side adapter
    • Catalog adapter
    • Specialized client node
    • Function/method to add a "bytes" node to a container
  • Add server route(s) to access the "bytes" of the underlying asset
    • See related discussion in https://github.com/bluesky/tiled/issues/473#issuecomment-1610163619 and https://github.com/bluesky/tiled/issues/90.
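
As a rough illustration of the first bullet, the new enum value might look like the sketch below. The member list is taken from the structure families named in this issue; the `str` mixin and the exact class layout are assumptions, and the real enum in Tiled may differ.

```python
import enum


class StructureFamily(str, enum.Enum):
    # Families named in this issue; the actual Tiled enum may differ.
    array = "array"
    container = "container"
    dataframe = "dataframe"
    sparse = "sparse"
    # Proposed addition for opaque binary payloads.
    bytes = "bytes"


# Look up the proposed family from the string a client or catalog would store.
family = StructureFamily("bytes")
```

The adapters, client node, and container method in the remaining bullets would then dispatch on this value.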

The following aspects will need more discussion

Registering a file/object in the catalog

  • If the MIME type of the file is not detected, or an Adapter cannot otherwise be selected, then Tiled should handle this gracefully and report useful information to the maintainer of the server.
  • Probably, a warning should be logged and the asset should be registered with the structure family "bytes".
  • If no MIME type is detected, then it should probably fall back to "application/octet-stream".
  • If a MIME type is detected but the data type is not readily coerced to a Tiled data structure family, then the structure family "bytes" should be used and the detected MIME type should be recorded.
  • Perhaps a "strict mode" flag could be used to ignore the asset if it matches one of these "fallback" conditions.
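
The fallback rules above could be sketched roughly as follows. This uses only the standard-library `mimetypes` module; `KNOWN_FAMILIES`, `classify_asset`, and the `strict` flag are hypothetical names for illustration, not existing Tiled APIs.

```python
import logging
import mimetypes

logger = logging.getLogger(__name__)

DEFAULT_MIMETYPE = "application/octet-stream"

# Hypothetical mapping from MIME type to a structure family with an adapter.
KNOWN_FAMILIES = {
    "text/csv": "dataframe",
    "image/tiff": "array",
}


def classify_asset(path, strict=False):
    """Return (structure_family, mimetype), or None if skipped in strict mode."""
    mimetype, _encoding = mimetypes.guess_type(path)
    family = KNOWN_FAMILIES.get(mimetype)
    if family is not None:
        return family, mimetype
    if strict:
        # Strict mode: ignore assets that would need the fallback.
        logger.warning("Skipping %s: no adapter for MIME type %r", path, mimetype)
        return None
    # Fallback: keep a detected MIME type, else use the generic default.
    fallback = mimetype or DEFAULT_MIMETYPE
    logger.warning("Registering %s as 'bytes' with MIME type %s", path, fallback)
    return "bytes", fallback
```

Logging a warning in both fallback branches keeps the server maintainer informed without failing the whole registration pass.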

Slicing into the byte stream with an HTTP range request

  • The user may want to only download or access a small part of a large file.
  • If they know the exact byte offsets to access, then we could support this with a combination of the Python buffer protocol and HTTP range requests (the client sends a "Range" header and the server answers with "Content-Range").
  • See the related Issue #521.
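
A minimal sketch of that idea, assuming a single byte range per request and an in-memory, one-dimensional bytes-like payload. `parse_range` and `slice_payload` are hypothetical helpers, not part of Tiled; a real route would also need to handle multi-range requests and streaming from disk.

```python
import re

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_range(header, size):
    """Parse a single-range header into inclusive (start, end) offsets.

    Returns None if the header is malformed or cannot be satisfied.
    """
    match = _RANGE_RE.match(header.strip())
    if match is None:
        return None
    first, last = match.groups()
    if not first and not last:
        return None
    if not first:
        # Suffix form "bytes=-N": the last N bytes of the payload.
        length = int(last)
        if length == 0:
            return None
        start, end = max(size - length, 0), size - 1
    else:
        start = int(first)
        end = min(int(last), size - 1) if last else size - 1
    if start > end or start >= size:
        return None
    return start, end


def slice_payload(data, header):
    """Return (status, headers, body) for a ranged read of a bytes-like object."""
    view = memoryview(data)  # buffer protocol: slicing does not copy
    size = len(view)
    bounds = parse_range(header, size)
    if bounds is None:
        return 416, {"Content-Range": f"bytes */{size}"}, b""
    start, end = bounds
    headers = {
        "Content-Range": f"bytes {start}-{end}/{size}",
        "Content-Length": str(end - start + 1),
    }
    return 206, headers, view[start:end + 1]
```

For example, `slice_payload(data, "bytes=2-5")` returns status 206 with a zero-copy view of bytes 2 through 5 inclusive.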

Lazy loading of the "bytes" data

  • For performance, it might be useful for the Python client to return a Dask object representing the underlying bytes of the asset.
  • See, for example, dask.bytes.[core.]read_bytes().
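
To make the deferred-read idea concrete without depending on Dask, here is a standard-library stand-in: it splits a local file into blocks and returns one zero-argument callable per block, so nothing is read until a callable is invoked. This is not Dask's API; `lazy_blocks` and `_read_block` are hypothetical names, and a Dask-based client would presumably return delayed objects instead.

```python
import os
from functools import partial


def _read_block(path, offset, length):
    """Read `length` bytes starting at `offset`; runs only when called."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def lazy_blocks(path, blocksize=1 << 20):
    """Return one deferred reader per block of the file at `path`."""
    size = os.path.getsize(path)
    return [
        partial(_read_block, path, offset, min(blocksize, size - offset))
        for offset in range(0, size, blocksize)
    ]
```

A caller can invoke only the blocks it needs, which pairs naturally with the range-request support discussed above.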

Contents of the metadata's structure field

  • Necessary information like the MIME type and content length can be found in the data_source field.
  • For now it's probably best to keep the structure field empty (null or None).
  • This can be revisited if field testing reveals additional info that would be useful.
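
For illustration only, a "bytes" node description under this proposal might serialize along these lines. Every key name and value below is a placeholder chosen for the example, not Tiled's actual schema.

```python
import json

# Hypothetical description of a "bytes" node; all key names are illustrative.
bytes_node = {
    "structure_family": "bytes",
    "structure": None,  # kept empty for now, per the proposal
    "data_source": {
        "mimetype": "application/octet-stream",  # hint for the client
        "content_length": 1024,  # placeholder size in bytes
    },
}

# None becomes JSON null when the description is sent to a client.
encoded = json.dumps(bytes_node)
```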

padraic-shafer, Sep 05 '23 22:09

In terms of where the code needs to be updated, there are many analogs between this current PR and #549. For convenience here are the diffs.

padraic-shafer, Sep 06 '23 18:09