# Run client against catalog in-process
Consider the following use case. A user connects to a remote Tiled server:

```python
from tiled.client import from_uri

client = from_uri("https://tiled.nsls2.bnl.gov")
...
```
Perhaps a client-side cache is engaged (maybe by default) and automatically stashes locally any data or metadata the user browses, up to a certain size, just as a web browser cache does.
Then, they decide they want to download a swath of the data for local, offline use. Something like:

```python
# This does not exist---just a proposal.
from tiled.client import download

download(client.search(...), "stuff/")
```
That could also be available as a CLI (`tiled download ...`) or a button in a web app. However it happens, suppose that this creates a zip archive or directory with contents like:

```
stuff/
  catalog.db
  data/
    ...
```
where `data/` contains files. These would be the same files backing Tiled on the server side, perhaps exactly the files the detector wrote. (Notice that if there were a client-side cache engaged, `download(...)` would naturally use it, so the user would not be downloading anything twice.)
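As a rough sketch of the proposed layout only, the directory structure above could be produced like this. Everything here is a hypothetical placeholder (the temp directory, the empty `catalog.db`, the file name `image_0001.tiff`), not Tiled API:

```python
import pathlib
import tempfile

# Build the proposed "stuff/" layout with empty placeholder files.
root = pathlib.Path(tempfile.mkdtemp()) / "stuff"
(root / "data").mkdir(parents=True)
(root / "catalog.db").touch()                # placeholder for the SQLite catalog
(root / "data" / "image_0001.tiff").touch()  # placeholder for a detector file

# List the layout relative to the archive root.
print(sorted(p.relative_to(root).as_posix() for p in root.rglob("*")))
# ['catalog.db', 'data', 'data/image_0001.tiff']
```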
The user can then use this local archive in three different ways.
## Option 1: Just the files, please
Open the files in `data/`, and just ignore `catalog.db` and Tiled.
## Option 2: Local Tiled Server
Run a local Tiled server against this data:

```
tiled serve catalog --public stuff/
```
Then navigate to `http://localhost:8000` in a web browser, connect to it from any other program, or use the Tiled Python client:

```python
from tiled.client import from_uri

client = from_uri("http://localhost:8000")
```
## Option 3: In-process access
But if we want to access the local data from Python specifically, it is not even necessary to start a server in a separate process with `tiled serve catalog ...`. We can skip that step and do everything from one Python process.

```python
# This does not exist---just a proposal.
from tiled.client import from_catalog

client = from_catalog("stuff/catalog.db", readable_storage=["stuff/data"])
```
The above uses ASGI to run a "server" and the client in the user's Python process, passing HTTP messages via Python function calls within one process instead of TCP packets between separate server and client processes.
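To make the in-process pattern concrete, here is a minimal, self-contained sketch (a toy ASGI app and a toy caller, not Tiled's actual code). The "request" and "response" are just dictionaries passed between Python coroutines; no sockets are involved:

```python
import asyncio

# A minimal ASGI app: responds to any request with a plain-text body.
async def app(scope, receive, send):
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello from in-process"})

# A toy in-process "client": it invokes the app directly as a coroutine,
# collecting the response messages instead of reading them off a socket.
async def call(app, path):
    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    await app(scope, receive, send)
    status = next(m["status"] for m in messages if m["type"] == "http.response.start")
    body = b"".join(m.get("body", b"") for m in messages if m["type"] == "http.response.body")
    return status, body

status, body = asyncio.run(call(app, "/"))
print(status, body)  # 200 b'hello from in-process'
```

In practice one would use an existing ASGI transport (for example, `httpx` provides one) rather than hand-rolling this, but the mechanism is the same.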
Because this runs in a single process, it can easily be wrapped up in third-party convenience libraries, which can be as "magical" or as explicit as one wants:

```python
from nsls2_data_thingie import remote_access, download, local_access

client = remote_access()
download()  # perhaps downloads to some default location, like ~/.cache/nsls2_data_thingie
client = local_access()
```
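For example, such a wrapper's default download location might be chosen like this. The helper below is hypothetical and the `nsls2_data_thingie` name is just the placeholder from above, not a real package:

```python
import os
import pathlib

def default_cache_dir(app_name="nsls2_data_thingie"):
    # Honor XDG_CACHE_HOME if set; otherwise fall back to ~/.cache.
    base = os.environ.get("XDG_CACHE_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache"
    )
    return pathlib.Path(base) / app_name

print(default_cache_dir())
```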
P.S. The above imagines a SQLite database `catalog.db`, and that is probably best for the vast majority of users. But we could easily support a PostgreSQL target for `download` and `from_catalog`. I am not sure how necessary it is, but it is easy to support.
This will require adding an endpoint like `/assets/{id}`, which is something we wanted anyway.
What should this endpoint return when the asset is a directory (or directory-like, such as an HDF5 virtual dataset)?
- An archived collection: `.zip`, `.tar`
- A list of the underlying asset endpoint URLs (probably just the next level down)
- Other
Good question; this wrinkle had not occurred to me yet.
In #450, we use ZIP to bundle multiple buffers (numpy arrays) into one response. We considered TAR, but ZIP has two important points in its favor. ZIP supports random access---unlike TAR, it has an index---and ZIP is also understood by web browsers, which could be useful in the context of web apps.
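The random-access point can be demonstrated with the standard library alone: a ZIP's central directory acts as an index, so we can list members and read any single one directly, without streaming past the others as TAR would require.

```python
import io
import zipfile

# Build a ZIP with several members entirely in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for i in range(3):
        zf.writestr(f"asset_{i}.bin", bytes([i]) * 4)

# The central directory lets us jump straight to one member.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()          # read from the index, no full scan
    data = zf.read("asset_2.bin")  # random access to a single member

print(names, data)
# ['asset_0.bin', 'asset_1.bin', 'asset_2.bin'] b'\x02\x02\x02\x02'
```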
I wonder if we'll decide to support both a ZIP bundle and individual asset URLs ("the next level down").
ZIP seems like a good starting point to move forward with. We could wait to add the option for asset URLs at a later time when/if they become necessary.