# Run client against catalog in-process
Consider the following use case. A user connects to a remote Tiled server:

```python
from tiled.client import from_uri

client = from_uri("https://tiled.nsls2.bnl.gov")
...
```
Perhaps a client-side cache is engaged (maybe by default) and automatically stashes locally any data or metadata the user browses, up to a certain size, just as a web browser cache does.
Then, they decide they want to download a swath of the data for local, offline use. Something like:

```python
# This does not exist---just a proposal.
from tiled.client import download

download(client.search(...), "stuff/")
```
That could also be available as a CLI (`tiled download ...`) or a button in a web app. However it happens, suppose that this creates a zip archive or directory with contents like:

```
stuff/
  catalog.db
  data/
    ...
```
where `data/` contains files. These would be the same files backing Tiled on the server side, perhaps exactly the files the detector wrote. (Notice that if there were a client-side cache engaged, `download(...)` would naturally use it, so the user would not be downloading anything twice.)
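As a rough sketch of the proposed layout only, the directory structure above could be produced like this. Everything here is a hypothetical placeholder (the temp directory, the empty `catalog.db`, the file name `image_0001.tiff`), not Tiled API:

```python
import pathlib
import tempfile

# Build the proposed "stuff/" layout with empty placeholder files.
root = pathlib.Path(tempfile.mkdtemp()) / "stuff"
(root / "data").mkdir(parents=True)
(root / "catalog.db").touch()                # placeholder for the SQLite catalog
(root / "data" / "image_0001.tiff").touch()  # placeholder for a detector file

# List the layout relative to the archive root.
print(sorted(p.relative_to(root).as_posix() for p in root.rglob("*")))
# ['catalog.db', 'data', 'data/image_0001.tiff']
```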
The user can then use this local archive in three different ways.
## Option 1: Just the files, please
Open the files in `data/`, and just ignore `catalog.db` and Tiled.
## Option 2: Local Tiled Server
Run a local Tiled server against this data:

```
tiled serve catalog --public stuff/
```
Then navigate to `http://localhost:8000` in a web browser, connect to it from any other program, or use the Tiled Python client:

```python
from tiled.client import from_uri

client = from_uri("http://localhost:8000")
```
## Option 3: In-process access
But if we want to access the local data from Python specifically, it is not even necessary to start a server in a separate process with `tiled serve catalog ...`. We can skip that step and do everything from one Python process.

```python
# This does not exist---just a proposal.
from tiled.client import from_catalog

client = from_catalog("stuff/catalog.db", readable_storage=["stuff/data"])
```
The above uses ASGI to run a "server" and the client in the user's Python process, passing HTTP messages via Python function calls within one process instead of TCP packets between separate server and client processes.
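To make the in-process pattern concrete, here is a minimal, self-contained sketch (a toy ASGI app and a toy caller, not Tiled's actual code). The "request" and "response" are just dictionaries passed between Python coroutines; no sockets are involved:

```python
import asyncio

# A minimal ASGI app: responds to any request with a plain-text body.
async def app(scope, receive, send):
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello from in-process"})

# A toy in-process "client": it invokes the app directly as a coroutine,
# collecting the response messages instead of reading them off a socket.
async def call(app, path):
    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    await app(scope, receive, send)
    status = next(m["status"] for m in messages if m["type"] == "http.response.start")
    body = b"".join(m.get("body", b"") for m in messages if m["type"] == "http.response.body")
    return status, body

status, body = asyncio.run(call(app, "/"))
print(status, body)  # 200 b'hello from in-process'
```

In practice one would use an existing ASGI transport (for example, `httpx` provides one) rather than hand-rolling this, but the mechanism is the same.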
Because this runs in a single process, it can easily be wrapped up in third-party convenience libraries, which can be as "magical" or as explicit as one wants:

```python
from nsls2_data_thingie import remote_access, download, local_access

client = remote_access()
download()  # perhaps downloads to some default location, like ~/.cache/nsls2_data_thingie
client = local_access()
```
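For example, such a wrapper's default download location might be chosen like this. The helper below is hypothetical and the `nsls2_data_thingie` name is just the placeholder from above, not a real package:

```python
import os
import pathlib

def default_cache_dir(app_name="nsls2_data_thingie"):
    # Honor XDG_CACHE_HOME if set; otherwise fall back to ~/.cache.
    base = os.environ.get("XDG_CACHE_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache"
    )
    return pathlib.Path(base) / app_name

print(default_cache_dir())
```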
P.S. The above imagines a SQLite database `catalog.db`, and that is probably best for the vast majority of users. But we could easily support a PostgreSQL target for `download` and `from_catalog`. I am not sure how necessary it is, but it is easy to support.
This will require adding an endpoint like `/assets/{id}`, which is something we wanted anyway.
What should this endpoint return when the asset is a directory (or directory-like, such as an HDF5 virtual dataset)?
- An archived collection: `.zip`, `.tar`
- A list of the underlying asset endpoint URLs (probably just the next level down)
- Other
Good question; this wrinkle had not occurred to me yet.
In #450, we use ZIP to bundle multiple buffers (numpy arrays) into one response. We considered TAR, but ZIP has two important points in its favor. ZIP supports random access---unlike TAR, it has an index---and ZIP is also understood by web browsers, which could be useful in the context of web apps.
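The random-access point can be demonstrated with the standard library alone: a ZIP's central directory acts as an index, so we can list members and read any single one directly, without streaming past the others as TAR would require.

```python
import io
import zipfile

# Build a ZIP with several members entirely in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for i in range(3):
        zf.writestr(f"asset_{i}.bin", bytes([i]) * 4)

# The central directory lets us jump straight to one member.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()          # read from the index, no full scan
    data = zf.read("asset_2.bin")  # random access to a single member

print(names, data)
# ['asset_0.bin', 'asset_1.bin', 'asset_2.bin'] b'\x02\x02\x02\x02'
```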
I wonder if we'll decide to support both a ZIP bundle and individual asset URLs ("the next level down").
ZIP seems like a good starting point to move forward with. We could wait to add the option for asset URLs at a later time when/if they become necessary.