
Delta sharing should have a proper URI

Open dmoore247 opened this issue 3 years ago • 1 comments

The current Delta Sharing share file is an awkward and unfamiliar way to precisely address a remote data table. The share file creates friction as we try to make Delta Sharing more universal. Consider http:, sftp: and s3: as examples of familiar URI schemes.

Let us develop a standardized URI for Delta Sharing. The standardized URI can then be dropped into common communication channels (email, messaging, web pages) and indexed by search engines and knowledge graphs.

Current problems:

  1. The share file carries a secret in clear text, which is a common security issue; the file should be encrypted when it is saved.
  2. The application programmer first has to store and distribute the file, and the current APIs don't make it easy to keep the share file in a secret store. The typical example starts with client = delta_sharing.SharingClient("file:///path/to/my/share/file")
  3. After discovering the table name, the full URI looks like: file:///path/to/my/share/file#<share-name>.<schema-name>.<table-name>. Of course, the URI isn't really universal, because the file is local to each user.
  4. The unique shared table identifier can't easily be transmitted to collaborators in a standard form.
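To make problems 1 and 3 concrete, here is a minimal sketch of the current workflow using only the Python standard library. The profile fields (shareCredentialsVersion, endpoint, bearerToken) follow the documented profile file format; the helper names write_profile and table_url, the file name example.share, and the endpoint and token values are hypothetical placeholders.

```python
import json
import os
import tempfile


def write_profile(directory: str, endpoint: str, bearer_token: str) -> str:
    """Write a share profile file and return its local path."""
    profile = {
        "shareCredentialsVersion": 1,
        "endpoint": endpoint,
        "bearerToken": bearer_token,  # problem 1: the secret sits in clear text
    }
    path = os.path.join(directory, "example.share")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(profile, fh)
    return path


def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the <profile-path>#<share>.<schema>.<table> address used today."""
    return f"{profile_path}#{share}.{schema}.{table}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        profile_path = write_profile(
            tmp,
            "https://sharing.example.com/delta-sharing/",  # placeholder endpoint
            "placeholder-token",                           # placeholder secret
        )
        # Problem 3: the address embeds a path that only exists on this machine.
        print(table_url(profile_path, "myshare", "default", "covid"))
```

The printed address begins with a temporary directory on the local machine, so a collaborator cannot use it without first receiving a copy of the file and rewriting the path.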

Instead, offer up a standardized URI interface to address delta sharing resources, something like:

delta://token:<bearertokenvalue>@<endpoint host>:port/path/sharename/schema/table[/part_key=part_value]
delta://<username>:<tokenvalue>@<endpoint host>:port/path/sharename/schema/table
delta://<endpoint host>:port/<pathinfo>/<sharename>/<schema>/<table>
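As a rough illustration of how an existing client could handle such a URI with stock URL routines, here is a sketch built on Python's urllib.parse. None of this is an existing Delta Sharing API: the delta:// scheme is only the proposal above, and DeltaShareAddress and parse_delta_uri are hypothetical names. The sketch assumes the last three non-partition path segments are share, schema and table, that anything before them is the server's path prefix, and that the endpoint is reached over HTTPS.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import unquote, urlsplit


@dataclass
class DeltaShareAddress:
    endpoint: str                       # assumed to be served over HTTPS
    share: str
    schema: str
    table: str
    bearer_token: Optional[str] = None  # None for public data sets
    partitions: Dict[str, str] = field(default_factory=dict)


def parse_delta_uri(uri: str) -> DeltaShareAddress:
    """Parse a proposed delta:// URI (hypothetical scheme) into its parts."""
    parts = urlsplit(uri)
    if parts.scheme != "delta":
        raise ValueError(f"not a delta:// URI: {uri!r}")
    if not parts.hostname:
        raise ValueError("missing endpoint host")

    segments = [s for s in parts.path.split("/") if s]

    # Trailing key=value segments are treated as partition filters.
    partitions: Dict[str, str] = {}
    while segments and "=" in segments[-1]:
        key, value = segments.pop().split("=", 1)
        partitions[unquote(key)] = unquote(value)

    if len(segments) < 3:
        raise ValueError("expected .../<share>/<schema>/<table>")
    *prefix, share, schema, table = [unquote(s) for s in segments]

    host = parts.hostname
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    endpoint = "https://" + "/".join([host, *prefix])

    token = unquote(parts.password) if parts.password else None
    return DeltaShareAddress(endpoint, share, schema, table, token, partitions)


if __name__ == "__main__":
    print(parse_delta_uri(
        "delta://token:placeholder@sharing.example.com:443"
        "/delta-sharing/myshare/default/covid/year=2020"
    ))
```

One design question this surfaces: with an optional path prefix, the boundary between the server path and the share name is ambiguous unless the segment count is fixed from the right, as assumed here, or the scheme reserves a separator.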

For public data sets, one can then drop URIs into web pages: <a href="delta://<endpoint host>:port/path/sharename/schema/table">covid data 2020</a>

The vision is to integrate the Delta Sharing protocol into a broad range of existing clients by leveraging their current URI handling subroutines, and to make sharing terabytes of data as easy as copying and pasting a link into Slack.

dmoore247, Oct 28 '22 15:10

Thanks @dmoore247, the idea is interesting.

A couple of questions, in case you haven't already thought about them:

  1. Do we need to host a server for the common URI?
  2. I assume yes? If so, does the server handle the token authentication?
  3. And does the server identify where the provider table is located?

linzhou-db, Feb 15 '23 01:02