tower-cli Add support for Data Explorer functionality (list, add)

Add a new command:

tw data-links

that interacts with the new Data Explorer data-links API endpoint in the Seqera Platform. Some suggested functionality (need to add all the auth syntactic sugar):

tw data-links list --workspace=<workspaceId>                             # list data-links in a workspace
tw data-links list --workspace=<workspaceId> --provider=<cloudProvider>  # subset to a specific cloud provider
tw data-links list --workspace=<workspaceId> --type <cloud|custom>       # subset to auto cloud or custom data links
tw data-links add --workspace=<workspaceId> --name=<dataLinkName> --credentials=<credentials> --description=<description> --provider=<cloudProvider> # add a custom data link to a workspace
tw data-links delete --datalink=<datalinkId> --workspace=<workspaceId>   # delete a datalink in a workspace
tw data-links cp --datalink=<datalinkId>:/path/to/object.txt object.txt  # copy/download a single object (defined by prefix path) from the data link to your localhost 
tw data-links cp /path/to/samplesheet.csv --datalink=<datalinkId> --workspace=<workspaceId> # upload a file from your localhost to the data link
tw data-links cp /path/to/folder --datalink=<datalinkId> --workspace=<workspaceId> --recursive # upload all files in a folder from your localhost to the data link

### Tasks
- [ ] https://github.com/seqeralabs/tower-cli/issues/405
- [ ] https://github.com/seqeralabs/tower-cli/issues/406
- [ ] https://github.com/seqeralabs/tower-cli/issues/407
- [ ] https://github.com/seqeralabs/tower-cli/issues/413
- [ ] https://github.com/seqeralabs/platform/issues/6457

Jan 11 '24 16:01 robnewman

Presumably tw data-explorer cp downloads to the current working directory? Could be nice to support alternative destinations too.. 🤔 Potentially as a separate tw data-explorer sync command, or a flag, or just a second positional argument..

Jan 11 '24 16:01 ewels

For me it's a -1. Why bloat the CLI with this?

Jan 11 '24 17:01 pditommaso

You could argue that it's not worth having a CLI at all, there's a perfectly good API!

Having it in the CLI makes it faster and easier to work with datasets from the terminal. It improves developer / user experience.

Jan 11 '24 19:01 ewels

Also the technical reason for having it in the CLI for downloading files:

Presigned URLs expire after a short-ish window (@swampie thought it was 1 hour). If downloading a large dataset, the download could easily run for many hours. A generated bash script would therefore fail, however the CLI could request the presigned URLs one at a time in series, meaning that they're always fresh and continue to work.
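
The "one at a time in series" idea above can be sketched as follows. This is only an illustration, not the tower-cli implementation: `download_in_series`, `get_presigned_url`, and `fetch` are hypothetical names, and the two callables stand in for whatever the Platform API and HTTP client would actually provide.

```python
from typing import Callable, Iterable, List


def download_in_series(
    paths: Iterable[str],
    get_presigned_url: Callable[[str], str],  # hypothetical: asks the API for a fresh URL
    fetch: Callable[[str, str], None],        # hypothetical: downloads url to a local path
) -> List[str]:
    """Download each path using a URL requested immediately before use."""
    done: List[str] = []
    for path in paths:
        # Request the URL only when this file's turn comes, so its expiry
        # window starts now and not when the whole batch was planned.
        url = get_presigned_url(path)
        fetch(url, path)
        done.append(path)
    return done
```

Because no URL is requested ahead of time, the total run can exceed the expiry window as long as each individual file finishes within it.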

Jan 11 '24 19:01 ewels

Adding a use case to download/list a dataset, with a flag to download/list the files referenced inside the dataset csv/tsv/table. For example:

tw dataset cp <dataset_id> --files

This downloads the dataset table (csv/tsv) plus the files it references. In this way the user only has to be concerned with passing around the dataset object, and they can download/list the files at any time. Note that a dataset can also be an output, so it becomes a packaging mechanism.

Note: Today, the auth to access/download/list files in a dataset is not guaranteed as users can create whatever s3:// paths they want in a csv. This issue also exists when launching a pipeline.
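
As a rough sketch of what a `--files` flag would need to do first, the snippet below collects cloud paths from a dataset table. The function name `cloud_paths_in_table` and the prefix list are assumptions made for illustration; nothing here checks whether the user is actually authorized to read each path, which is the open problem noted above.

```python
import csv
import io
from typing import List

# Assumed set of URI schemes to treat as cloud paths (illustrative only).
CLOUD_PREFIXES = ("s3://", "gs://", "az://")


def cloud_paths_in_table(text: str, delimiter: str = ",") -> List[str]:
    """Return unique cloud paths found in any cell, in first-seen order."""
    paths: List[str] = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        for cell in row:
            value = cell.strip()
            if value.startswith(CLOUD_PREFIXES) and value not in paths:
                paths.append(value)
    return paths
```

For a TSV dataset, pass `delimiter="\t"`.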

Jan 11 '24 20:01 evanfloden

Fair enough

Jan 15 '24 10:01 pditommaso

For upload and download, why use the Seqera CLI when you can use the standard cloud tooling?

Jan 18 '24 11:01 swampie

- No need to maintain cloud credentials locally
- Support for multiple compute environment types (clouds) with a consistent command and a single CLI tool
- Download via consistent Seqera identifiers, with less risk of sample or file mixup
- Better user experience if we add nice things as suggested by Evan, e.g. downloading all data paths within the CSV

Jan 18 '24 22:01 ewels

Considering the ongoing work to extend Data Explorer availability to personal workspaces, these new CLI capabilities should be implemented for those as well.

Jan 24 '24 09:01 mbosio85

I agree with Paolo that the complexity is not justified for the time being, but I'm open to discussion.

Jan 25 '24 11:01 swampie

Adding a very key point being lost here.

Our end users shouldn't need cloud console or cloud provider CLI access. They likely don't have cloud credentials. This is the point of having different roles with WS admins adding credentials and CEs.

End users want to upload data, run pipelines, and download results.

Jan 26 '24 12:01 evanfloden

I agree 💯 that CLI should have first-class support. However, my understanding is that the feature highlighted here does not come for free, it may require some specific endpoints.

Jan 26 '24 13:01 pditommaso

Updated original request to match the Data Explorer data-links API endpoint name

Apr 01 '24 22:04 robnewman

TBD - pagination is always returned by the API, need to account for this in the CLI commands.
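
One way a list command could account for this is to keep requesting pages until everything has been collected. The sketch below is an assumption-laden illustration: `fetch_page`, its offset/size arguments, and the `items` and `total` response fields are hypothetical and may not match the real data-links API.

```python
from typing import Callable, Dict, Iterator


def iter_all_items(
    fetch_page: Callable[[int, int], Dict],  # hypothetical: (offset, page_size) -> response dict
    page_size: int = 50,
) -> Iterator[dict]:
    """Yield every item across pages until the reported total is reached."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        items = page.get("items", [])  # assumed field name
        yield from items
        offset += len(items)
        # Stop on an empty page or once the assumed "total" has been covered.
        if not items or offset >= page.get("total", 0):
            break
```

A generator keeps memory use flat for large workspaces and lets the CLI print results as each page arrives.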

Apr 17 '24 13:04 robnewman

I feel a bit weird about naming this subcommand data-links. I've checked the data explorer's UI, and there, you can upload files without any mention of the "data link" concept. And you can create new "data links" also without any mention of that concept. Why should we use this name in the CLI?

The sub-title where you can list your "data links" says, "Browse remote data repositories and data for use in Seqera Cloud," with no reference to this "data link" concept. Overall, this "data link" concept is misleading.

I'd call it "data source", and then the command line can be tw data-source... with tw ds... alias. Also, the tw data-source add ... subcommand would be more meaningful.

But because naming is difficult and what sounds good to me may sound terrible to others, I suggest reviewing this naming before hardcoding it into the command line interface. Or at least, if "data link" is chosen as the best way of naming it, the web UI should be consistent and call that section "data links" instead of "data explorer" with explicit references to the "data link" concept when you add a new one.

May 10 '24 14:05 jordeu

@jordeu Thanks for the feedback! The Data Explorer API endpoint is called data-links and we were being consistent with that. I think it would be more confusing to have the API endpoint named differently to the CLI interface (when both are publicly accessible). I agree that the term "data-link" is widely used internally but not directly surfaced externally. I would be in favor of explicitly referencing that term in our docs, but open to feedback.

May 14 '24 14:05 robnewman

@robnewman we are missing a method here to list the content of a data link

May 23 '24 10:05 weronikasosnowskaseqera

@weronikasosnowskaseqera Please add. I wasn't necessarily comprehensive - just that the functionality needs to exist and reflect the API functionality.

May 23 '24 13:05 robnewman

This issue has been unlinked from a Canny post: Add datasets directly from s3 / data explorer to the platform :cry:

Jun 13 '24 15:06 canny[bot]

This is now done except for the tw data-link cp command. The other commands are part of the v0.9.4 release.

Aug 20 '24 19:08 robnewman

tw data-link cp (download/upload) will be handled with another task: https://seqera.atlassian.net/browse/PLAT-289

Aug 21 '24 06:08 weronikasosnowskaseqera
