tower-cli
tower-cli copied to clipboard
Add support for Data Explorer functionality (list, add)
Add a new command:
tw data-links
that interacts with the new Data Explorer data-links
API endpoint in the Seqera Platform. Some suggested functionality (need to add all the auth syntactic sugar):
tw data-links list --workspace=<workspaceId> # list data-links in a workspace
tw data-links list --workspace=<workspaceId> --provider=<cloudProvider> # subset to a specific cloud provider
tw data-links list --workspace=<workspaceId> --type <cloud|custom> # subset to auto cloud or custom data links
tw data-links add --workspace=<workspaceId> --name=<dataLinkName> --credentials=<credentials> --description=<description> --provider=<cloudProvider> # add a custom data link to a workspace
tw data-links delete --datalink=<datalinkId> --workspace=<workspaceId> # delete a datalink in a workspace
tw data-links cp --datalink=<datalinkId>:/path/to/object.txt object.txt # copy/download a single object (defined by prefix path) from the data link to your localhost
tw data-links cp /path/to/samplesheet.csv --datalink=<datalinkId> --workspace=<workspaceId> # upload a file from your localhost to the data link
tw data-links cp /path/to/folder --datalink =<datalinkId> --workspace=<workspaceId> --recursive # upload all files in a folder from your localhost to the data link
### Tasks
- [ ] https://github.com/seqeralabs/tower-cli/issues/405
- [ ] https://github.com/seqeralabs/tower-cli/issues/406
- [ ] https://github.com/seqeralabs/tower-cli/issues/407
- [ ] https://github.com/seqeralabs/tower-cli/issues/413
- [ ] https://github.com/seqeralabs/platform/issues/6457
Presumably tw data-explorer cp
downloads to the current working directory? Could be nice to support alternative destinations too.. 🤔 Potentially as a separate tw data-explorer sync
command, or a flag, or just a second positional argument..
For me it's a -1. Why bloating the CLI with this?
You could argue that it's not worth having a CLI at all, there's a perfectly good API!
Having it in the CLI makes it faster and easier to work with datasets from the terminal. It improves developer / user experience.
Also the technical reason for having it in the CLI for downloading files:
Presigned URLs expire after a short-ish window (@swampie thought it was 1 hour). If downloading a large dataset, the download could easily run for many hours. A generated bash script would therefore fail, however the CLI could request the presigned URLs one at a time in series, meaning that they're always fresh and continue to work.
Adding a usecase to download/list a dataset, with a flag to download/list the files inside the dataset csv/tsv/table. For example:
tw dataset cp <dataset_id> --files
This downloads the dataset table (csv/tsv) plus the files. In this way the user only has to be concerned with passing around the dataset object, and they can download/list the files at any time. Think a dataset can also be an output so it becomes a packaging mechanism.
Note: Today, the auth to access/download/list files in a dataset is not guaranteed as users can create whatever s3:// paths they want in a csv. This issue also exists when launching a pipeline.
Fair enough
for upload and download why using the seqera cli when you can use the standard cloud tooling?
- No need to maintain cloud credentials locally
- Support multiple compute env types (clouds) with a consistent command and single CLI tool
- Download via consistent Seqera identifiers, less risk of sample or file mixup
- user experience if we add nice things as suggested by Evan: eg. downloading all data paths within the CSV
Considering the ongoing work to extend the Data Explorer availability to personal workspaces, these new CLI capability should be implemented for those as well.
I agree with Paolo that the complexity is not justified for the time being: open to discuss
Adding a very key point being lost here.
Our end users shouldn't need cloud console or cloud provider CLI access. They likely don't have cloud credentials. This is the point of having different roles with WS admins adding credentials and CEs.
End users want to upload data, run pipelines, and download results.
I agree 💯 that CLI should have first-class support. However, my understanding is that the feature highlighted here does not come for free, it may require some specific endpoints.
Updated original request to match the Data Explorer data-links
API endpoint name
TBD - pagination is always returned by the API, need to account for this in the CLI commands.
I feel a bit weird about naming this subcommand data-links
. I've checked the data explorer's UI, and there, you can upload files without any mention of the "data link" concept. And you can create new "data links" also without any mention of that concept. Why should we use this name in the CLI?
The sub-title where you can list your "data links" says, "Browse remote data repositories and data for use in Seqera Cloud," with no reference to this "data link" concept. Overall, this "data link" concept is misleading.
I'd call it "data source", and then the command line can be tw data-source...
with tw ds...
alias. Also, the tw data-source add ...
subcommand would be more meaningful.
But because naming is difficult and what sounds good to me may sound terrible to others, I suggest reviewing this naming before hardcoding it into the command line interface. Or at least, if "data link" is chosen as the best way of naming it, the web UI should be consistent and call that section "data links" instead of "data explorer" with explicit references to the "data link" concept when you add a new one.
@jordeu Thanks for the feedback! The Data Explorer API endpoint is called data-links
and we were being consistent with that. I think it would be more confusing to have the API endpoint named differently to the CLI interface (when both are publicly accessible). I agree that the term "data-link" is widely used internally but not directly surfaced externally. I would be in favor of explicitly referencing that term in our docs, but open to feedback.
@robnewman we are missing here the method to list content
@weronikasosnowskaseqera Please add. I wasn't necessarily comprehensive - just that the functionality needs to exist and reflect the API functionality.
This issue has been unlinked from a Canny post: Add datasets directly from s3 / data explorer to the platform :cry:
This is now done except for the tw data-link cp
command. The other commands are part of the v0.9.4 release.
tw data-link cp
(download/upload) will be handled with another task: https://seqera.atlassian.net/browse/PLAT-289