feat: Support multiple providers for Remote Taskfiles and support directories
This PR aims to support multiple providers for Remote Taskfiles using go-getter, as I proposed in this comment in [#1317]. It is currently marked as Draft as I aim to get some early feedback, and iterate on it if we want to go forward with this approach or something similar.
This would support the following shapes of URLs as provided by go-getter (non-exhaustive):
| URL | Description |
|---|---|
https://example.com/Taskfile.yaml |
File over HTTP |
https://example.com/some-archive.zip |
Directory from ZIP archive over HTTP |
github.com/go-task/task//testdata/includes/Taskfile.yml?ref=some_branch |
Directory from Github ref |
[email protected]:repo/path |
Directory from any Git repository (via local git executiy using SSH) |
my-bucket.s3.amazonaws.com/path/to/Taskfile.yml |
File from S3 |
my-bucket.s3.amazonaws.com/path/to/folder/ |
Directory from S3 |
my-bucket.s3.amazonaws.com/path/to/archive.zip |
ZIP file from S3 |
my-bucket.s3.amazonaws.com/path/to/archive.zip?taskfile=AnotherTaskfile.yml |
ZIP file from S3 with non-standard entrypointTaskfile |
The aims of the change are as follows:
- To support multiple source providers for remote taskfiles
- To provide a well-defined syntax for source URLs that can support future providers if desired
- To support downloading directories containing remote taskfiles and supporting files, not only single taskfiles
- As an extension, we support downloading directories in the form of compressed archives (eg. zip, tgz, etc.) thanks to
go-getter
These are not directly tied to go-getter but they are satisfied by the library fairly well.
Some points of note with the current implementation:
-
There doesn't seem to be a need for multiple remote nodes - as long as one satisfies the
GetterandDetectorinterfaces, one can plug a new provider intoRemoteNode. -
The contract between
Nodeand the rest of the code is that theNodeplaces its content in a directory and providesFileDirectoryandFilename, which are used for accessing the Taskfile and Taskfile directory and to set variables like.TASKFILE_DIR. -
FileNodecould eventually be replaced by logic inRemoteNodeas well, if we'd like.go-gettersupports local files by using symlinks and we could leverage this. I'm just not sure how well this would on Windows - I would be glad to test it. -
The interaction between
Node,ReaderandCachefeels a bit confusing right now in terms of their lifecycle. It feels like we read the node contents a bit too late in the game, and that we could read the contents and deal with caching before we return a Node to the Reader. I didn't try to change this yet - I wanted to get some eyes on the overall idea, then make a plan to refactor it - either in this PR or a future one.ie: instead of calling
NewNode()in the reader and then later callnode.Read(), we would call something liker.loadNode()which already returns a fully loaded Node either from the cache or from remote. In other words, node initialization would include "reading". This would allow us to return a node with an immutableFileDirectory. Currently we have to replace the node if we end up loading its content from the cache because the directory changes.
This is also a big PR and I'd be happy to break it up into more manageable pieces if there's consensus. For example, some test and node/cache changes could be done first before adding go-getter.
What do you all think about the above?
I realize this PR is pretty big and I am making many decisions here. If you'd like to talk about this in real-time perhaps on Discord, let me know!
This was meant to be marked as Draft from the start - I only noticed it now.
+1 from me as I desperately need GitLab support. Currently there’s no way to provide a GITLAB_TOKEN for repos that require authentication when using remote taskfiles.
I don't love the way this PR turned out. I see two areas that are awkward right now:
How the FileNode and RemoteNode integrate with each other
Each node needs to be able to handle relative include paths in its own includes via ResolveEntrypoint. This requires that the FileNode be aware of remote nodes, and vice-verse. I have to investigate this a bit.
One idea is to have a single node type that handles file and remote. Another is to refactor this more and separate the path resolution aspect from the download aspect - although I'm not sure how much they can be separate since they depend on each other.
The sequence between node loading and caching
Currently I am storing remote node data in a temp directory until the user accepts it, then we move it to the cache. This means the node's "source" data points to the temp directory and we must reload it if we end up using the cached version (here). I'm trying to think if there's simpler logic there.
Caching behaviour in general
With the ability to download directories, this opens the door to larger downloads. The behaviour of automatically downloading every time feels a bit heavy. I wonder if we could have an option to change the default so that once a remote include is cached, we will always use the cache. For people using versioned refs (eg. v1.2.3) this shouldn't be an issue since the ref content is expected to be immutable.
In this case two options I see are:
a) Decide to change the default to always use the cache if it's there and rely on --download to force a refresh
b) Add an env var (or CLI arg or both) that allows us to invert the behaviour
Thanks for your work on this! I would definitely be keen for an option to use the cache by default. To me it ties in nicely to this other feature request I created https://github.com/go-task/task/issues/1402
This approach seems a bit heavy-handed and it's trying hard to contort go-getter into the shapes of go-task. I will propose a lighter approach.