
proposal: cmd/cue: rename 'cue cmd' to 'cue run' or 'cuerun'

Open myitcv opened this issue 4 years ago • 20 comments

In https://github.com/cue-lang/cue/issues/337 and various other places we have mooted the idea of moving the logic from cue cmd to a different command: either cue run, or an entirely separate entry point, cuerun (originally an idea from @shykes).

Creating this issue as a WIP placeholder for a full proposal for such a move, with rough notes as follows:

  • ~~Currently cue cmd does not have a means by which the user can specify the package (path) from which the command is derived: it is currently ..~~ [Edit: See this comment] We might want to support this so that running a command from an alternative location within the main module is supported (for example, the command could even be defined in a dependency of the main module). This is especially true in the world of modules, where one might want to do the equivalent of go run $pkg. Indeed, we could go one step further and consider something equivalent to go run $pkg@$version, in which case the dependency does not even need to exist as a dependency of the main module
  • Per discussions with @mpvl, we also need to better define what it means for such a command to take package pattern arguments, and how the command operates on such arguments
  • Potentially rethink the use of _tool.cue to designate the command specification. We kicked around the idea of *_tool package space, much like Go's *_test external tests
  • Given the previous point, clear designation on how the package defining the commands interacts with any other packages (at the moment there is a rather complicated "merge")
  • Whether to move away from the model whereby $id-style fields are used to identify tasks, for example towards a model where a hidden field from within a tool/* package is used to identify a task.
  • As noted by @eonpatapon, we should ideally be able to handle injection "arguments" to a command after the command name itself (as well as before). Consider this as part of argument handling generally
  • Per @mpvl, one advantage of keeping cue run as opposed to cuerun is that flags can precede the run in cue run. Whether we require this or not remains to be seen; just noting FYI
  • Better defined support for composing commands/tasks. i.e. importing a command/task from a package to allow for reuse. This might extend to include implicit command/task referencing. See bug in https://github.com/cue-lang/cue/issues/1834.
  • Ensure that we support command declaration even in the presence of embedded disjunctions. This does not work with cue cmd
  • A means by which commands/tasks can control levels of parallelism. This could be achieved by instances of mutexes/semaphores. A command/task having a reference to a mutex/semaphore would then control parallelism. https://github.com/cue-lang/cue/issues/709
  • Fixing the case of dependency calculation in the presence of (implicit) default values. See https://github.com/cue-lang/cue/issues/1568
  • Better define when tasks are "ready" to run. See https://github.com/cue-lang/cue/issues/1593
  • Revisit and extend the notion of a task being considered done/complete (which we currently achieve via $after), so that other tasks can depend on the "exit code" for a task. Control flow could be introduced to handle failure cases, with a default behaviour of failing the entire command if nothing depends on the failure case. See https://github.com/cue-lang/cue/issues/1568.
  • Revisit the idea of how a reference creates a dependency on other tasks. Including https://github.com/cue-lang/cue/issues/2078
  • Related to a number of previous points: better model what the inputs to a command are.
    • The package which defines the command
    • An optional specification of a set of packages/directories where that task should be run (default to $CWD of command)
    • Some indication of how that list of packages/directories should be handled: iterate, ...
    • Other command line arguments []string that could be consumed by a task
  • tool/file.Glob currently passes the argument through to https://pkg.go.dev/path/filepath#Glob, which unfortunately allows ** as an equivalent to *, which can be confusing to users and makes it hard to later decide to start supporting "extended globbing" to recurse into subdirectories.
  • The embed proposal introduces the @embed attribute. The arguments to @embed are designed to follow closely the behaviour of cmd/cue with respect to its inputs. This consistency is intentional. However there is now a large gap in capability and convenience when it comes to loading files via workflows. We should therefore introduce a tool-based function (in an existing/new package) that provides the same functionality as @embed.
  • In https://github.com/cue-lang/cue/issues/3357 @Kryan90 raises that it should be possible to be more explicit about the parameters for a workflow, more specifically which are required, such that better error messages can be reported when it comes to running a workflow (as opposed to more generic CUE error messages like "non-concrete X").
  • Currently it is possible for a task to be declared/specified by including a struct value with an appropriately set $id field. Whilst doing so works, it means that defaults in the task's declaration are not applied, and accordingly the Go implementation has to maintain a "duplicate" of this default logic. e.g. this default is "duplicated" in this Go code. We should not allow this kind of back door task injection, and instead design an approach which requires explicit unification with the task declaration.
  • Issues like https://github.com/cue-lang/cue/issues/3897 are an example of a class of problems where tasks are either not declared because the field names are incomplete (in which case they don't get "detected"), or where tasks don't get run but were expected to, again because of a lack of data. The UX issues of https://github.com/cue-lang/cue/issues/3897 were solved by erroring in the case that no tasks were found, but this does not solve the general case of "we didn't find and run all the tasks we were expecting to". A v2 of cue cmd should consider this problem fairly carefully, especially the UX of surfacing issues, debugging, etc.
  • Related to the previous point is the case where the command struct value referenced via the argument to cue cmd is itself in error. For example, if a field name in that struct is incomplete:
import "tool/cli"

input: string
command: willFail: {
	"\(input)": cli.Print & {text: "ok?"}
	"ok": cli.Print & {text: "ok"}
}

This currently fails, but it is unclear whether it should: not because input could become complete later, but rather because it is arguably more consistent not to fail in this situation, given that "nested" fields where this is the case do not trigger an error (hence the advice to "push down" into fields if this situation occurs, which it can with things like comprehensions).

  • ...
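For reference, a rough sketch of the "push down" advice mentioned above; this is illustrative only, and whether the comprehension form should (or does) behave differently from the top-level form is exactly the open question:

```cue
import "tool/cli"

input: string
command: works: {
	ok: cli.Print & {text: "ok"}
	// The reference to the incomplete value is pushed down into a
	// comprehension: the dynamic field name now sits "nested" below
	// the command struct, which is the situation described above as
	// not triggering an error.
	for name in [input] {
		(name): cli.Print & {text: "ok?"}
	}
}
```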
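And to illustrate the "back door" task injection point above, a minimal sketch (the $id string and field names here are illustrative, not authoritative):

```cue
package x

import "tool/cli"

// Declared by unifying with the tool/cli package: defaults from
// cli.Print apply as expected.
command: hello: cli.Print & {text: "hello"}

// "Back door" injection: a plain struct with an appropriately set
// $id field is currently also detected as a task, but defaults
// declared in cli.Print are not applied, which is why the Go
// implementation must duplicate that default logic.
command: backdoor: {
	$id:  "tool/cli.Print"
	text: "hello again"
}
```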

myitcv avatar Oct 25 '21 14:10 myitcv

Should cue cmd allow importing and unifying non-CUE files like other commands that adhere to the interface described in cue flags?

seh avatar Oct 25 '21 14:10 seh

@seh such non-CUE files cannot, by definition, apply to the command specification, so presumably you are referring to data that is consumed as part of the command execution? If so, yes, that is in scope. At the moment, the "merge" of command and package is rather confusing to both implement and understand. I've added a note above (I forgot to do so earlier) about clearer designation of the command specification vs the package on which the command operates. But if you have specific examples to help motivate the design that would be very useful, thanks.

myitcv avatar Oct 25 '21 14:10 myitcv

But if you have specific examples to help motivate the design that would be very useful, thanks.

I had tried to use cue cmd to write a filter, where I wanted to supply several JSON file paths, have CUE import them and unify them against a definition (using the --path and --with-context command-line flags), and then run a few steps defined in CUE against that definition's value. I couldn't accomplish this easily using just cue export, as I wanted to tie in the output of calling on some other tools.

Perhaps I could have written the "import JSON and unify it with a definition" part in CUE myself, for lack of being able to use the --path and --with-context flags, but I was disappointed enough that I wound up using a different tool for the task instead.
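For what it's worth, one way to approximate that filter today is to do the "import JSON and unify" step in CUE itself, via tool/file and encoding/json, in a *_tool.cue file. The file name, command name, and #Config definition below are all illustrative:

```cue
package filter

import (
	"encoding/json"
	"tool/cli"
	"tool/file"
)

// The definition to unify the imported data against (illustrative).
#Config: {
	name: string
	...
}

command: filter: {
	// Read the JSON input (in lieu of --path/--with-context support).
	read: file.Read & {
		filename: "input.json"
		contents: string
	}
	// Decode, unify with #Config, and emit the result.
	print: cli.Print & {
		text: json.Marshal(#Config & json.Unmarshal(read.contents))
	}
}
```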

seh avatar Oct 25 '21 14:10 seh

Should cue cmd allow importing and unifying non-CUE files like other commands that adhere to the interface described in cue flags?

Note that this will not be a simple rename, but rather, we would use this opportunity to clean things up. One issue currently is that the files that define the command and the files on which the command is run are the same. This is not always desirable. I could envision having these separated, allowing the aforementioned flags for some of them.

It seems that your specific use case can be quite useful to work out the specifics for this, so it may be useful to get some more detail here.

mpvl avatar Oct 26 '21 11:10 mpvl

Not sure if this belongs in this same issue, but I would love to be able to visualize a CUE workflow. Even a transformation from the (tools/flow) DAG into graphviz dot or something would be handy.

There is this example that dumps the current state into a mermaid graph: https://github.com/cue-lang/cue/blob/fa141c2832c8b65bed5b76b9efde95df4d4274bc/tools/flow/flow_test.go#L230-L240

But that's not easily reusable and you'd probably have to run it after the flow is done since tasks can be dynamically generated based on the output of another.

StevenACoffman avatar Apr 08 '22 21:04 StevenACoffman

I had some further thoughts on workflows/pipelines. Since make supports -Bnd, there's a program, make2graph, that will parse that output and render it in the graphviz dot file format, such that even dynamically generated targets are accounted for. We could even use that approach to translate a gnarly existing Makefile into a skeleton CUE flow.

make supports these options:

  • -B, --always-make Unconditionally make all targets.
  • -n, --just-print, --dry-run, --recon Print the commands that would be executed, but do not execute them.
  • -d Print debugging information in addition to normal processing. The debugging information says which files are being considered for remaking, which file-times are being compared and with what results, which files actually need to be remade, which implicit rules are considered and which are applied---everything interesting about how make decides what to do.

This is how I use make2graph on my local (mac) machine:

brew install openssl graphviz
export LDFLAGS="-L/usr/local/opt/openssl@3/lib"
export CPPFLAGS="-I/usr/local/opt/openssl@3/include"
git clone git@github.com:lindenb/makefile2graph.git
cd makefile2graph
make
mv make2graph /usr/local/bin/make2graph
cd ~/WHEREVER-YOUR-MAKEFILE-IS/
make -Bnd YOURTARGET | make2graph | dot -Tsvg -o deps.svg

StevenACoffman avatar Apr 09 '22 14:04 StevenACoffman

My thoughts based on what I want to see:

I would like command definitions to be simply data: until cue run reads in the command definitions, the data itself (and the tool/* definitions) should not hold any semantic meaning. Using conventions like _tool.cue or separate packages for commands inserts implicit context that ties CUE code to the file structure with special behavior.

Looking forward, we can imagine that a single CUE package may hold definitions that multiple separate tools would want to read, not only cue run. In that scenario I think it is especially important that cue run behaves in the same consistent way that third-party tools would: CUE definitions are just data; it is the programs that give them semantic meaning in their context.

There are nuances in how that would work but the general goal would be for cue run commands to simply be definitions that are treated the same as other definitions. They can be defined with other CUE code and imported from packages.

Semi-related note: _tool.cue is the mechanism used to signal that a struct is a CUE command. Instead, should we consider declaring that the data is a command within the definition itself? Here is an example based on Kubernetes:

{
	apiVersion: "resources.cuelang.org/v1"
	kind:       "Command"
	metadata: name: "foo"
	spec: {
		// Command schema here
		...
	}
}

We can see from the above that name collisions can be avoided with a domain namespace, which opens up a decentralized way for developers to define resources. This allows definitions to coexist based on a standard. A resource model is a discussion in itself, but I thought this would be a good way to demonstrate how cue run configurations could live with other CUE code without special behavior.

cedricgc avatar Oct 19 '22 05:10 cedricgc

Related to a redesign of cue cmd, I would argue that we should no longer make cue foo an alias for cue cmd foo. For example, I momentarily forgot that we don't have an analogous command to Go's go list, so I was stumped by these errors:

$ cue list -h
cannot find package "-h"

The top-level cue command also has to do quite a bit of heavy lifting to figure out whether the first argument might be a custom command or not. Sometimes this causes surprising slow-downs, like https://github.com/cue-lang/cue/issues/2161. And in general, it makes cue's top-level command quite complex in its implementation, with intricate logic to treat "unknown command" as a fall-back to loading custom commands.

I've also been bitten by this when typoing standard top-level commands. For example:

$ cue exprot
[spends seconds loading the package]
command "exprot" is not defined
Ensure commands are defined in a "_tool.cue" file.
Run 'cue help' to show available commands.

I think this command should fail immediately with a better error message, and not even try to run a custom command. Otherwise, beyond the confusing error messages and time spent to load the current CUE package, whenever I run a top-level command there's the slight worry that maybe I'm running a custom command without realizing it.

I would personally not support shortcuts or aliases for running commands or tasks defined in CUE. Typing cue cmd foo, cue run foo, or cuerun foo is clear and explicit, and it's not that many keystrokes.

mvdan avatar Mar 26 '23 20:03 mvdan

I just raised https://github.com/cue-lang/cue/issues/2324 as an aide memoire so that I don't forget about it. It's related to this issue, but only insofar as cuerun would almost certainly be built on top of tools/flow or tools/flow/v2.

myitcv avatar Apr 03 '23 16:04 myitcv

Adding details of a report from Slack of a clear confusion that arises when calling a command with a pattern argument.

What did you do?

# Export with itemsList_tool.cue in place
exec cue export ./...
cmp stdout stdout.export-pre.golden

# Verify that cue cmd ls ./... works
exec cue cmd ls ./...
cmp stdout stdout.ls-pre.golden

# Note that a single ls command is run, that combines
# the results of all the packages that result from the
# expansion of ./...

# Now rename itemsList_tool.cue -> itemsList.cue
mv itemsList_tool.cue itemsList.cue

# Export with itemsList.cue in place
exec cue export ./...
cmp stdout stdout.export-post.golden

# Attempt to ls in this state
exec cue cmd ls ./...

-- cue.mod/module.cue --
module: "mod.com"
-- x.cue --
package x

items: {}
-- ls_tool.cue --
package x

import (
	"tool/cli"
	"strings"
)

command: ls: cli.Print & {
	text: "ls: " + strings.Join(itemsList, " ") + "\n"
}
-- itemsList_tool.cue --
package x

itemsList: [ for _, v in items {v}]
-- a/x.cue --
package x

items: {
	a: "a"
}
-- b/x.cue --
package x

items: {
	b: "b"
}
-- stdout.export-pre.golden --
{
    "items": {}
}
{
    "items": {
        "a": "a"
    }
}
{
    "items": {
        "b": "b"
    }
}
-- stdout.ls-pre.golden --
ls: a b

-- stdout.export-post.golden --
{
    "itemsList": [],
    "items": {}
}
{
    "itemsList": [
        "a"
    ],
    "items": {
        "a": "a"
    }
}
{
    "itemsList": [
        "b"
    ],
    "items": {
        "b": "b"
    }
}

What did you expect to see?

There are any number of potential expectations (hence the plan to address this with cuerun):

  1. cue cmd ls runs once per instance, for each instance that results from the ./... expansion, with a working directory of the directory of the instance.
  2. cue cmd ls runs once per instance, for each instance that results from the ./... expansion, with an unchanged working directory.
  3. cue cmd ls runs a single command with all packages that result from the ./... expansion unified together.
  4. cue cmd ls runs a single command with all packages that belong to the same package that result from the ./... expansion unified together.
  5. ...

Part of the challenge is that it's not clear which of these cue cmd actually does, nor is it possible (or at least clear how) to select one of the other interpretations.

Noting https://github.com/cue-lang/cue/issues/1138 in the context of the expansion of ./....

What did you see instead?

# Export with itemsList_tool.cue in place (0.069s)
# Verify that cue cmd ls ./... works (0.077s)
# Note that a single ls command is run, that combines
# the results of all the packages that result from the
# expansion of ./...
# Now rename itemsList_tool.cue -> itemsList.cue (0.000s)
# Export with itemsList.cue in place (0.070s)
# Attempt to ls in this state (0.077s)
> exec cue cmd ls ./...
[stderr]
itemsList: incompatible list lengths (0 and 1)
[exit status 1]
FAIL: /tmp/testscript1691469317/repro.txtar/script.txtar:21: unexpected command failure

Per the expectations above, it is hard to know whether this is "right" or "wrong".

myitcv avatar Aug 09 '23 15:08 myitcv

Not sure if this is helpful (and this might be a bit of an abuse), but I run a command cue snap ./... or cue cmd snap ./... which rolls up all the files in a repo into a single mega package/object/whatever and dumps junk out into various YAML files.

So my expectation would be that if I run with ./..., not only will my _tool.cue file be somewhere in ./..., but it will run one command with everything in ./... rolled into one.

Now all files in my repo are part of one mega package (again, probably an abuse), but my expectation would be for it to run against the one package it finds (if it finds only one), OR fail and expect you to provide the package that you would like the command to run against.

I think running a command implicitly multiple times, once per package would not be intuitive.

I also think setting the working directory to the package root would only make sense if you were to run multiple instances of the command once per package.

Generally speaking, when you run commands on the command line, the perspective of what takes place is the cwd where the command was run, not the files the command is executing against.

You can always add extra explicit arguments on top of the default to add things like parallel per package instances of running a command, or you know, write a bash command to do the same thing (unix style)

Those are my opinions, and expectations if that is helpful.

from-nibly avatar Aug 29 '23 16:08 from-nibly

This is helpful. There is a broader conversation to be had about the UX/DX of cue/flow and cue run.

You can actually build your own cue/flow implementations, and we have a hof flow that moves the flow/task definitions to regular CUE files. We ignore _tool.cue files; those could go away in the default cue run implementation, and I would support that anyhow, based on our experience. We also added bulk processing to our flow, so you can run a workflow on a series of inputs independently, and added parallelism control and building blocks. Some of this is experimentation for the sake of it, but some is also "we want certain task types that will never make it into the default implementation". We anticipate some of these will have a version in the default implementation, like running a server and using flow for the handlers.

I'd be very curious to know if hof/flow addresses some of your problems. https://github.com/hofstadter-io/hof

verdverm avatar Aug 29 '23 20:08 verdverm

So I don't have any problems with cue cmd snap ./...; currently it behaves how I need it to. I can definitely understand why a revamp is needed, though. There are some things that work but feel implicit and vague. I've got it working how I want, but that doesn't mean it would work for everyone. I just wanted to lay out what I would expect from any kind of system like cue run/cmd.

from-nibly avatar Aug 29 '23 22:08 from-nibly