Idea for better (non-redundant, more compact) branching syntax for selectors?
While thinking about #1154, it occurred to me that it would be even nicer if we could do a match-style syntax that doesn't have to choose a single branch but could apply several!
files:
- match: linux
- something_linux
match: osx
- something_osx
match: unix # e.g. on linux, this should be taken
- something_unix # _in addition_ to the linux-only case
match: win
- something_win
match: unix or win
- something_that_applies_to_everything
# no more `else:`, only explicit matches!
I guess it might be a bit late for such deep syntactic changes, but IMO this would make it much nicer to write recipes, rather than having to constantly restart if: branches to avoid exhaustivity-problems.
It would allow writing using a single match: (or whatever you want to call it, like the old sel: before it turned into if:) block for every section of the recipe, which would be much nicer IMO if we're looking at translating things like
examples
- cython # [build_platform != target_platform]
- vtk =*=osmesa_* # [build_platform != target_platform and build_variant == "noqt"]
- vtk =*=qt_* # [build_platform != target_platform and build_variant != "noqt"]
- qt-main # [build_platform != target_platform and build_variant == "pyqt"]
- qt6-main # [build_platform != target_platform and build_variant == "pyside6"]
or this
- crossenv # [build_platform != target_platform]
- maturin >=1.3.2,<2 # [build_platform != target_platform]
- {{ stdlib("c") }}
- {{ compiler('c') }}
- {{ compiler('cxx') }} # [win]
- {{ compiler('rust') }}
- posix # [build_platform == "win-64"]
- cmake
- make # [unix]
- cargo-feature # [build_platform != target_platform and target_platform == "win-64"]
etc.
The more dimensions (unix vs win, native vs. cross, python version, CUDA or not, implementation variants for openmpi/BLAS/QT/etc.) we need to cover in a given recipe, the harder it gets to do that with simple if: else: branches. Consider something like
build:
- match: build_platform != target_platform
# cross-compilation
match: build_platform != target_platform and cuda_version
# cross-compiled CUDA
match: match(cuda, "<12")
# CUDA 11.x
match: match(cuda, ">=12.0")
# CUDA 12.x
match: linux
# linux
match: osx
# osx
match: unix
# unix
match: win
# win
match: blas_impl == "mkl"
# mkl
It would be a much lower mental overhead to be able to freely shuffle around lines based on which sections they apply to (e.g. not having to restructure the following into correctly (non-)overlapping if: statements), only adding lines to the selector where they are supposed to be applied.
It's completely impossible because duplicate keys are forbidden :(
It's completely impossible because duplicate keys are forbidden :(
Well, how about making each - match: a separate list entry? That's also how multiple if:'s work IIUC.
In other words, the following (note the extra - everywhere)
build:
- match: build_platform != target_platform
# cross-compilation
- match: build_platform != target_platform and cuda_version
# cross-compiled CUDA
- match: match(cuda, "<12")
# CUDA 11.x
- match: match(cuda, ">=12.0")
# CUDA 12.x
- match: linux
# linux
- match: osx
# osx
- match: unix
# unix
- match: win
# win
- match: blas_impl == "mkl"
# mkl
... should be desugared to the same list as the impossible variant that has multiple match:. The only thing the desugaring would need to learn is to ignore empty lists (for a match: that doesn't apply), though I guess that's already covered by the existing infra (e.g. for an if: without else:)
Why not use a top level match with nested cases?
Why not use a top level
matchwith nestedcases?
I'd be happy with that too!
But how is it different from multiple if statements?
You dont need to repeat the negation of the if clauses.
You can more easily write something specific for win, macos, unix without having to repeat not win, etc.
Sorry, im on a phone so its a bit hard to type a proper example.
For the record, it's actually possible to use arbitrary Jinja in multi-line YAML strings. I am not sure if we want to keep it, but the following works (after a small bug fix from #1156):
build:
script: |
echo "Is ${{ target_platform == "linux-64" }}"
{% if target_platform == "linux-64" %}
echo "I AM A LINUX"
{% elif osx %}
echo "I AM A MAC"
{% endif %}
And of course, you could just as well use bash.
So I am not sure if it's worth it to make our syntax more complicated or not ...
Right now it's also possible to do arbitrary things with set and for loops btw. - it just has to be encapsulated in a multi-line string :)
One of the design goals was to make the recipe easier to parse for machines. I think what you are suggesting is taking a step back. Having a match-kinda statement seems like a great (and logical) building block.
Yeah but we've never semantically understood the contents of the scripts themselves so I'm not sure if we need that. This doesn't affect the "everything is pure YAML" property.
Sure but just as you mentioned in another issue you just closed, we can validate the control flow statements using the schema. That would become harder and harder the more jinja we start using.
If we build our own LSP though .. :) I think I do get your point, but I don't see it super different from e.g. people having issues with their bash scripts inside the script tag.
For the record, it's actually possible to use arbitrary Jinja in multi-line YAML strings. I am not sure if we want to keep it, but the following works (after a small bug fix from #1156):
build: script: | echo "Is ${{ target_platform == "linux-64" }}" {% if target_platform == "linux-64" %} echo "I AM A LINUX" {% elif osx %} echo "I AM A MAC" {% endif %} And of course, you could just as well use bash.
So I am not sure if it's worth it to make our syntax more complicated or not ...
Right now it's also possible to do arbitrary things with set and for loops btw. - it just has to be encapsulated in a multi-line string :)
This does not seem to work:
{% for bin in qemu_bins %}
test -f ${QEMU_LD_PREFIX}/{{ bin }}
{% endfor %}
│ │ │ + test -f $PREFIX/aarch64-conda-linux-gnu/sysroot/lib64/ld-2.17.so
│ │ │ + test -f '$PREFIX/aarch64-conda-linux-gnu/sysroot/lib64/{{' bin '}}'
│ │ │ $PREFIX/etc/conda/test-files/qemu-user-execve-aarch64/0/conda_build.sh: line 14: test: too many arguments
The issue with passing to a bash -c ... is that ${{ xxx }} expands as ["...", "...",] which trips-up bash as it is expecting arrays to be more like (xxx, xxx, ), so this also fails:
bash -c 'l=${{ qemu_bins | split(" ") | list }}; for bin in "\${l[@]}"; do test -f ${QEMU_LD_PREFIX}/${bin}; done'
Where did you set qemu_bins? The context has a behvior where it casts everything to a string.
But if you do:
{% set qemu_bins = ["foo", "bar"] %}
{% for bin in qemu_bins %}
test -f ${QEMU_LD_PREFIX}/{{ bin }}
{% endfor %}
It has a chance of working :)
Why not use a top level
matchwith nestedcases?
Actually, one reason why I do have a preference for bare match:, is that match: - case ... -case suggests disjoint conditions again (at least based on how pattern matching works in most languages), whereas I'd like to not match one specific thing to various disjoint cases, but rather freely match conditions for different quantities that may overlap. Using a different syntax there would avoid triggering some unhelpful muscle memory.
@h-vetinari sorry, not following. What's the difference of match then to a number of if clauses if the match' clauses are disjoint? I think what we had in mind so far would be a match that breaks after it finds the first matching entry, e.g.
match:
- case: unix
then:
- bla
- case: win and arm64
then:
- foobar
- case: win
then:
- foobar-win64 # NOT happening on arm64
Also your initial example is not at all valid YAML if I see that correctly.
Attached two screenshots highlighting the issues (from https://yamlchecker.com/)
I think what we had in mind so far would be a match that breaks after it finds the first matching entry, e.g.
I explained why that's not desirable in the OP (or at least that there are cases where branches shouldn't be exclusive).
Also your initial example is not at all valid YAML if I see that correctly.
Yes, I hate YAML. 🙃
The way I understand it is that you want "if"-syntax without then, correct?
You want falsey branches to be skipped, but every truthy branch to be taken, right?
In e.g. Rust, when usign match, only the first truthy branch is taken. In C/C++ you would have to add a break for this behavior. In Python also only the first truthy value is matched.
But if I understand your proposal correctly, you want every truthy branch to "match"?
Can you add corrected YAML to your proposal? Because I think if you write out proper YAML, it will look just like the current if statements.
It has a chance of working :)
Right! Silly me, I forgot I needed to split (I did know that, just got caught in the Jinja mixin). The variable is in the context section, it is a ${{ [x, y, z] | join() }}
For this particular case, it seems that the rattler/jinja is more complex. The array needs to be joined, then split in script where the {%%} is accepted. Am I missing something?
Also, it seems that the comments are not completely ignored, I had mistakenly put a "xx' in a comment and it seemed to generate a syntax error, that went away after correction (not 100% sure, though)
Update: Not to be difficult, but the initial point was to have the list be defined once for the several packages that use it. I don't think it will work:
{% for bin in {{ qemu_bins | split(' ') | list }} %} fails to be parsed and a global {% fails with:
Error: × Parsing: failed to parse YAML: unexpected character: `%'
╭─[1:2]
1 │ {% qemu_bins_j = [
· ────────┬────────
· ╰── here
2 │ "bios.bin", "bios-256k.bin", "bios-microvm.bin", "qboot.rom", "vgabios.bin", "vgabios-cirrus.bin"
╰────
So I will use python -c ... ${{ qemu_bins }}.split() ..., which is neat enough I concede
FWIW I ended up with this for a similar problem (multiple if else):
build_num: >
${{ (build_pre | int + 2000) if mpi == "openmpi"
else ((build_pre | int + 1000) if mpi == "mpich"
else ((build_pre | int) if mpi == "nompi" else 0)) }}
Yes, even syntax such as
build_num: >
{% if mpi == "openmpi" %}
${{ build_pre | int + 2000 }}
{% else if ...
Would likely work. Not sure if we should endorse it, though :)
this may be specific enough to sha256 to warrant a new issue but I wonder whether there is a nicer way to write this:
source:
- if: numpy_min == "2.3"
then:
- url: ${{ source_url }}
sha256: eb30717dc7390a038c2247be1a7e8bf1a48b8a4701a01960804830a231631bef
- if: numpy_min == "2.2"
then:
- url: ${{ source_url }}
sha256: f6b9a08193c6291c3d6c4717d1ee1b0a8bf5050765165175affde7afaab1639f
- if: numpy_min == "2.1"
then:
- url: ${{ source_url }}
sha256: d61dd84290c6faaa35007346b79741e1e100915885a5a5a63d05b401eddbf583
- if: numpy_min == "2.0"
then:
- url: ${{ source_url }}
sha256: ba6a23a9d07bb782bda4f5852b476de4e16bad8f979c96e3df1d04e6bb6304c7
- if: numpy_min == "1.25"
then:
- url: ${{ source_url }}
sha256: 81f00561a6f0b92436b4c0b64dc9da25a8a31154a4fb0813e3337b2a0afd7db3
- if: numpy_min == "1.23"
then:
- url: ${{ source_url }}
sha256: 2deb6a9b0801e23c0597f3654d667238df581a85115d9f813d12ff0afcc842d3
- if: numpy_min == "1.22"
then:
- url: ${{ source_url }}
sha256: 0741447dbbaeb3db069341b53903f16d076fcdef931c0190fb15e5bc004463c9
templating directly into the sha256 field seems unsupported
It's possible to template the SHA256 but our current JSON schema will complain because it expects a hex string of 32 characters there. We can totally change the schema to also accept Jinja expressions (string starts / ends with ${{ ... }}).
I think the most ideal scenario for this case would be if you had a dictionary, so that you could do:
- url: ${{ source_url }}
sha256: ${{ hashes[numpy_min] }}
We migth be able to do that with some trickery :)
ah great, I mistakenly thought that was more than just the JSON schema complaining. In which case variants.yaml can handle it all nicely:
zip_keys:
- [numpy_min, numpy_cap, python_min, source_hash]
numpy_min: [ 2.3, 2.2, 2.1, 2.0, 1.25, 1.23, 1.22]
numpy_cap: [ 2.4, 2.3, 2.2, 2.1, 2.0, 1.25, 1.23]
python_min: [3.11, 3.10, 3.10, 3.10, 3.10, 3.10, 3.10]
source_hash:
- eb30717dc7390a038c2247be1a7e8bf1a48b8a4701a01960804830a231631bef
- f6b9a08193c6291c3d6c4717d1ee1b0a8bf5050765165175affde7afaab1639f
- d61dd84290c6faaa35007346b79741e1e100915885a5a5a63d05b401eddbf583
- ba6a23a9d07bb782bda4f5852b476de4e16bad8f979c96e3df1d04e6bb6304c7
- 81f00561a6f0b92436b4c0b64dc9da25a8a31154a4fb0813e3337b2a0afd7db3
- 2deb6a9b0801e23c0597f3654d667238df581a85115d9f813d12ff0afcc842d3
- 0741447dbbaeb3db069341b53903f16d076fcdef931c0190fb15e5bc004463c9
More pedestrian than all the syntax stuff discussed so far, but I was surprised that there's no elif: <cond> to go along with
if: <cond>
then: <stuff>
else: <stuff>
(not mentioned in the docs, and crashes an actual recipe where I tried this)
@h-vetinari we already discussed that it's invalid YAML right?
A single elif: surely isn't illegal, though I could see how multiple would be problematic.
I don't think it was discussed so far though, at least not with me.