rattler-build icon indicating copy to clipboard operation
rattler-build copied to clipboard

Idea for better (non-redundant, more compact) branching syntax for selectors?

Open h-vetinari opened this issue 1 year ago • 28 comments

While thinking about #1154, it occurred to me that it would be even nicer if we could do a match-style syntax that doesn't have to choose a single branch but could apply several!

        files:
          - match: linux
              - something_linux
            match: osx
              - something_osx
            match: unix         # e.g. on linux, this should be taken
              - something_unix  # _in addition_ to the linux-only case
            match: win
              - something_win
            match: unix or win
              - something_that_applies_to_everything
            # no more `else:`, only explicit matches!

I guess it might be a bit late for such deep syntactic changes, but IMO this would make it much nicer to write recipes, rather than having to constantly restart if: branches to avoid exhaustivity-problems.

It would allow writing using a single match: (or whatever you want to call it, like the old sel: before it turned into if:) block for every section of the recipe, which would be much nicer IMO if we're looking at translating things like

examples

this

    - cython                              # [build_platform != target_platform]
    - vtk =*=osmesa_*                     # [build_platform != target_platform and build_variant == "noqt"]
    - vtk =*=qt_*                         # [build_platform != target_platform and build_variant != "noqt"]
    - qt-main                             # [build_platform != target_platform and build_variant == "pyqt"]
    - qt6-main                            # [build_platform != target_platform and build_variant == "pyside6"]

or this

    - crossenv                            # [build_platform != target_platform]
    - maturin >=1.3.2,<2                  # [build_platform != target_platform]
    - {{ stdlib("c") }}
    - {{ compiler('c') }}
    - {{ compiler('cxx') }}               # [win]
    - {{ compiler('rust') }}
    - posix                               # [build_platform == "win-64"]
    - cmake
    - make                                # [unix]
    - cargo-feature                       # [build_platform != target_platform and target_platform == "win-64"]

etc.

The more dimensions (unix vs win, native vs. cross, python version, CUDA or not, implementation variants for openmpi/BLAS/QT/etc.) we need to cover in a given recipe, the harder it gets to do that with simple if: else: branches. Consider something like

        build:
          - match: build_platform != target_platform
              # cross-compilation
            match: build_platform != target_platform and cuda_version
              # cross-compiled CUDA
            match: match(cuda, "<12")
              # CUDA 11.x
            match: match(cuda, ">=12.0")
              # CUDA 12.x
            match: linux
              # linux
            match: osx
              # osx
            match: unix
              # unix
            match: win
              # win
            match: blas_impl == "mkl"
              # mkl

It would be a much lower mental overhead to be able to freely shuffle around lines based on which sections they apply to (e.g. not having to restructure the following into correctly (non-)overlapping if: statements), only adding lines to the selector where they are supposed to be applied.

h-vetinari avatar Nov 02 '24 04:11 h-vetinari

It's completely impossible because duplicate keys are forbidden :(

wolfv avatar Nov 02 '24 06:11 wolfv

It's completely impossible because duplicate keys are forbidden :(

Well, how about making each - match: a separate list entry? That's also how multiple if:'s work IIUC.

In other words, the following (note the extra - everywhere)

        build:
          - match: build_platform != target_platform
              # cross-compilation
          - match: build_platform != target_platform and cuda_version
              # cross-compiled CUDA
          - match: match(cuda, "<12")
              # CUDA 11.x
          - match: match(cuda, ">=12.0")
              # CUDA 12.x
          - match: linux
              # linux
          - match: osx
              # osx
          - match: unix
              # unix
          - match: win
              # win
          - match: blas_impl == "mkl"
              # mkl

... should be desugared to the same list as the impossible variant that has multiple match:. The only thing the desugaring would need to learn is to ignore empty lists (for a match: that doesn't apply), though I guess that's already covered by the existing infra (e.g. for an if: without else:)

h-vetinari avatar Nov 02 '24 07:11 h-vetinari

Why not use a top level match with nested cases?

baszalmstra avatar Nov 02 '24 07:11 baszalmstra

Why not use a top level match with nested cases?

I'd be happy with that too!

h-vetinari avatar Nov 02 '24 07:11 h-vetinari

But how is it different from multiple if statements?

wolfv avatar Nov 02 '24 07:11 wolfv

You dont need to repeat the negation of the if clauses.

You can more easily write something specific for win, macos, unix without having to repeat not win, etc.

Sorry, im on a phone so its a bit hard to type a proper example.

baszalmstra avatar Nov 02 '24 07:11 baszalmstra

For the record, it's actually possible to use arbitrary Jinja in multi-line YAML strings. I am not sure if we want to keep it, but the following works (after a small bug fix from #1156):

build:
  script: |
    echo "Is ${{ target_platform == "linux-64" }}"
    {% if target_platform == "linux-64" %}
      echo "I AM A LINUX"
    {% elif osx %}
      echo "I AM A MAC"
    {% endif %}

And of course, you could just as well use bash.

So I am not sure if it's worth it to make our syntax more complicated or not ...

Right now it's also possible to do arbitrary things with set and for loops btw. - it just has to be encapsulated in a multi-line string :)

wolfv avatar Nov 02 '24 12:11 wolfv

One of the design goals was to make the recipe easier to parse for machines. I think what you are suggesting is taking a step back. Having a match-kinda statement seems like a great (and logical) building block.

baszalmstra avatar Nov 02 '24 13:11 baszalmstra

Yeah but we've never semantically understood the contents of the scripts themselves so I'm not sure if we need that. This doesn't affect the "everything is pure YAML" property.

wolfv avatar Nov 02 '24 13:11 wolfv

Sure but just as you mentioned in another issue you just closed, we can validate the control flow statements using the schema. That would become harder and harder the more jinja we start using.

baszalmstra avatar Nov 02 '24 14:11 baszalmstra

If we build our own LSP though .. :) I think I do get your point, but I don't see it super different from e.g. people having issues with their bash scripts inside the script tag.

wolfv avatar Nov 02 '24 14:11 wolfv

For the record, it's actually possible to use arbitrary Jinja in multi-line YAML strings. I am not sure if we want to keep it, but the following works (after a small bug fix from #1156):

build: script: | echo "Is ${{ target_platform == "linux-64" }}" {% if target_platform == "linux-64" %} echo "I AM A LINUX" {% elif osx %} echo "I AM A MAC" {% endif %} And of course, you could just as well use bash.

So I am not sure if it's worth it to make our syntax more complicated or not ...

Right now it's also possible to do arbitrary things with set and for loops btw. - it just has to be encapsulated in a multi-line string :)

This does not seem to work:

          {% for bin in qemu_bins %}
            test -f ${QEMU_LD_PREFIX}/{{ bin }}
          {% endfor %}
 │ │ │ + test -f $PREFIX/aarch64-conda-linux-gnu/sysroot/lib64/ld-2.17.so
 │ │ │ + test -f '$PREFIX/aarch64-conda-linux-gnu/sysroot/lib64/{{' bin '}}'
 │ │ │ $PREFIX/etc/conda/test-files/qemu-user-execve-aarch64/0/conda_build.sh: line 14: test: too many arguments

The issue with passing to a bash -c ... is that ${{ xxx }} expands as ["...", "...",] which trips-up bash as it is expecting arrays to be more like (xxx, xxx, ), so this also fails: bash -c 'l=${{ qemu_bins | split(" ") | list }}; for bin in "\${l[@]}"; do test -f ${QEMU_LD_PREFIX}/${bin}; done'

MementoRC avatar Nov 24 '24 03:11 MementoRC

Where did you set qemu_bins? The context has a behvior where it casts everything to a string.

But if you do:

          {% set qemu_bins = ["foo", "bar"] %}
          {% for bin in qemu_bins %}
            test -f ${QEMU_LD_PREFIX}/{{ bin }}
          {% endfor %}

It has a chance of working :)

wolfv avatar Nov 24 '24 09:11 wolfv

Why not use a top level match with nested cases?

Actually, one reason why I do have a preference for bare match:, is that match: - case ... -case suggests disjoint conditions again (at least based on how pattern matching works in most languages), whereas I'd like to not match one specific thing to various disjoint cases, but rather freely match conditions for different quantities that may overlap. Using a different syntax there would avoid triggering some unhelpful muscle memory.

h-vetinari avatar Nov 24 '24 10:11 h-vetinari

@h-vetinari sorry, not following. What's the difference of match then to a number of if clauses if the match' clauses are disjoint? I think what we had in mind so far would be a match that breaks after it finds the first matching entry, e.g.

match:
  - case: unix
    then: 
      - bla
  - case: win and arm64
    then: 
      - foobar
  - case: win
    then:
      - foobar-win64 # NOT happening on arm64

wolfv avatar Nov 24 '24 12:11 wolfv

Also your initial example is not at all valid YAML if I see that correctly.

wolfv avatar Nov 24 '24 12:11 wolfv

Attached two screenshots highlighting the issues (from https://yamlchecker.com/)

Image Image

wolfv avatar Nov 24 '24 12:11 wolfv

I think what we had in mind so far would be a match that breaks after it finds the first matching entry, e.g.

I explained why that's not desirable in the OP (or at least that there are cases where branches shouldn't be exclusive).

Also your initial example is not at all valid YAML if I see that correctly.

Yes, I hate YAML. 🙃

h-vetinari avatar Nov 24 '24 12:11 h-vetinari

The way I understand it is that you want "if"-syntax without then, correct?

You want falsey branches to be skipped, but every truthy branch to be taken, right?

In e.g. Rust, when usign match, only the first truthy branch is taken. In C/C++ you would have to add a break for this behavior. In Python also only the first truthy value is matched.

But if I understand your proposal correctly, you want every truthy branch to "match"?

Can you add corrected YAML to your proposal? Because I think if you write out proper YAML, it will look just like the current if statements.

wolfv avatar Nov 24 '24 13:11 wolfv

It has a chance of working :)

Right! Silly me, I forgot I needed to split (I did know that, just got caught in the Jinja mixin). The variable is in the context section, it is a ${{ [x, y, z] | join() }}

For this particular case, it seems that the rattler/jinja is more complex. The array needs to be joined, then split in script where the {%%} is accepted. Am I missing something?

Also, it seems that the comments are not completely ignored, I had mistakenly put a "xx' in a comment and it seemed to generate a syntax error, that went away after correction (not 100% sure, though)

Update: Not to be difficult, but the initial point was to have the list be defined once for the several packages that use it. I don't think it will work: {% for bin in {{ qemu_bins | split(' ') | list }} %} fails to be parsed and a global {% fails with:

Error:   × Parsing: failed to parse YAML: unexpected character: `%'
   ╭─[1:2]
 1 │ {% qemu_bins_j = [
   ·  ────────┬────────
   ·          ╰── here
 2 │       "bios.bin", "bios-256k.bin", "bios-microvm.bin", "qboot.rom", "vgabios.bin", "vgabios-cirrus.bin"
   ╰────

So I will use python -c ... ${{ qemu_bins }}.split() ..., which is neat enough I concede

MementoRC avatar Nov 24 '24 15:11 MementoRC

FWIW I ended up with this for a similar problem (multiple if else):

  build_num: >
    ${{ (build_pre | int + 2000) if mpi == "openmpi"
    else ((build_pre | int + 1000) if mpi == "mpich"
    else ((build_pre | int) if mpi == "nompi" else 0)) }}

HaoZeke avatar May 20 '25 23:05 HaoZeke

Yes, even syntax such as

build_num: >
   {% if mpi == "openmpi" %}
       ${{ build_pre | int + 2000 }}
   {% else if ... 

Would likely work. Not sure if we should endorse it, though :)

wolfv avatar May 21 '25 08:05 wolfv

this may be specific enough to sha256 to warrant a new issue but I wonder whether there is a nicer way to write this:

source:
  - if: numpy_min == "2.3"
    then: 
    - url: ${{ source_url }}
      sha256: eb30717dc7390a038c2247be1a7e8bf1a48b8a4701a01960804830a231631bef
  - if: numpy_min == "2.2"
    then: 
    - url: ${{ source_url }}
      sha256: f6b9a08193c6291c3d6c4717d1ee1b0a8bf5050765165175affde7afaab1639f
  - if: numpy_min == "2.1"
    then: 
    - url: ${{ source_url }}
      sha256: d61dd84290c6faaa35007346b79741e1e100915885a5a5a63d05b401eddbf583
  - if: numpy_min == "2.0"
    then: 
    - url: ${{ source_url }}
      sha256: ba6a23a9d07bb782bda4f5852b476de4e16bad8f979c96e3df1d04e6bb6304c7
  - if: numpy_min == "1.25"
    then: 
    - url: ${{ source_url }}
      sha256: 81f00561a6f0b92436b4c0b64dc9da25a8a31154a4fb0813e3337b2a0afd7db3
  - if: numpy_min == "1.23"
    then: 
    - url: ${{ source_url }}
      sha256: 2deb6a9b0801e23c0597f3654d667238df581a85115d9f813d12ff0afcc842d3
  - if: numpy_min == "1.22"
    then: 
    - url: ${{ source_url }}
      sha256: 0741447dbbaeb3db069341b53903f16d076fcdef931c0190fb15e5bc004463c9

templating directly into the sha256 field seems unsupported

lucascolley avatar Aug 12 '25 22:08 lucascolley

It's possible to template the SHA256 but our current JSON schema will complain because it expects a hex string of 32 characters there. We can totally change the schema to also accept Jinja expressions (string starts / ends with ${{ ... }}).

I think the most ideal scenario for this case would be if you had a dictionary, so that you could do:

- url: ${{ source_url }}
  sha256: ${{ hashes[numpy_min] }}

We migth be able to do that with some trickery :)

wolfv avatar Aug 13 '25 08:08 wolfv

ah great, I mistakenly thought that was more than just the JSON schema complaining. In which case variants.yaml can handle it all nicely:

zip_keys:
- [numpy_min, numpy_cap, python_min, source_hash]

numpy_min:  [ 2.3,  2.2,  2.1,  2.0, 1.25, 1.23, 1.22]
numpy_cap:  [ 2.4,  2.3,  2.2,  2.1,  2.0, 1.25, 1.23]
python_min: [3.11, 3.10, 3.10, 3.10, 3.10, 3.10, 3.10]
source_hash:
- eb30717dc7390a038c2247be1a7e8bf1a48b8a4701a01960804830a231631bef
- f6b9a08193c6291c3d6c4717d1ee1b0a8bf5050765165175affde7afaab1639f
- d61dd84290c6faaa35007346b79741e1e100915885a5a5a63d05b401eddbf583
- ba6a23a9d07bb782bda4f5852b476de4e16bad8f979c96e3df1d04e6bb6304c7
- 81f00561a6f0b92436b4c0b64dc9da25a8a31154a4fb0813e3337b2a0afd7db3
- 2deb6a9b0801e23c0597f3654d667238df581a85115d9f813d12ff0afcc842d3
- 0741447dbbaeb3db069341b53903f16d076fcdef931c0190fb15e5bc004463c9

lucascolley avatar Aug 13 '25 08:08 lucascolley

More pedestrian than all the syntax stuff discussed so far, but I was surprised that there's no elif: <cond> to go along with

if: <cond>
then: <stuff>
else: <stuff>

(not mentioned in the docs, and crashes an actual recipe where I tried this)

h-vetinari avatar Nov 07 '25 04:11 h-vetinari

@h-vetinari we already discussed that it's invalid YAML right?

wolfv avatar Nov 07 '25 09:11 wolfv

A single elif: surely isn't illegal, though I could see how multiple would be problematic.

I don't think it was discussed so far though, at least not with me.

h-vetinari avatar Nov 07 '25 10:11 h-vetinari