fix: special case jinja
There is a confusing behavior about types with rattler-build and minijinja.
When we have something like:
context:
variable: ${{ "1234" }}
Then in the currently released version of rattler-build, 1234 will become an integer because the Jinja evaluates to 1234 which will be interpreted by YAML as a integer.
However, we can fix this by special casing Jinja expressions (starting with ${{ and ending with }}). We can fix this by evaluating the Jinja as an expression and look at the Jinja type that comes out.
If it is a string, we can then quote the string and forbid YAML to coerce this type, so that a string stays a string.
This will also fix this situation:
context:
env_var: ${{ env.get("INTEGER") }} # <- now this will always be a string, if not explicitly casted
Where we may run into trouble is with something like:
context:
joined: ${{ "1" }}234
This would be evaluated as integer 1234 with the current code.
However, we could also decide to allow booleans or numbers to only come from simple Jinja expressions and not proper "templates" (ie. always cast anything more complex than a single expression to string).
With this we could also revisit https://github.com/prefix-dev/rattler-build/issues/971 and https://github.com/prefix-dev/rattler-build/issues/1423
We can change this further to allow Jinja expressions to return list and map objects to YAML, and then ingest them as lists or YAMLs from Jinja.
We could then also make the following use case work, where a user expands a list into YAML:
context:
mylist: [a,b,c]
requirements:
run:
- ${{ mylist }} # should expand to a SequenceNode and not a ScalarNode
Using this recipe:
context:
cuda_version: ${{ env.get("RAPIDS_CUDA_VERSION") }}
cuda_major: ${{ (env.get("RAPIDS_CUDA_VERSION") | split("."))[0] }}
package:
name: test-package
version: 1.0
build:
script:
- if: cuda_major == "12"
then:
- echo "cuda_major is a STRING and is 12"
else:
- echo "cuda_major is NOT a STRING or is not 12 (value was ${{ cuda_major }})"
we still cannot get the 12 as string, i am still working on it to check if we can make a solution
$env:RAPIDS_CUDA_VERSION="12.8.0"; cargo run -- build --recipe recipe.yaml
on powershell
Indeed, it seems to go wrong somewhere in the code.
context:
cuda_version: ${{ env.get("RAPIDS_CUDA_VERSION") }}
cuda_major: ${{ (env.get("RAPIDS_CUDA_VERSION") | split("."))[0] }}
cuda_major_is_string: ${{ (env.get("RAPIDS_CUDA_VERSION") | split("."))[0] is string }}
Running cargo r -- build --recipe ./examples/ajinja --render-only
Gives us:
[
{
"recipe": {
"schema_version": 1,
"context": {
"cuda_version": "12.8.0",
"cuda_major": 12,
"cuda_major_is_string": true
},
"package": {
"name": "test-package",
...
@wolfv There is a huge issue that is not producable in WSL and windows, but it is in all github runners, which is mainly about the unit tests (pkg_hash is the most serious one) I tried to solve it, change the formatters for hash, change the calculation logic and none of them passes. It also doesn't recognize
${{ PYTHON }} --help
now, which is another test fails now. Tried to fix that as well but the only solution i could find for 5 hours :D was making it
- python
@beckermr the problem that we are fixing is the following:
foobar: ${{ "1234" }} <- would evaluate to an integer
This is quite surprising I find. Having to wrap Jinja expressions in quotes to force a string is more cumbersome IMO. The code in this PR works quite well and is less surprising when dealing with env vars and similar things.
Yes. Jinja always does string substitutions. The types of items do not cross the boundary of the ${{ }} expression.
The issue with having the types cross is that for people who use jinja properly as a string templating, it becomes ambiguous and confusing.
This case is now really hard to predict:
var: "${{ "1234" }}"
Does the types cross the boundary of the jinja quote the internal quotes this pr implicitly adds? Does it ignore them? Does it turn them to single quotes? Does it render a nonsense yaml value? Why is one of these obviously the answer when the others are not?
This PR actually makes the interaction of the jinja templating with the yaml LESS clear.
The rule for jinja2 is simple:
"jinja produces strings that are inserted into the yaml."
Anything else is more complicated and surprising as you work through the edge cases.
For this case:
var: "${{ "1234" }}"
The (outer) quotes will always win and force whatever is inside to be a string.
The more ambigous case (but not really, due to outer quotes) would be:
var: "${{ 1234 }}"
where the inner type is an integer but the quotes cast it to a string.
Yep. Those examples exactly illustrate my point.
I have written out explicitly how the jinja+yaml works for four cases below.
a
var: ${{ 1234 }}
renders using the following steps
-
jinja parses+evals the text inside
${{ }}(it results in an int1234) -
jinja renders the final value to a string
"1234" -
jinja inserts the value of the string into the YAML This yields
var: 1234 -
YAML parses
var: 1234to a dict with a keyvarand a value1234as an int
b
var: "${{ "1234" }}"
renders using these steps
-
jinja parses+evals the text inside
${{ }}(it results in a string"1234") -
jinja renders the final value to a string (since step 1 yielded a string already, this step is a no-op)
-
jinja inserts the value of the string into the YAML This yields
var: "1234" -
YAML parses
var: "1234"to a dict with a keyvarand a value"1234"as a string
c
var: "${{ 1234 }}"
renders using these steps
-
jinja parses+evals the text inside
${{ }}(it results in an int1234) -
jinja renders the final value to a string
"1234" -
jinja inserts the value of the string into the YAML This yields
var: "1234" -
YAML parses
var: "1234"to a dict with a keyvarand a value"1234"as a string
d
var: ${{ "1234" }}
renders using the following steps
-
jinja parses+evals the text inside
${{ }}(it results in a string"1234") -
jinja renders the final value to a string (since step 1 yielded a string already, this step is a no-op)
-
jinja inserts the value of the string into the YAML This yields
var: 1234 -
YAML parses
var: 1234to a dict with a keyvarand a value1234as an int
This PR would change the behavior of case d to insert a quoted string instead of the value of the string (so step 3 changes) based on the type of the underlying jinja2 expression (computed in case 1).
However, if that rule is applied consistently, then case b should yeild var: ""1234"" with the extra set of quotes since in case b the type of the jinja2 expression ends up as a string in step 1.
The only way to tell b should be different than d in an automated way is to figure out what type YAML would parse and then account for that. That extra logic where the jinja2 parser has to reach out beyond itself to figure out why type to return breaks the clean separation between the jinja2 parsing and the yaml parsing.
The downstream consequence is that we've deviated from the standard behavior for these tools meaning that
- Folks who pick up off-the-shelf jinja2 and yaml parsers will get different results compared to rattler-build.
- Existing code bases that parse v1 recipes (like the bot, conda-recipe-manager, conda-smithy etc.) would need complicated special-casing / behavior to implement this rule.
- People who understand how templating engines work will be surprised by the results.
This entire thing reminds me of selectors in conda build v0 recipes. Instead of using the tools as intended, extra syntax and meaning was introduced by using python eval in comments in yaml. This choice caused non-trivial issues, work, and pain for the entire ecosystem. Let's please not make this mistake again.
I think I can live with your point @beckermr.
However, one thing that doesn't work right now is:
context:
specs: [a,b,c]
requirements:
run:
- ${{ specs }}
And I think we should make that work (this will be currently interpreted as - "[a, b, c]"
The other thing that will be much harder to make work is:
context:
toml_contents: ${{ load_from_file("pyproject.toml") }}
We could potentially serialize the file as JSON on a single line and insert it that way into the YAML to get something that we could later index into. But not sure if that is a great idea.
One workaround would be something like:
context:
_loader: |
{% set toml = load_from_file("pyproject.toml") %}
version: ${{ toml.version }}
But we don't advertise this capability widely :)
Unfortunately these special cases are also outside standard jinja2 and yaml behavior. Thus I don't think you can change their behavior without the same consequences as above.