rust Tracking Issue for RFC 3086: macro metavariable expressions

This is a tracking issue for the RFC "3086" (rust-lang/rfcs#3086). The feature gate for the issue is #![feature(macro_metavar_expr)].

About tracking issues

Tracking issues are used to record the overall progress of implementation. They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions. A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature. Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Steps

[ ] Implement the RFC (cc @markbt -- has there been work done here already?)
[ ] Adjust documentation (see instructions on rustc-dev-guide)
[ ] Stabilization PR (see instructions on rustc-dev-guide)

Unresolved questions and bugs

[ ] Figure out problems around hygiene

Implementation history

Mar 26 '21 19:03 nikomatsakis

There is a working prototype on the markbt/rust/metavariable_expressions branch. This needs feature gating, and there are a couple of TODOs to resolve, but it's otherwise in reasonable shape. I'm planning to work on it over the coming weeks.

Mar 27 '21 12:03 markbt

Update (2021-04-07)

I've not yet started work on productionizing the prototype on the markbt/rust/metavariable_expressions branch. I plan to start later this month, free time permitting.

Apr 07 '21 08:04 markbt

Update (2021-05-09)

I still haven't started on this yet as some stuff came up last month that prevented from having the time to work on it. It's still in my plan to work on it, and hopefully I'll have some time soon.

May 09 '21 09:05 markbt

Update (2021-06-29)

Still no progress, as I haven't had any spare time to work on this project. I'm still planning to work on it, and hopefully will get some time soon.

Jun 29 '21 08:06 markbt

@markbt If you don't expect to find the bandwidth in the near future, would you potentially be interested in seeking help in the form of another owner for this initiative? If you're still interested in driving this, that's fine, but if you'd like us to switch gears from pinging you to broadcasting that the project could use help, we'd be happy to do that.

Jul 07 '21 17:07 joshtriplett

I'd be happy with any help if there's someone available. I still plan to work on it, but personal stuff is getting in the way at the moment. Sorry about that.

To recap: I have a working prototype on my branch at https://github.com/markbt/rust/tree/metavariable_expressions . The next steps are to rebase that onto the latest master, and then polish it up so that it's ready for inclusion. Then there's also the doc work to make sure the new feature is documented well. Help with any of this would be appreciated.

Jul 09 '21 08:07 markbt

@markbt Are you still working on this feature? If not, then I can pick up from where you stopped

Jan 29 '22 11:01 c410-f3r

I haven't had a chance to work on it for a while. I'm still interested in it, so happy to help out if you're picking it up, it just had to take a back burner relative to some personal things.I have a branch with a prototype implementation on my github fork. It's likely very out of date, so will need rebasing up to something more recent. Or perhaps you can just use it for inspiration and start from scratch. Let me know if I can help at all, although I don't have a lot of free time at the moment.

Jan 29 '22 16:01 markbt

Thank you @markbt

I will take a look at the branch and then get in touch if any questions arise

Jan 30 '22 00:01 c410-f3r

The implementation has been merged so please try testing the feature as much as possible to find any potential bugs

Mar 15 '22 12:03 c410-f3r

Thanks @c410-f3r for all your hard work!

From my limited experience with this feature, I have some concerns about the design:

The depth arguments are confusing because some of the functions count up from the most nested depth and some count down from the outermost level.
Also, indexing does not universally start at 0, and an index of 1 means different things for different meta-variable expressions.
I find the semantics of count confusing and hard to keep track of. Which level is it counting? Does depth 1 mean count everything or count the 2nd nested loop? Does depth 1 mean count everything or count only the outermost loop? Maybe a name like total_count would be clearer? Maybe the depth option should not be optional? Maybe count should be removed, and people can just sum the output of length?

Mar 15 '22 14:03 mark-i-m

Thank you @mark-i-m for reviewing https://github.com/rust-lang/rust/pull/93545 and for all your contributions to the compiler.

Hehehehe... You, me and @petrochenkov had trouble trying to understand this feature

The depth arguments are confusing because some of the functions count up from the most nested depth and some count down from the outermost level.

Indeed! It is much easier cognitively to keep everything with the same direction instead of having count going from outer-to-inner and index/length from inner-to-outer. Although I am not sure if semantic would be impacted.

Also, indexing does not universally start at 0, and an index of 1 means different things for different meta-variable expressions.

I find the semantics of count confusing and hard to keep track of. Which level is it counting? Does depth 1 mean count everything or count the 2nd nested loop? Does depth 1 mean count everything or count only the outermost loop? Maybe a name like total_count would be clearer? Maybe the depth option should not be optional? Maybe count should be removed, and people can just sum the output of length?

Yeap, someone that wrote ${count(foo)} will probably expect that ${count(foo, 1)} will return some value related to the amount of foos instead of the actual number of outer repetitions.

If repetitions are nested, then an optional depth parameter can be used to limit the number of nested repetitions that are counted. For example, a macro expansion like:

${count(x, 1)} ${count(x, 2)} ${count(x, 3)} $( a $( b $( $x )* )* )*

The three values this expands to are the number of outer-most repetitions (the number of times a would be generated), the sum of the number of middle repetitions (the number of times b would be generated), and the total number of repetitions of $x.

https://github.com/markbt/rfcs/blob/macro_metavar_expr/text/0000-macro-metavar-expr.md#count

And as you said, throwing nested loops into the equation will alter indexing making understanding even harder. Not to mention mixing other meta-variable expressions like length 🙁.

#![feature(macro_metavar_expr)]

fn main() {
    macro_rules! mac {
        ( $( [ $( $i:ident )* ] )* ) => {{
            // ***** No loop *****
            
            println!("{}", ${count(i)}); // 5
            println!("{}", ${count(i, 0)}); // 2
            
            // Same as ${count(i)}
            //println!("{}", ${count(i, 1)});
            
            // Fobirdden. Index out of bounds
            //println!("{}", ${count(i, 2)});
            
            // ***** Outer-most loop *****
            
            $(
                println!("{}", ${count(i)}); // 3 and 2
                
                // Same as ${count(i)}
                //println!("{}", ${count(i, 0)});
                
                // Fobirdden. Index out of bounds
                //println!("{}", ${count(i, 1)});
            )*

            // ***** Outer-most and inner-most loops *****
            
            $(
                $(
                    ${ignore(i)}

                    // Forbidden. Can't be placed inside the inner-most repetition
                    //println!("{}", ${count(i)});
                )*
            )*
        }};
    }
    
    mac!([a b c] [d e]);
}

Maybe total_count and a mandatory depth can be nice modifications but I am also not sure about the removal of count (Useful for nested stuff). Overall, I think that users will have a hard time even with a good set of documentation.

As for myself, I learned to accept the things as they are currently defined in RFC, hehehehe 😁

Mar 15 '22 22:03 c410-f3r

Any thoughts @markbt?

Mar 15 '22 22:03 c410-f3r

The depth arguments are confusing because some of the functions count up from the most nested depth and some count down from the outermost level.

+1. Being inconsistent here seems out of the question. I would vote for "distance from innermost", agreeing with the rationale in the RFC:

The meaning of the depth parameter in index and count originally counted inwards from the outer-most nesting. This was changed to count outwards from the inner-most nesting so that expressions can be copied to a different nesting depth without needing to change them.

Adding to this, having the depth parameters represent "distance from outermost" will lead to the following (unfortunate) user story:

I have a macro build_foo!. I want to refactor it to build_many_foos!, so I wrap the macro definition with $(...),*. Now all of the depth parameters are off by one so I increment them.

The only bad thing is that the English meaning of "depth" lends to "distance from outermost". But this is less important IMO.

We could support both by having negative numbers (e.g. ${index(-1)}) work as "distance from innermost", but that is probably not necessary. You'd also have to decide what to do with 0.

Maybe count should be removed, and people can just sum the output of length?

~Yes this might be a "less is more" scenario. You could potentially get what you need using expressions of index and length, like ${length(1)+index(2)} or ${length(1)*length(2)}, but we'd have to support math within ${..}.~ Edit: this doesn't work

	Position	Total
Current repetition	`index`	`length`
All repetitions		`count`

Has it been considered to make theses values bind-able within the macro pattern? This would take depth out of the picture completely since the variables are syntactically attached to a repetition. Inspired by ngFor in Angular.

macro_rules! mac {
    ($[i=index, l=length]($word:ident),*) => {
        $(println!("{}/{}: {}", $i, $l, stringify!($word));)*
    };
}

Mar 16 '22 14:03 camsteffen

It also seems odd to me that index doesn't support an ident parameter but all the other functions do.

Mar 16 '22 14:03 camsteffen

First off, thanks to @c410-f3r for implementing this. Really excited to see it merged, and I'm going to try to find some time to try it out soon.

Thanks for all the feedback, too. I'm going to try to explain my thinking about this, as I spent some time thinking through these points when I was drafting the RFC. This was quite some time ago, so my memory is a little rusty (excuse the pun). Overall I think the RFC as it is currently is still the best way to do it, but perhaps we need to document it better to make it clearer how things work (especially count vs length).

It also seems odd to me that index doesn't support an ident parameter but all the other functions do.

Both index and length do not take ident parameters. This is because macro repetitions happen in lockstep, and so there is only one index and count for the current repetition, no matter which ident is involved. Contrast this with count where it does matter what ident it involved, as different idents can repeat different amounts in nested repetitions.

I find the semantics of count confusing and hard to keep track of. Which level is it counting? Does depth 1 mean count everything or count the 2nd nested loop? Does depth 1 mean count everything or count only the outermost loop? Maybe a name like total_count would be clearer? Maybe the depth option should not be optional? Maybe count should be removed, and people can just sum the output of length?

count was pretty much the reason I started this RFC in the first place. Part of the reason for metavar expressions is to avoid constructing large constant expressions that the compiler has to fold down, as these have a performance impacts and limitations with large numbers of repetitions which can be entirely avoided as the compiler already knows the answer.

It may be we need to improve documentation, as teaching things can sometimes be difficult. count can be thought of as "count the number of times this ident will appear". For most people there will be no need to use a depth parameter - the default of counting all depths will be what they want. The "limit to a particular number of depths" is likely only needed for special cases, when it will be worth learning the details of how it works. If we want to ensure values start at 0, I think ${count(indent, 0)} could be legitimately made to always evaluate to 1: i.e. how often does it repeat with no repetition, which is vacuously once, and shift everything down by one. We would need to distinguish "not provided" from "0" (perhaps effectively Option<usize>, where None means "all depths", although syntactically I think we would want to avoid having to write Some(2)).

We could support both by having negative numbers (e.g. ${index(-1)}) work as "distance from innermost", but that is probably not necessary. You'd also have to decide what to do with 0.

Negative numbers could work if that makes it more understandable. 0 would still mean "this depth", which is the default. In this case positive numbers >=1 would be meaningless, so the minus sign effectively becomes mandatory. Maybe another way would be to name it something different to make it clear it goes the other way (height, surround, level, ...).

The depth arguments are confusing because some of the functions count up from the most nested depth and some count down from the outermost level.

They all count from the current repetition level. count counts inwards, as it is concerned with the number of times the ident repeats within the repetition we're currently inside of. The default is "all of them", as that's the most common case. index and length count outwards as they are concerned with the place of the repetition we are currently inside of relative to the other repetitions it is itself part of. The default is the inner-most one, as, again, it's the most common case. I think this confusion perhaps lends some weight to using a different name for the depth parameters of index and length, but I don't have a good suggestion to hand.

Has it been considered to make theses values bind-able within the macro pattern?

That's really neat, but I do find it a little harder to understand (it's a bit un-Rust-lish). I'm also not sure how it would solve the count case or the ignore case. It's also quite a different design so we'd presumably need to spin through the RFC process again.

Yeap, someone that wrote ${count(foo)} will probably expect that ${count(foo, 1)} will return some value related to the amount of foos instead of the actual number of outer repetitions.

It is related to the number of foos - specifically it's the number of next-level repetitions that include foo, which might be different to the number of repetitions that include bar in the case of expressions like $( [ $( $( $foo ),* );* / $( $bar ),* ] )

Mar 16 '22 21:03 markbt

Okay I now understand how count is categorically different from index and length. count is for inner repetitions and index and length are for outer repetitions, relative to the current expansion context. An ident is necessary for specifying an inner repetition since there could be multiple (and multiple within those etc.). But outer repetitions only need a relative depth. With this distinction in mind, it makes sense to me now that depth can point in two directions. (sorry for lagging) I would either rename depth to distance, or use a different word for "going outward" cases like level.

With regard to my alternate syntax idea, that would only work for the "outward looking" functions index and length. To that end, I am even more convinced that this would be good. It resolves the discrepancy with depth so that it now has its intuitive meaning. Having a different syntax for inward and outward looking values lends to a not-conflated mental model. To be clear, I am suggesting to keep count as is, but replace index and length as described earlier.

Another discrepancy I found in my understanding is that I assumed that count(x) would give the number of occurrences of $x in the macro input, but rather it is the number of times $x is expanded? This is usually but not always the same, and I think input count would be generally more useful and less surprising.

In any case, pointing out the inward/outward distinction upfront as it applies to the functions would resolve confusion. I think I assumed that all functions can be used in both directions (which doesn't make sense now).

Mar 17 '22 04:03 camsteffen

For count, depth could be "outward distance from $x" rather than "inward distance from count(..)". I think that would be more consistent and intuitive.

Hope you don't mind one more alternate idea. Named repetitions. This allows for referencing repetitions that do not otherwise have an ident.

macro_rules! mac {
    ($h(hello)*) => {
        println!("hello {} times", ${count(h)});
    };
}
mac!(hello hello hello);

Mar 17 '22 06:03 camsteffen

A little pseudo illustration. Hope it helps