Short Macro Invocation Syntax: m!123 and m!"abc"

May 18 '22 13:05 m-ou-se

This would be useful for the windows ecosystem (perhaps inside the windows crate) to declare wide string literals. Even though there are plenty of workarounds for wide string literals, having such a short syntax would make their use more common.

May 18 '22 13:05 rylev

Given that the wide-literals crate already provides w!(""), the w!"" syntax would start working right away for its users, without any change to the crate.

May 18 '22 13:05 m-ou-se

I still don't entirely see the value in cutting off the parentheses. The syntax of doing a wide string literal is already extremely short.

There are several crates for wide string literals, and other than the wide-literals crate they all use multi-character macro names that are words a person can more easily understand when reading the source code. This even includes the const_utf16 crate that you wrote, rylev, which picks the name encode!, a whopping 5 characters more than the minimum requirement. This suggests to me that being maximally terse isn't actually a great virtue in practice. If it really were so good to be maximally terse, then all the crates would already use single-letter macro names like w!, instead of full-word names like encode! or utf16! or wstr!.

Let's have a realistic example:

use const_utf16::encode;
const MESSAGE: &[u16] = encode!("Hello, world!");
const MESSAGE: &[u16] = encode!"Hello, world!";

Is it really the ( and ) that are making the first line of code burdensome while the second line of code is free and clear?
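For readers who have not used these crates, here is a minimal sketch of what a wide-string macro does with today's delimited syntax. The names `w!`, `ascii_to_utf16`, and `hello_wide` are made up for illustration, and the sketch only handles ASCII; it is not how const_utf16 or wide-literals are actually implemented.

```rust
/// Converts an ASCII string to UTF-16 code units at compile time.
/// Non-ASCII input is rejected; a real crate would decode full UTF-8.
const fn ascii_to_utf16<const N: usize>(s: &str) -> [u16; N] {
    let bytes = s.as_bytes();
    let mut out = [0u16; N];
    let mut i = 0;
    while i < N {
        assert!(bytes[i] < 0x80, "this sketch only handles ASCII");
        out[i] = bytes[i] as u16;
        i += 1;
    }
    out
}

/// Hypothetical `w!` macro, invoked with the existing `w!("...")` form.
macro_rules! w {
    ($s:literal) => {{
        const S: &str = $s;
        const WIDE: [u16; S.len()] = ascii_to_utf16::<{ S.len() }>(S);
        &WIDE as &[u16]
    }};
}

/// Example use: the encoding happens entirely at compile time.
fn hello_wide() -> &'static [u16] {
    w!("Hi")
}
```

Whether the call site reads `w!("Hi")` or `w!"Hi"`, the macro body would receive the same literal token.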

May 18 '22 14:05 Lokathor

I'm worried about whether users might find precedence confusing. Right now, a macro invocation is a syntactic atom in its own right, being entirely self-delimiting. This is not the case for this proposal, and it opens up the field for confusion.

Is foo! x as i32 to be parsed as foo!(x as i32) or foo!(x) as i32?

Is foo! x? to be parsed as foo!(x?) or foo!(x)??

Is foo! -5 to be parsed as foo!(-5) or foo!(-) 5?

This last one is the most confusing of all, because the minus sign can be either an operator or part of a standalone integer literal, and the behaviour changes depending on which way it is interpreted here.

I am unconvinced that the convenience of removing 2 characters is worth the potential for syntactic ambiguity (we can of course come up with well-defined rules to resolve this ambiguity, such as only binding to syntax atoms, but human brains do not work this way).

May 18 '22 14:05 zesterer

It's not about minimizing the amount of bytes of source code, or even the amount of time spent typing (though typing () takes me significantly longer than typing regular words), it's about the visual complexity when reading code.

Something like w!"asdf" looks like a single 'unit' to me, while encode!("asdf") has one level of 'nesting'. When I see w!"asdf" I see 'a wide string literal', but when I see w!("asdf") I see 'a string literal, passed to w!()'.

The difference is of course small, but it can add up:

File::open(w!("abc") + name + w!(".txt"));

This feels like a somewhat complicated expression, nesting two 'calls' inside the outer call.

Without the () for the macro invocations, this becomes:

File::open(w!"abc" + name + w!".txt");

To me, this is easier to read, as I visually process it as thing(thing + thing + thing) instead of thing(thing(thing) + thing + thing(thing)).

It all feels similar to why I prefer x < y + 1 over x < (y + 1). I already parse it correctly without the (); it just adds noise.

May 18 '22 14:05 m-ou-se

@zesterer That's mostly a formatting issue. You could ask the same question about m! (a + 1) * 3. Rustfmt helps by formatting that as m!(a + 1) * 3. Similarly, it should format m! x as i32 as m!x as i32.

Is foo! -5 to be parsed as foo!(-5) or foo!(-) 5?

We wouldn't accept foo!-. - is not a literal.
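As a small sketch of how today's macro matcher sees `-5` (the helper macros `count_tts!` and `is_literal!` are hypothetical names): the input is two token trees, `-` and `5`, yet the `literal` fragment specifier accepts an optional leading minus. How the short form would treat `m!-5` is not settled by this snippet.

```rust
// Counts the token trees handed to the macro.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

// Reports whether the whole input is accepted by the `literal` fragment.
macro_rules! is_literal {
    ($l:literal) => { true };
    ($($other:tt)*) => { false };
}

// Returns (number of token trees in `-5`, whether `-5` matches `literal`).
fn negative_five() -> (usize, bool) {
    (count_tts!(-5), is_literal!(-5))
}
```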

May 18 '22 14:05 m-ou-se

If the intention is to reduce visual complexity, then perhaps it is better to follow the precedent set by explicit numeric literal type annotations (such as 5u8) and allow this in postfix position? "hello, world"!w seems more natural to me than w!"hello, world" given the prior knowledge users have of numeric literal annotations.

May 18 '22 14:05 zesterer

entirely self-delimiting

I suppose one could argue that m!"asdf" is also 'self-delimiting', with "" rather than () as the delimiters.

Note that I'm only proposing this shorthand for literals, and nothing else. So there are no precedence rules about where the argument stops or anything like that.

May 18 '22 14:05 m-ou-se

If the intention is to reduce visual complexity, then perhaps it is better to follow the precedent set by explicit numeric literal type annotations (such as 5u8) and allow this in postfix position? "hello, world"!w seems more natural to me than w!"hello, world" given the prior knowledge users have of numeric literal annotations.

We already have b"asdf", with the modifier at the start. Regardless, I think it's best if we keep the macro invocation in order, to make this change as minimal as possible.

May 18 '22 14:05 m-ou-se

If it really were so good to be maximally terse then all the crates would already use single letter macro names like w!, instead of full word names like encode! or utf16! or wstr!.

Conversely, if w!("..") was good enough, we wouldn't be getting any requests for w".." or c".." or z".." and so on. Quite a few people seem excited about those, so it seems like they aren't satisfied with wstr!("..") or w!("..").

May 18 '22 15:05 m-ou-se

Should we allow m!r"..."? (I think yes.)

I think this would definitely be useful for the wide str case if the goal is to reduce visual noise. For example:

wide!r"\\.\pipe\local\pipe name" is much nicer than wide!"\\\\.\\pipe\\local\\pipe name"
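A quick check that the two spellings denote the same string, using plain string literals, since the `wide!` macro here is hypothetical:

```rust
// Returns the pipe path written as a raw string and as an escaped string.
fn pipe_paths() -> (&'static str, &'static str) {
    let raw = r"\\.\pipe\local\pipe name";
    let escaped = "\\\\.\\pipe\\local\\pipe name";
    (raw, escaped)
}
```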

May 18 '22 15:05 ChrisDenton

Would this result in println!"Hello world"; working? I find that kind of odd.

May 18 '22 15:05 tavianator

Yup. But this also already works:

println! {"hey {:?}",};

panic![".."];

let _ = vec! (1, 2);

thread_local! [
   ..
];

So I don't think that's a problem in practice. Using the conventional one of the three (or four) ways to invoke a macro is already part of Rust code style/formatting, and some are even handled by rustfmt. I don't think I've ever encountered a wild println![] or similar.
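A minimal sketch of the same point with a user-defined macro (`sum!` is a made-up name): the macro body never sees which delimiter the caller chose, so all three forms expand identically.

```rust
// Adds up a comma-separated list of expressions.
macro_rules! sum {
    ($($x:expr),* $(,)?) => { 0 $(+ $x)* };
}

// The same macro invoked with (), [] and {}.
fn all_delimiters() -> [i32; 3] {
    [sum!(1, 2), sum![1, 2], sum! {1, 2}]
}
```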

May 18 '22 16:05 m-ou-se

Conversely, if w!("..") was good enough, we wouldn't be getting any requests for w".." or c".." or z".." and so on. Quite a few people seem excited about those, so it seems like they aren't satisfied with wstr!("..") or w!("..").

I would suggest that what people want is something built into the default experience (at the language level, or in core) without having to pull in some crate to do it.

May 18 '22 16:05 Lokathor

I would support this, purely because for a while I've wanted some 'string literal macro' system: a way to make things like b"str" in user code. My main gripe is that making owned strings for initialising structs is polluted by lots of calls to into, to_owned, to_string, String::from, etc. These can distract from the text that I care about. A simple s!"I am an owned string" would definitely be an improvement in my book.
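With today's syntax, such a macro might look like the following sketch. The macro name `s!` comes from the comment above; the `User` struct and `sample_user` function are invented for illustration.

```rust
// Hypothetical `s!` macro: turns a string literal into an owned String.
macro_rules! s {
    ($text:literal) => { String::from($text) };
}

struct User {
    name: String,
    city: String,
}

// Struct initialisation without explicit `.to_owned()` or `.into()` calls.
fn sample_user() -> User {
    User { name: s!("Ada"), city: s!("London") }
}
```

Under the proposal, the call sites would presumably shrink to `s!"Ada"` and `s!"London"`.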

May 18 '22 17:05 conradludgate

Would this result in println!"Hello world"; working?

That's an interesting point. This could well be far more than numerics, because with captured identifiers, would this be a de-facto transition to, say, format!"{a} - {b} = {c}"?

May 18 '22 17:05 scottmcm

[..] with captured identifiers, would this be a de-facto transition to, say, format!"{a} - {b} = {c}"?

The RFC mentions this as an example: f!"{a} {b}" (with use std::format as f;).
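The delimited version of that already works today, since macros can be imported under a different name and format strings can capture identifiers. A minimal sketch (the `describe` function is just for illustration):

```rust
use std::format as f;

// Uses the renamed macro with captured identifiers.
fn describe(a: i32, b: i32) -> String {
    let c = a - b;
    f!("{a} - {b} = {c}")
}
```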

May 18 '22 17:05 m-ou-se

I haven't created an RFC for this yet, but I have a WIP implementation of custom literals on my rust-lang/rust fork. Given the lack of documentation there, I'll briefly explain here. Essentially, a new trait is introduced:

pub trait FromIntegerLiteral: Sized {
    type Input: sealed::Integer;
    fn from_integer_literal(i: Self::Input) -> Self;
}

This trait is a lang item. If the compiler expects a certain type, the known type does not match, and the expected type implements FromIntegerLiteral, the compiler will coerce the integer literal into <T as FromIntegerLiteral>::from_integer_literal(lit), where lit is guaranteed to be the type expected as input.

All implementations of FromIntegerLiteral are required to be impl const; this is enforced by the compiler. This is necessary so that the values can be used in any location and to ensure that there are no side effects. As const eval grows more powerful, so will the ability for custom literals. My goal is to always const eval the value. Invalid inputs must panic, which is functionally equivalent to emitting a compiler error.

As currently written, the trait is limited to accept integers as input. However, I did intend on expanding it to any literal, which would notably include strings. It would obviously be renamed in this situation. I believe an expanded trait that permits any literal would be quite powerful and would serve much the same purpose as this proposal. One notable exclusion would be f-strings, but I believe there was general support for f"{foo}" having compiler support in the future, hence why that syntax was reserved in the 2021 edition.

Personally I view custom literals as more ergonomic, more transparent, and with guaranteed const eval, more reliable. f-strings would be great to have, but I don't think this is the way to go about it. It may seem surprising that coercions work in this manner, but I assure you that the implementation linked is already mostly functional. The only thing missing is always const eval'ing the input. Custom literals combined with type ascription would be nearly identical to custom suffixes — you could do 5:cm to get a value that is 5cm (preferred formatting aside).

Edit: After some discussion on Zulip, the existing implementation will not work, but it is still possible to have custom literals in this user-facing manner.
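To make the desugaring concrete, here is a sketch that compiles on stable Rust by calling the trait method explicitly. The `sealed` module contents, the `Centimeters` type, and `five_cm` are assumptions made for illustration; the impl is an ordinary one rather than `impl const`, and no compiler-driven coercion takes place.

```rust
mod sealed {
    // Assumed sealing trait: only chosen integer types can be literal input.
    pub trait Integer {}
    impl Integer for u32 {}
}

pub trait FromIntegerLiteral: Sized {
    type Input: sealed::Integer;
    fn from_integer_literal(i: Self::Input) -> Self;
}

#[derive(Debug, PartialEq)]
pub struct Centimeters(pub u32);

impl FromIntegerLiteral for Centimeters {
    type Input = u32;
    fn from_integer_literal(i: u32) -> Self {
        Centimeters(i)
    }
}

// Roughly what `let len: Centimeters = 5;` would become under the
// described coercion; here the call is written out by hand.
pub fn five_cm() -> Centimeters {
    <Centimeters as FromIntegerLiteral>::from_integer_literal(5)
}
```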

May 18 '22 18:05 jhpratt

Should we allow m!b"abc" and m!b'x'? (I think yes.)

A counterargument: b"" is, to some extent, a custom literal too, so it seems a little strange to allow m!b"" but disallow m!b!"".

May 18 '22 18:05 ChayimFriedman2

As a bystander and declarative macro fanatic, I'm a bit hesitant about this. It seems a bit surprising that this new form would only support a literal as its body. I could easily imagine a newcomer (or myself) writing something like this and being confused when it doesn't work:

macro_rules! macroroni {
    ($x:ident) => { /* TODO */ };
}

fn main() {
    macroroni!foo; //~ ERROR can only be used with macros that take a literal
}

This could also crop up during a refactor:

let default_animation_wide = w!"normal";
let y = self.sprite.set_animation("normal"); // hmm, i should pull `"normal"` into a const

Moving "normal" into a constant without changing default_animation_wide to use parentheses (or brackets) would cause an error, which might confuse the programmer.

const DEFAULT_ANIMATION: &str = "normal";

let default_animation_wide = w!DEFAULT_ANIMATION;
//~^ ERROR can only be used with macros that take a literal
let y = self.sprite.set_animation(DEFAULT_ANIMATION);

Even if this syntax is opened up to most other metavariable types, it's backwards-incompatible to open it to tts (unless delimited groups are ignored when using this syntax, which might also catch people by surprise):

macro_rules! macroroni {
    ($x:tt) => { panic!("{}", $x) };
    () => {};
}

fn main() {
    // this currently does nothing, but may panic if `tt` is accepted
    macroroni!();

    // If `tt` is accepted but delimited groups are rejected,
    // you might start with this...
    macroroni!1;
    // ... then realize you need an addition...
    macroroni!(1 + 1); //~ ERROR no rules expected the token `+`
    // ... and run into a stumbling block.
}
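The current behaviour behind that concern can be checked with a side-effect-free variant of the macro (`which_arm!` and `arms` are hypothetical names): an empty invocation matches the empty rule, while a parenthesised group passed as an argument is a single token tree.

```rust
// Reports which rule matched, instead of panicking.
macro_rules! which_arm {
    ($x:tt) => { "one token tree" };
    () => { "empty" };
}

// `which_arm!()` has no input; `which_arm!(())` passes the group `()`.
fn arms() -> (&'static str, &'static str, &'static str) {
    (which_arm!(), which_arm!(()), which_arm!(1))
}
```

If a delimiter-free form accepted arbitrary token trees, `which_arm!()` could also be read as passing `()` as the argument, which is the backwards-compatibility hazard described above.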

This syntax might also cause readability issues, especially if it's expanded to cover idents:

// I am a Linux user. What does `w!` mean?
let le_mot = w!"foo";
// Ugh, it's 12am... I need sleep...
// ? There's no variable named `shelllle_mot` in scope, is there?
let result = shell!le_mot;
// Oh! This is a macro call
let result = shell!(le_mot);

On the other hand, the m![] syntax provides some amount of precedent for accepting this form, as it is essentially a special case of m!literal for array literals (which aren't actually matched by literal metavariables).

May 19 '22 01:05 PatchMixolydic

I strongly oppose this proposal.

I've been programming in Rust for just under five years, so I'm probably not as experienced as most folks here. I will say, however, that Rust was by far the hardest language for me to become proficient in. When I first picked it up, I had a hard time understanding the syntax by reading it (lifetimes and the turbofish were extremely confusing). Yet, at first I was able to approximate macro invocations as simple function calls that were somehow different, and the difference didn't matter for the time being. In fact, as a newcomer, if I saw let le_mot = w!"foo"; I think I would have had absolutely no idea what it does, and my brain would have parsed w! itself as a special token instead of understanding the bang as a macro invocation and the w as the name of that macro.

Today, I regularly try to convince folks who work on critical systems to use Rust: I worry that removing the delimiter tokens around macro invocations will make code significantly harder to understand for newcomers.

I'll also add that, many years ago, I had to learn VBScript (I forget which version). But one of the most confusing things was that invoking a function with parentheses and without them had a different behavior (one would allow reading the return value but not the other one IIRC).

May 19 '22 01:05 ChristopherRabotin

So, in general I sympathise with this proposal a lot, but my biggest concern with it is unfortunately a massive downside, since it conflicts with future proposals for postfix macros. Basically, I think this should explicitly state how ambiguity would be resolved for postfix macros, and why this feature would be useful with the existence of postfix macros, e.g. why it's good to have m!1234 instead of just 1234.m!, 1234.m!(), or even 1234m!.

Also, for a less-dealbreaking proposal, I would like to suggest explicitly only allowing or encouraging this on macros that accept one argument, since refactoring to add extra arguments could complicate things. I also think it's worth elaborating on how to treat macros with optional non-literal arguments before literals, e.g. if a macro f accepts $($x:ident),* $y:literal, then is f!1 allowed?
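For reference, a sketch of how such a pattern behaves with today's delimited syntax. The macro `tagged!` and both functions are illustrative; whether the short form `tagged!1` would be accepted is exactly the open question.

```rust
// Optional identifiers, followed by one required literal.
macro_rules! tagged {
    ($($x:ident),* $y:literal) => {
        (vec![$(stringify!($x)),*], $y)
    };
}

// The repetition matches zero times, so a lone literal is accepted.
fn without_idents() -> (Vec<&'static str>, i32) {
    tagged!(1)
}

// Identifiers first, then the literal.
fn with_idents() -> (Vec<&'static str>, i32) {
    tagged!(a, b 2)
}
```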

May 19 '22 02:05 clarfonthey

Moving "normal" into a constant without changing default_animation_wide to use parentheses (or brackets) would cause an error, which might confuse the programmer.
const DEFAULT_ANIMATION: &str = "normal";

let default_animation_wide = w!DEFAULT_ANIMATION;
//~^ ERROR can only be used with macros that take a literal
let y = self.sprite.set_animation(DEFAULT_ANIMATION);

Depending on the macro, I would expect this not to work regardless of whether it's written w!(DEFAULT_ANIMATION) or not, since the macro might explicitly need a literal. So I don't think this is a limitation of this proposal.

May 19 '22 07:05 conradludgate

To me, this is easier to read, as I visually process it as thing(thing + thing + thing) instead of thing(thing(thing) + thing + thing(thing)).

I think the exclamation mark already helps with that.

I am a Java developer, and Rust is easier for me to read than Kotlin. I think that's because Rust is explicit.

May 19 '22 16:05 tivrfoa

Please no. I hate it when people don't use the one way to do things. If you give them too many ways, each person will pick whichever they like, and you end up seeing many more styles of code. That's a very bad code smell. You shouldn't try to satisfy everyone; just keep one way to do each thing, and that's enough.

May 20 '22 01:05 ghost

Please no. I hate it when people don't use the one way to do things.

There's already more than one way to call a macro: m!(), m![], and m!{}. That doesn't seem to be a problem either. Every macro just has a convention for which style to use, such as () for println, [] for vec, and {} for thread_local. All I'm proposing is a fourth option: m!123, such that that can be the convention for macros like w!"wide str" or bignum!1234 that have no need for the extra delimiters.

May 25 '22 13:05 m-ou-se

I should clarify: the point is not to be able to invoke a macro with both m!(123) and m!123 syntax. The point is to have m!123 syntax for macros that 'feel' like literals, such as w!".." or bignum!1234.

That w!("..") will also work is just a side effect of how macros don't see the exact syntax with which they were invoked, which is why we also accept vec!{} and println![]. I don't think that that's a feature, but I also don't think it's a problem, because it's not been a problem in the past either.

I've updated the text and title to clarify.

May 25 '22 13:05 m-ou-se

So, in general I sympathise with this proposal a lot, but my biggest concern with it is unfortunately a massive downside, since it conflicts with future proposals for postfix macros.

How does it conflict? If m!123 syntax conflicts, then why wouldn't the existing m!(123) syntax conflict too?

May 25 '22 13:05 m-ou-se

This could also crop up during a refactor:
let default_animation_wide = w!"normal";
let y = self.sprite.set_animation("normal"); // hmm, i should pull `"normal"` into a const
Moving "normal" into a constant without changing default_animation_wide to use parentheses (or brackets) would cause an error, which might confuse the programmer.
const DEFAULT_ANIMATION: &str = "normal";

let default_animation_wide = w!DEFAULT_ANIMATION;
//~^ ERROR can only be used with macros that take a literal
let y = self.sprite.set_animation(DEFAULT_ANIMATION);

The idea is that w!"normal" should feel like a single unit, just like b"abc" or r"abc". For those, you also wouldn't attempt to move the "abc" part to a constant, while leaving the b or r part behind:

const ABC: &str = "abc";

let abc = bABC;

Instead, you'd move the unit as a whole, including the b:

const ABC: &[u8; 3] = b"abc";

let abc = ABC;

So, the same for w!:

const DEFAULT_ANIMATION: &[u16] = w!"normal";

let default_animation_wide = DEFAULT_ANIMATION;
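A compilable sketch of the byte-string case: the literal b"abc" has type &'static [u8; 3], so a constant that holds the whole unit is declared with that type, or with a dereference if an owned array is wanted. The names `ABC_OWNED` and `abc` are illustrative.

```rust
// The literal as a unit: a reference to a 3-byte array.
const ABC: &[u8; 3] = b"abc";

// Dereferencing copies the bytes into an owned array.
const ABC_OWNED: [u8; 3] = *b"abc";

fn abc() -> (&'static [u8; 3], [u8; 3]) {
    (ABC, ABC_OWNED)
}
```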

May 25 '22 13:05 m-ou-se

The vec![] macro is an interesting case which already mostly operates like this, since array literals are written with []: in order to turn the array literal [1, 2, 3] into a Vec, all we need to do is prepend vec!. We don't have to invoke it like vec!([1, 2, 3]). Similarly, I'd like it if all you needed to do to turn the integer literal 123 into a bignum were to prepend bignum!, without having to invoke it with parentheses like bignum!(123).

- let a = [1, 2, 3];
+ let a = vec![1, 2, 3];
          ^^^^

- let b = "abc";
+ let b = w!"abc";
          ^^

- let c = 123;
+ let c = bignum!123;
          ^^^^^^^

May 25 '22 13:05 m-ou-se
