rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

Inferred types `_::Enum`

Open JoshBashed opened this issue 2 years ago • 137 comments

This RFC is all about allowing types to be inferred without any compromises. The syntax is as follows. For additional information, please read the bellow.

struct MyStruct {
    value: usize
}

fn my_func(data: MyStruct) { /* ... */ }

my_func(_ {
    value: 0
});

I think this is a much better and more concise syntax.

If you plan on pressing the dislike button, please leave a comment explaining your disproval. Every piece of constructive feedback helps.

Rendered

JoshBashed avatar Jun 07 '23 05:06 JoshBashed

I'm not necessarily against the RFC, but the motivation and the RFC's change seem completely separate.

I don't understand how "people have to import too many things to make serious projects" leads to "and now _::new() can have a type inferred by the enclosing expression".

Lokathor avatar Jun 07 '23 07:06 Lokathor

I don't understand how "people have to import too many things to make serious projects" leads to "and now _::new() can have a type inferred by the enclosing expression".

In crates like windows-rs even in the examples, they import *. This doesn't seem like good practice and with this feature, I hope to avoid it.

use windows::{
    core::*, Data::Xml::Dom::*, Win32::Foundation::*, Win32::System::Threading::*,
    Win32::UI::WindowsAndMessaging::*,
};

JoshBashed avatar Jun 07 '23 15:06 JoshBashed

Even assuming I agreed that's bad practice (which, I don't), it is not clear how that motivation has lead to this proposed change.

Lokathor avatar Jun 07 '23 15:06 Lokathor

Even assuming I agreed that's bad practice (which, I don't), it is not clear how that motivation has lead to this proposed change.

How can I make this RFC more convincing? I am really new to this and seeing as you are a contributor I would like to ask for your help.

JoshBashed avatar Jun 07 '23 16:06 JoshBashed

First, I'm not actually on any team officially, so please don't take my comments with too much weight.

That said:

  • the problem is that you don't like glob imports.
  • glob imports are usually done because listing every item individually is too big of a list, or is just annoying to do.
  • I would expect the solution to somehow be related to the import system. Instead you've expanded how inference works.

Here's my question: Is your thinking that an expansion of inference will let people import less types, and then that would cause them to use glob imports less?

Assuming yes, well this inference change wouldn't make me glob import less. I like the glob imports. I want to write it once and just "make the compiler stop bugging me" about something that frankly always feels unimportant. I know it's obviously not actually unimportant but it feels unimportant to stop and tell the compiler silly details over and over.

Even if the user doesn't have to import as many types they still have to import all the functions, so if we're assuming that "too many imports" is the problem and that reducing the number below some unknown threshold will make people not use glob imports, I'm not sure this change reduces the number of imports below that magic threshold. Because for me the threshold can be as low as two items. If I'm adding a second item from the same module and I think I might ever want a third from the same place I'll just make it a glob.

Is the problem with glob imports that they're not explicit enough about where things come from? Because if the type of _::new() is inferred, whatever the type of the _ is it still won't show up in the imports at the top of the file. So you still don't know specifically where it comes from, and now you don't even know the type's name so you can't search it in the generated rustdoc.

I hope this isn't too harsh all at once, and I think more inference might be good, but I'm just not clear what your line of reasoning is about how the problem leads to this specific solution.

Lokathor avatar Jun 07 '23 18:06 Lokathor

Is your thinking that an expansion of inference will let people import less types, and then that would cause them to use glob imports less?

Part of it yes, but, I sometimes get really frustrated that I keep having to specify types and that simple things like match statements require me to sepcigy the type every single time.

whatever the type of the _ is it still won't show up in the imports at the top of the file. So you still don't know specifically where it comes from, and now you don't even know the type's name so you can't search it in the generated rustdoc.

Its imported in the background. Although we don't need the exact path, the compiler knows and it can be listed in the rust doc.

I hope this isn't too harsh all at once, and I think more inference might be good, but I'm just not clear what your line of reasoning is about how the problem leads to this specific solution.

Definitely not, you point out some great points and your constructive feedback is welcome.

JoshBashed avatar Jun 07 '23 18:06 JoshBashed

Personally _::new() and _::Variant "look wrong" to me although i cant tell why, I would expect <_>::new() and <_>::Variant to be the syntax, no suggestion on _ { ... } for struct exprs which tbh also looks wrong but <_> { ... } isnt better and we dont even support <MyType> { field: expr }

BoxyUwU avatar Jun 07 '23 18:06 BoxyUwU

I would like to suggest an alternative rigorous definition that satisfies the examples mentioned in the RFC (although not very intuitive imo):


When one of the following expression forms (set A) is encountered as the top-level expression in the following positions (set B), the _ token in the expression form should be treated as the type expected at the position.

Set A:

  • Path of the function call (e.g. _::function())
  • Path expression (e.g. _::EnumVariant)
  • When an expression in set A appears in a dot-call expression (expr.method())
  • When an expression in set A appears in a try expression (expr?)
  • When an expression in set A appears in an await expression (expr.await)

Set B:

  • A pattern, or a pattern option (delimited by |) in one of such patterns
  • A value in a function/method call argument list
  • The value of a field in a struct literal
  • The value of a value in an array/slice literal
  • An operand in a range literal (i.e. if an expression is known to be of type Range<T>, expr..expr can infer that both exprs are of type T)
  • The value used with return/break/yield

Set B only applies when the type of the expression at the position can be inferred without resolving the expression itself.


Note that this definition explicitly states that _ is the type expected at the position in set B, not the expression in set A. This means we don't try to infer from whether the result is actually feasible (e.g. if _::new() returns Result<MyStruct>, we still set _ as MyStruct and don't care whether new() actually returns MyStruct).

Set B does not involve macros. Whether this works for macros like vec![_::Expr] depends on the macro implementation and is not part of the spec (unless it is in the standard library).

Set A is a pretty arbitrary list for things that typically seem to want the expected type. We aren't really inferring anything in set A, just blind expansion based on the inference from set B. These lists will need to be constantly maintained and updated when new expression types/positions appear.

SOF3 avatar Jun 12 '23 02:06 SOF3

s (set A) is encountered as the top-level expression in the following positions (set B), the _ token in the expression form should be treated as the type expected at th

That is so useful! Let me fix it now.

JoshBashed avatar Jun 12 '23 02:06 JoshBashed

One interesting quirk to think about (although unlikely):

fn foo<T: Default>(t: T) {}

foo(_::default())

should this be allowed? we are not dealing with type inference here, but more like "trait inference".

SOF3 avatar Jun 12 '23 03:06 SOF3

One interesting quirk to think about (although unlikely):

fn foo<T: Default>(t: T) {}

foo(_::default())

should this be allowed? we are not dealing with type inference here, but more like "trait inference".

I think you would have to specify the type arg on this one because Default is a trait and the type is not specific enough.

fn foo<T: Default>(t: T) {}

foo::<StructImplementingDefault>(_::default())

JoshBashed avatar Jun 12 '23 03:06 JoshBashed

oh never mind, right, we don't really need to reference the trait directly either way.

SOF3 avatar Jun 12 '23 03:06 SOF3

I've been putting off reading this RFC, and looking at the latest version, I can definitely feel like once the aesthetic arguments are put aside, the motivation isn't really there.

And honestly, it's a bit weird to me to realise how relatively okay I am with glob imports in Rust, considering how I often despise them in other languages like JavaScript. The main reason for this is that basically all of the tools in the Rust ecosystem directly interface with compiler internals one way or another, even if by reimplementing parts of the compiler in the case of rust-analyzer.

In the JS ecosystem, if you see a glob import, all hope is essentially lost. You can try and strip away all of the unreasonable ways of interfacing with names like eval but ultimately, unless you want to reimplement the module system yourself and do a lot of work, a person seeing a glob import knows as much as a machine reading it does. This isn't the case for Rust, and something like rust-analyzer will easily be able to tell what glob something is coming from.

So really, this is an aesthetic argument. And honestly… I don't think that importing everything by glob, or by name, is really that big a deal, especially with adequate tooling. Even renaming things.

Ultimately, I'm not super against this feature in principle. But I'm also not really sure if it's worth it. Rust's type inference is robust and I don't think it would run into technical issues, just… I don't really know if it's worth the effort.

clarfonthey avatar Jun 12 '23 06:06 clarfonthey

@clarfonthey glob imports easily have name collision when using multiple globs in the same module. And it is really common with names like Context. Plus, libraries providing preludes do not necessarily have the awareness that adding to the prelude breaks BC.

SOF3 avatar Jun 12 '23 08:06 SOF3

And honestly, it's a bit weird to me to realise how relatively okay I am with glob imports in Rust, considering how I often despise them in other languages like JavaScript. The main reason for this is that basically all of the tools in the Rust ecosystem directly interface with compiler internals one way or another, even if by reimplementing parts of the compiler in the case of rust-analyzer.

In the JS ecosystem, if you see a glob import, all hope is essentially lost. You can try and strip away all of the unreasonable ways of interfacing with names like eval but ultimately, unless you want to reimplement the module system yourself and do a lot of work, a person seeing a glob import knows as much as a machine reading it does. This isn't the case for Rust, and something like rust-analyzer will easily be able to tell what glob something is coming from.

I can understand your point, but, when using large libraries in conjunction, like @SOF3 said, it can be easy to run into name collisions. I use actix and seaorm and they often have simular type names.

JoshBashed avatar Jun 12 '23 16:06 JoshBashed

Personally _::new() and _::Variant "look wrong" to me although i cant tell why, I would expect <_>::new() and <_>::Variant to be the syntax, no suggestion on _ { ... } for struct exprs which tbh also looks wrong but <_> { ... } isnt better and we dont even support <MyType> { field: expr }

In my opinion, it's really annoying to type those set of keys. Using the QWERTY layout requires lots of hand movement. Additionally, it's syntax similar to what you mentioned has already been used to infer lifetimes, I am concerned people will confuse these. Frame 1

JoshBashed avatar Jun 12 '23 17:06 JoshBashed

Right, I should probably clarify my position--

I think that not liking globs is valid, but I also think that using globs is more viable in Rust than in other languages. Meaning, it's both easier to use globs successfully, and also easier to just import everything you need successfully. Rebinding is a bit harder, but still doable.

Since seeing how useful rust-analyzer is for lots of tasks, I've personally found that the best flows for these kinds of things involve a combination of auto-import and auto-complete. So, like mentioned, _ is probably a lot harder to type than the first letter or two of your type name plus whatever your auto-completion binding is (usually tab, but for me it's Ctrl-A).

Even if you're specifically scoping various types to modules since they conflict, that's still just the first letter of the module, autocomplete, two colons, the first letter of the type, autocomplete. Which may be more to type than _, but accomplishes the goal you need to accomplish.

My main opinion here is that _ as a type inference keyword seems… suited to a very niche set of aesthetics that I'm not sure is worth catering to. You don't want to glob-import, you don't want to have to type as much, but also auto-completing must be either too-much or not available. It's even not about brevity in some cases: for example, you mention cases where you're creating a struct inside a function which already has to be annotated with the type of the struct, which cannot be inferred, and therefore you're only really saving typing it once.

Like, I'm not convinced that this can't be better solved by improving APIs. Like, for example, you mentioned that types commonly in preludes for different crates used together often share names. I think that this is bad API design, personally, but maybe I'm just not getting it.

clarfonthey avatar Jun 12 '23 18:06 clarfonthey

I do think inferred types are useful when matching for brevity's sake: e.g. in a RV32I emulator:

#[derive(Copy, Clone, Default, Eq, PartialEq, Ord, PartialOrd, Debug, Hash)]
pub struct Reg(pub Option<NonZeroU8>);

#[derive(Debug)]
pub struct Regs {
    pub pc: u32,
    pub regs: [u32; 31],
}

impl Regs {
    pub fn reg(&self, reg: Reg) -> u32 {
        reg.0.map_or(0, |reg| self.regs[reg.get() - 1])
    }
    pub fn set_reg(&mut self, reg: Reg, value: u32) {
        if let Some(reg) = reg {
            self.regs[reg.get() - 1] = value;
        }
    }
}

#[derive(Debug)]
pub struct Memory {
    bytes: Box<[u8]>,
}

impl Memory {
    pub fn read_bytes<const N: usize>(&self, mut addr: u32) -> [u8; N] {
        let mut retval = [0u8; N];
        for v in &mut retval {
            *v = self.bytes[addr.try_into().unwrap()];
            addr = addr.wrapping_add(1);
        }
        retval
    }
    pub fn write_bytes<const N: usize>(&mut self, mut addr: u32, bytes: [u8; N]) {
        for v in bytes {
            self.bytes[addr.try_into().unwrap()] = v;
            addr = addr.wrapping_add(1);
        }
    }
}

pub fn run_one_insn(regs: &mut Regs, mem: &mut Memory) {
    let insn = Insn::decode(u32::from_le_bytes(mem.read_bytes(regs.pc))).unwrap();
    match insn {
        _::RType(_ { rd, rs1, rs2, rest: _::Add }) => {
            regs.set_reg(rd, regs.reg(rs1).wrapping_add(regs.reg(rs2)));
        }
        _::RType(_ { rd, rs1, rs2, rest: _::Sub }) => {
            regs.set_reg(rd, regs.reg(rs1).wrapping_sub(regs.reg(rs2)));
        }
        _::RType(_ { rd, rs1, rs2, rest: _::Sll }) => {
            regs.set_reg(rd, regs.reg(rs1).wrapping_shl(regs.reg(rs2)));
        }
        _::RType(_ { rd, rs1, rs2, rest: _::Slt }) => {
            regs.set_reg(rd, ((regs.reg(rs1) as i32) < regs.reg(rs2) as i32) as u32);
        }
        _::RType(_ { rd, rs1, rs2, rest: _::Sltu }) => {
            regs.set_reg(rd, (regs.reg(rs1) < regs.reg(rs2)) as u32);
        }
        // ...
        _::IType(_ { rd, rs1, imm, rest: _::Jalr }) => {
            let pc = regs.reg(rs1).wrapping_add(imm as u32) & !1;
            regs.set_reg(rd, regs.pc.wrapping_add(4));
            regs.pc = pc;
            return;
        }
        _::IType(_ { rd, rs1, imm, rest: _::Lb }) => {
            let [v] = mem.read_bytes(regs.reg(rs1).wrapping_add(imm as u32));
            regs.set_reg(rd, v as i8 as u32);
        }
        _::IType(_ { rd, rs1, imm, rest: _::Lh }) => {
            let v = mem.read_bytes(regs.reg(rs1).wrapping_add(imm as u32));
            regs.set_reg(rd, i16::from_le_bytes(v) as u32);
        }
        _::IType(_ { rd, rs1, imm, rest: _::Lw }) => {
            let v = mem.read_bytes(regs.reg(rs1).wrapping_add(imm as u32));
            regs.set_reg(rd, u32::from_le_bytes(v));
        }
        // ...
    }
    regs.pc = regs.pc.wrapping_add(4);
}

pub enum Insn {
    RType(RTypeInsn),
    IType(ITypeInsn),
    SType(STypeInsn),
    BType(BTypeInsn),
    UType(UTypeInsn),
    JType(JTypeInsn),
}

impl Insn {
    pub fn decode(v: u32) -> Option<Self> {
        // ...
    }
}

pub struct RTypeInsn {
    pub rd: Reg,
    pub rs1: Reg,
    pub rs2: Reg,
    pub rest: RTypeInsnRest,
}

pub enum RTypeInsnRest {
    Add,
    Sub,
    Sll,
    Slt,
    Sltu,
    Xor,
    Srl,
    Sra,
    Or,
    And,
}


pub struct ITypeInsn {
    pub rd: Reg,
    pub rs1: Reg,
    pub imm: i16,
    pub rest: ITypeInsnRest,
}

pub enum ITypeInsnRest {
    Jalr,
    Lb,
    Lh,
    Lw,
    Lbu,
    Lhu,
    Addi,
    Slti,
    Sltiu,
    Xori,
    Ori,
    Andi,
    Slli,
    Srli,
    Srai,
    Fence,
    FenceTso,
    Pause,
    Ecall,
    Ebreak,
}
// rest of enums ...

programmerjake avatar Jun 12 '23 20:06 programmerjake

I do like type inference for struct literals and enum variants.

However, type inference for associated functions doesn't make sense to me. Given this example:

fn expect_foo(_: Foo) {}
foo(_::bar());
  • According to this RFC, the _ should be resolved to Foo (the function argument's type), but this isn't always correct. I suspect that this behavior is often useful in practice, but there are cases where it will fail, and people may find this confusing. For example, Box::pin returns a Pin<Box<T>>, so _::pin(x) couldn't possibly be inferred correctly.

  • Even when Foo has a bar function that returns Foo, there could be another type that also has a matching bar function. Then _ would be inferred as Foo, even though it is actually ambiguous.

  • Another commenter suggested that we could allow method calls after the inferred type (e.g. _::new().expect("..."), or _::builder().arg(42).build()?). But this still wouldn't help in a lot of cases, because methods often return a different type than Self (in contrast to associated functions, where Self is indeed the most common return type).

    For example, the _ in _::new(s).canonicalize()? can't be inferred as Path, because Path::canonicalize returns Option<PathBuf>.

  • Another issue is that it doesn't support auto-deref (e.g. when a function expects a &str and we pass &_::new() [^1], which should be resolved as &String::new(), but that may be ambiguous).

All in all, it feels like this would add a lot of complexity and make the language less consistent and harder to learn.

[^1]: I realize this is a contrived example

Aloso avatar Jun 12 '23 21:06 Aloso

Regarding structs and enums: The RFC didn't explicitly mention this, but I think that tuple structs, tuple enum variants, and unit structs should also be inferrable:

enum MyEnum {
    NormalVariant { a: i32 },
    TupleVariant(i32),
    UnitVariant,
}

struct NormalStruct { a: i32 }
struct TupleStruct(i32);
struct UnitStruct;

fn expect_enum(_: MyEnum) {}
fn expect_normal_struct(_: NormalStruct) {}
fn expect_tuple_struct(_: TupleStruct) {}
fn expect_unit_struct(_: UnitStruct) {}

expect_enum(_::NormalVariant { a: 42 });
expect_enum(_::TupleVariant(42));
expect_enum(_::UnitVariant);

expect_normal_struct(_ { a: 42 });
expect_tuple_struct(_(42));
expect_unit_struct(_);

Aloso avatar Jun 12 '23 21:06 Aloso

I use a trait which contains a function that takes an argument of a type which is only accessed as an associated type, and being able to replace all of that with a _::Variant or similar would save so much visual clutter and typing. Should be easy to infer as the function only takes that specific type as an argument, just lots of typing (that doesn't autocomplete very well because angle brackets) to get to it.

knickish avatar Jun 12 '23 22:06 knickish

I do think inferred types are useful when matching for brevity's sake:

Just gonna break down my thought here when I read this:

  1. Oh, there are underscores. What should fill in the blank?
  2. It's returned by Insn::decode. What does that return?
  3. Oh, it's Insn. Cool.

I'm not sure what's gained by doing this instead of adding a single use Insn::*; statement before the match, or a use self::Insn::* statement across the board and just relying on that throughout a codebase. To me, you're both refusing to rely on full local inference and refusing to rely on full non-local inference, instead relying on this weird half version where you have to add a _:: but nothing else. What does this add, here?

clarfonthey avatar Jun 12 '23 23:06 clarfonthey

Even if you call it "Implied types", in reality you are asking for "inferred Path" or "inferred partial path".

VitWW avatar Jun 12 '23 23:06 VitWW

I'm not sure what's gained by doing this instead of adding a single use Insn::*; statement before the match, or a use self::Insn::* statement across the board and just relying on that throughout a codebase. To me, you're both refusing to rely on full local inference and refusing to rely on full non-local inference, instead relying on this weird half version where you have to add a _:: but nothing else. What does this add, here?

because you'd need waay more than a single use; if you did the use Enum::* trick you'd end up with something appreciably longer as well as having a huge number of symbols in scope that can potentially collide (for RV64GC or PowerISA you'd have dozens or hundreds of enumerants, and, for PowerISA particularly, I expect some name collisions).

The main thing using _::Foo instead of Foo does is it automatically selects the correct Foo based on deduced type rather than reporting an ambiguity error because there are multiple Foo in scope.

use Insn::*;
use RTypeInsnRest::*;
use ITypeInsnRest::*;
use STypeInsnRest::*;
use BTypeInsnRest::*;
use UTypeInsnRest::*;
use JTypeInsnRest::*;
match ... {
    RType(RTypeInsn { rd, rs1, rs2, rest: Add }) => ...
}

programmerjake avatar Jun 13 '23 00:06 programmerjake

  • According to this RFC, the _ should be resolved to Foo (the function argument's type), but this isn't always correct. I suspect that this behavior is often useful in practice, but there are cases where it will fail, and people may find this confusing. For example, Box::pin returns a Pin<Box<T>>, so _::pin(x) couldn't possibly be inferred correctly.
  • Even when Foo has a bar function that returns Foo, there could be another type that also has a matching bar function. Then _ would be inferred as Foo, even though it is actually ambiguous.
  • Another commenter suggested that we could allow method calls after the inferred type (e.g. _::new().expect("..."), or _::builder().arg(42).build()?). But this still wouldn't help in a lot of cases, because methods often return a different type than Self (in contrast to associated functions, where Self is indeed the most common return type). For example, the _ in _::new(s).canonicalize()? can't be inferred as Path, because Path::canonicalize returns Option<PathBuf>.
  • Another issue is that it doesn't support auto-deref (e.g. when a function expects a &str and we pass &_::new(), which should be resolved as &String::new(), but that may be ambiguous).

Sorry, I don't understand these points. Can you give some code examples?

JoshBashed avatar Jun 13 '23 00:06 JoshBashed

@JoshuaBrest not a fan of <_> since I think it's unnecessary, but it's not that hard to type on qwerty if you have a normal hand, because you can place all fingers on all the keys you need to press.

SOF3 avatar Jun 13 '23 03:06 SOF3

For example, the _ in _::new(s).canonicalize()? can't be inferred as Path, because Path::canonicalize returns Option<PathBuf>.

@Aloso I have covered this in my alternative definition above. _ always expands to the type expected at set B (i.e. the argument in the argument list), which is not ambiguous at all if you know this definition.

That said, I agree that it is unintuitive for a new user to get to learn that _ is the argument type instead of any arbitrary type that might work. And the Box::pin case is a great counterexample against associated function calls indeed.

SOF3 avatar Jun 13 '23 03:06 SOF3

In regards to the imports side of the equation, The worst thing I encounter in other languages like Golang is when I can't trace types and functions through imports. Rust is designed such that type inference stops at function boundaries, and likewise name inferences stop at scope boundaries. Glob imports are there for controlled cases (e.x. use super::* or use well_known_self_contained_stuff::* or use CommonEnum::*) where the maintainer and future maintainers could easily tell what is going on. In any other case (say we are using a wide diversity of functionalities from tokio) we should always use explicit imports so that we can find immediately whether we are using the standard library version of a struct with the same name or not. Specifically, we should be able to copy the name of the type from let ...: type = or type::... or type(...) and ctrl+f find it at the beginning of the scope or file. It's definitely worth the extra hassle.

The only example where I see this as being favorable is in the example of repetitively matching against a lot of enum variants and having a nested variety of structs, where the type being inferred is immediately previous to or enclosing the type in question. I've encountered a lot of places with an enum variant containing a struct that exists solely for the purpose of organizing data for that variant, and it's unambiguous and annoying to import and type out that struct everywhere.

A conservative way of implementing this RFC is to allow using _ { ... } and _::Var inferencing only when it is enclosed inside a normally typed out let destructuring or pattern destructuring. Or maybe we allow it for all patterns so it can be used in matches at the outer variant level. But definitely, I don't like seeing this new kind of inference be used for module or function level stuff, just use it for structuring.

AaronKutch avatar Jun 19 '23 03:06 AaronKutch

@AaronKutch Imports are not necessary right now either if you use something like Default::default().

SOF3 avatar Jun 19 '23 06:06 SOF3

I am currently traveling and am unable to review and respond affectively without my computer. Could I give someone write access?

JoshBashed avatar Jun 19 '23 06:06 JoshBashed