rust icon indicating copy to clipboard operation
rust copied to clipboard

Tracking Issue for `large_assignments` lint

Open oli-obk opened this issue 4 years ago • 17 comments

This is a tracking issue for the implementation of the MCP https://github.com/rust-lang/compiler-team/issues/420 The feature gate for the issue is #![feature(large_assignments)].

About tracking issues

Tracking issues are used to record the overall progress of implementation. They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions. A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature. Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Steps

  • [x] Implement the RFC (https://github.com/rust-lang/rust/pull/83519)
  • [x] make sure we inform the user properly about the fact that they can change the limit https://github.com/rust-lang/rust/labels/E-easy
    • this just needs an additional note on the diagnostic informing the user about the current limit and how to change it.
  • [ ] reduce the noisiness (it's reported multiple times on different parts of an expression, but should only trigger on the innermost/first one) https://github.com/rust-lang/rust/labels/E-easy
    • this can be done by adding the current span to a Vec on self in https://github.com/oli-obk/rust/blob/85b1c67910dbeede8aabb16bc0c1c09a2fefe8ab/compiler/rustc_mir/src/monomorphize/collector.rs#L784
    • and then, right before https://github.com/oli-obk/rust/blob/85b1c67910dbeede8aabb16bc0c1c09a2fefe8ab/compiler/rustc_mir/src/monomorphize/collector.rs#L760 checking if the current span overlaps with any span in the vec.
  • [ ] stop linting in box/arc initialization https://github.com/rust-lang/rust/labels/E-medium
  • [ ] enable the lint by default with a 4k threshold
  • [ ] disable the lint in tests (by default, users can still turn it on) https://github.com/rust-lang/rust/labels/E-medium
  • [ ] Adjust documentation (see instructions on rustc-dev-guide)
  • [ ] Stabilization PR (see instructions on rustc-dev-guide)

Unresolved Questions

  • Should this threshold be configurable via the command line?
  • Should we split lint into multiple size limits lints?

Implementation history

  • https://github.com/rust-lang/rust/pull/83519

oli-obk avatar Mar 26 '21 16:03 oli-obk

This seems an useful lint. But is this lint firing when there's a copy larger than X bytes in the source code, or when the compiler optimization passes actually leave a X bytes copy in the binary? In the second is much more useful because modern compilers are supposed to optimize away many copies (see RVO, etc).

leonardo-m avatar Mar 26 '21 19:03 leonardo-m

Has anyone started working on the implementation yet?

osa1 avatar Mar 27 '21 08:03 osa1

The implementation is in #83519

oli-obk avatar Mar 29 '21 08:03 oli-obk

or when the compiler optimization passes actually leave a X bytes copy in the binary? In the second is much more useful because modern compilers are supposed to optimize away many copies (see RVO, etc).

It is indeed the second part. We do this doing collection, so after monomorphization, but before LLVM has a chance to do anything. Though I don't believe LLVM does anything useful here.

oli-obk avatar Mar 29 '21 08:03 oli-obk

This came up in today's @rust-lang/lang meeting. One notable consideration: should we enable this by default, and with what threshold?

joshtriplett avatar Mar 16 '22 17:03 joshtriplett

We don't really have a precedent for heuristic lints. The closest equivalent are the other raisable limits. I don't think we can use crater to find good limits, as most large types will be occuring in binary crates or tests on edge cases.

So... I think we should pick an arbitrary limit (e.g. 4kb) and only enable it by default if cfg(test) is not set and just wait for reports to come in. It's easily circumvented (just change the limit)

So: things to do before enabling:

  • we should make sure we inform the user properly about the fact that they can change the limit
  • we should disable the lint in tests (by default, users can still turn it on)
  • should look into reducing the noisiness (it's reported multiple times on different parts of an expression, but should only trigger on the innermost/first one)

oli-obk avatar Mar 16 '22 19:03 oli-obk

Sounds reasonable to me. And yes, I think the most obvious thresholds would be something like 1k or 4k.

joshtriplett avatar Mar 16 '22 22:03 joshtriplett

One thing I considered: multiple lints for different size thresholds? Probably not so nice.

nikomatsakis avatar Mar 18 '22 19:03 nikomatsakis

Multiple lints could be nice for having a couple of different thresholds, with correspondingly-different default lint levels.

As a strawman, thinking only about x64,

  • huge_assignments, deny-by-default, for (64 KiB, ∞)
  • large_assignments, warn-by-default, for (2 KiB, 64 KiB]
  • medium_assignments, allow-by-default, for (64 B, 2 KiB]

But absolutely my biggest uncertainty is how to specify the threshold, and how to scope it.

Perhaps the defaults should be something that the target definition gets to pick? I assume msp430-none-elf wants something very different from x86_64_v3-unknown-linux-gnu.

scottmcm avatar Mar 18 '22 20:03 scottmcm

My general experience from testing the lint with limit starting from around a few kilobytes, is that it is very likely to lint about copies of large structs. This is actionable, because one can wrap the struct in a box or an arc, but leads to a situation where the initialization of a box or an arc will still be linted about, which is unfortunately generally unactionable currently

tmiasko avatar Mar 18 '22 21:03 tmiasko

I think, for a first pass, we should just have a single lint, and set the threshold high enough that people won't typically encounter it (e.g. 1k). If we find that people want the threshold lower, and want to have different behavior at different thresholds, we could talk about having multiple lints with different thresholds. But in practice I'd expect people to lower or raise the threshold, but not need multiple thresholds.

joshtriplett avatar Mar 21 '22 22:03 joshtriplett

My general experience from testing the lint with limit starting from around a few kilobytes, is that it is very likely to lint about copies of large structs. This is actionable, because one can wrap the struct in a box or an arc, but leads to a situation where the initialization of a box or an arc will still be linted about, which is unfortunately generally unactionable currently

This is interesting. We may want to have certain "exemptions" from the lint.

nikomatsakis avatar Mar 22 '22 15:03 nikomatsakis

Is the second step currently being worked on ? Who is the mentor for this issue ? (I feel like trying a medium step)

b-NC avatar Mar 28 '22 22:03 b-NC

I am mentoring this issue. Don't hesitate to ping me on zulip about any questions or requests that you may have

oli-obk avatar Mar 29 '22 07:03 oli-obk

The second step might need to get ticked since #95478 has been merged

b-NC avatar Apr 02 '22 06:04 b-NC

A couple thoughts that came up during RustConf today:

  1. One particularly insidious case of these large moves is when its some local variable that is carried over a .await. Its an anti-pattern to have such local variables that end up stored in the generated Future, and we could do more to guide developers to avoid them. (Whether that guidance should be incorporated into the large_assignments lint, potentially as a special case that detects the assignment into a future, or if it should instead be part of a different lint entirely, I am not certain. For now it seems natural to me to correlated it with this lint.)
  2. It might be nice to provide heuristic advice as to simple API changes that could be made to address the problem. For example, replacing a [T; N] array with a Vec<T> (and likewise convert the initialization expression from [<expr>; N] to vec![<expr>; N]) is a pretty simple way to address the problem, assuming that the cost of the copying far exceeds the cost of the boxing performed by the Vec<T>.

pnkfelix avatar Aug 05 '22 20:08 pnkfelix

I think the await case should be a separate lint, but I do think we should have it.

joshtriplett avatar Aug 05 '22 21:08 joshtriplett

As implemented, #![deny(large_assignments)] relies on a flag being passed to rustc or on #![move_size_limit = N] being set at the crate level. Would it make sense to instead make move_size_limit an attribute accepted on any scope, and the lint having to walk up from every fn item upwards to see if the limit is set? (Impl details: You'd likely instead want to have some preprocessed DefId map to avoid O(n^2) search behavior, but the user experience is that of searching up.)

Is there any use-case where a different granularity would be desired? This would already be much more precise than the different lints with different thresholds proposal (which I also endorse as a potential shorter term solution).

  1. One particularly insidious case of these large moves is when its some local variable that is carried over a .await. Its an anti-pattern to have such local variables that end up stored in the generated Future, and we could do more to guide developers to avoid them. (Whether that guidance should be incorporated into the large_assignments lint, potentially as a special case that detects the assignment into a future, or if it should instead be part of a different lint entirely, I am not certain. For now it seems natural to me to correlated it with this lint.)
  2. It might be nice to provide heuristic advice as to simple API changes that could be made to address the problem. For example, replacing a [T; N] array with a Vec<T> (and likewise convert the initialization expression from [; N] to vec![; N]) is a pretty simple way to address the problem, assuming that the cost of the copying far exceeds the cost of the boxing performed by the Vec<T>.

I believe these cases can definitely be handled with targeted structured suggestions. It wouldn't help for user types (if they provide Huge on the stack and HugeBuf on the heap), but handling the std and some tokio types can be done.

I think the await case should be a separate lint, but I do think we should have it.

If so, should it also rely on move_size_limit?

estebank avatar Oct 03 '22 21:10 estebank

This seems like a good issue for me to work on a bit more long term, so I'll go ahead and claim this one.

@rustbot claim

Enselic avatar Aug 13 '23 07:08 Enselic

Regarding the implementation, I am slightly concerned that this lint has logic that is entangled with compiler/rustc_monomorphize/src/collector.rs. That file does important and subtle things; is there any way we can avoid mixing that up with the lint logic?

RalfJung avatar Oct 07 '23 23:10 RalfJung