rust Tracking issue for `proc_macro::Span` inspection APIs

trafficstars

This issue is intended to track a number of unstable APIs which are used to inspect the contents of a Span for information like the file name, byte position, manufacturing new spans, combining them, etc.

This issue tracks the proc_macro_span unstable feature.

Public API

Already stabilized:

impl Span {
    pub fn source_text(&self) -> Option<String>;
}

impl Group {
    pub fn span_open(&self) -> Span;
    pub fn span_close(&self) -> Span;
}

To be stabilized, probably in their current form:

impl Span {
    pub fn line(&self) -> usize;
    pub fn column(&self) -> usize;

    pub fn start(&self) -> Span;
    pub fn end(&self) -> Span;
}

To be stabilized after some (re)design or discussion:

impl Span {
    pub fn source_file(&self) -> SourceFile;

    pub fn byte_range(&self) -> Range<usize>;
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SourceFile { .. }

impl !Send for SourceFile {}
impl !Sync for SourceFile {}

impl SourceFile {
    pub fn path(&self) -> PathBuf;
    pub fn is_real(&self) -> bool;
}

Things that require more discussion:

impl Span {
    pub fn eq(&self, other: &Span) -> bool;
    pub fn join(&self, other: Span) -> Option<Span>;
    pub fn parent(&self) -> Option<Span>;
    pub fn source(&self) -> Span;
}

impl Literal {
    pub fn subspan<R: RangeBounds<usize>>(&self, range: R) -> Option<Span>;
}

Oct 01 '18 17:10 alexcrichton

I wanted to shed some pain this is causing with tarpaulin.

Tarpaulin has worked amazingly well for me and my company as a replacement for kcov which historically been a pain to get accurate, reliable and correct results. At the moment tarpaulin stands as the most promising goto option for codecoverage in rust and just feels more like the only first class tooling option.

Having one of those when choosing to invest in a technology is important for many company's adoption story for checking off code quantity checkmarks. When they see that rust doesn't have a reasonable code quality story that works on stable rust, that's can result in a "pass" rather than "I'll take it". There are currently some work arounds for making this work-ish on stable but it feels very much like the story serde was in a year or so ago when you wanted to show all your friends how amazing serde was but then was embarrassed to show what it took to make work on stable because of a macro stabilization blocker.

Oct 10 '18 19:10 softprops

With procedural macros having reached a point where they're very useful on stable, I expect many users will find themselves needing access to this information. Would it be reasonable to only stabilize parts of the Span API that are not too risky? Perhaps exposing a function that optionally returns the path of the file where a macro is invoked if such a file exists?

Oct 28 '18 13:10 JakeTherston

I have a concern about exposing LineColum information. It looks like it could play badly with incremental compilation, especially in the IDE context.

My understanding is that, if one adds a blank line to the start of the file, the line_column information of the input spans to proc macro changes. That means that IDE would have to re-expand procedural macros even after insignificant white space changes.

I would feel more confident if proc-macros were strictly a pure function from the input token stream to output token stream. This can be achieved, for example, by making line-column infocation relative to the start of macro invocation (as opposed to relative to the start of the file).

I don't feel that exposing absolute position is necessary the end of the world: IDE can track if a macro actually makes use of the info, or it can supply macro with some fake line/columns. But it's hard to tell if such hacks would work well in practice, and it's hard to experiment with them for the lack of IDE which handled proc-macros incrementally....

Feb 01 '19 13:02 matklad

If the parser allocated IDs to every AST node, then, and this is the hard part, when an edit was made to the source, the parser tried to keep those IDs the same in all the non-edited code and only allocate new IDs for new nodes, that would allow spans to be kept completely separate from the AST. Those IDs could be passed through macro expansion without causing unnecessary invalidations. If something needed a span later on, it could then go back and ask the parser for the span for that particular AST node ID. I feel like having an incremental parser is important, not because parsing is the bottleneck, but because it underpins everything else.

Feb 02 '19 03:02 davidlattimore

@davidlattimore this is fascinating, but slightly off-topic for the issue. I've created a thread on internals: https://internals.rust-lang.org/t/macros-vs-incremental-parsing/9323

Feb 02 '19 09:02 matklad

The column!() macro as well as std::panic::Location::column are returning 1-based columns while the span available from the proc-macro crate is 0-based according to its docs. Is this inconsistency intended?

May 30 '19 08:05 est31

This thread has more discussion about 1-based columns: https://github.com/rust-lang/rust/pull/46762#issuecomment-352474639

May 30 '19 08:05 est31

Another open question is how this API relates to https://github.com/rust-lang/rust/issues/47389 which is about minimizing span information throughout the compiler. Should stabilization be blocked until a design for #47389 is found? Is it too late already as we have set_span functionality? @michaelwoerister what do you think?

May 30 '19 22:05 est31

#47389 is mostly concerned about data types that are used later in the compilation pipeline, such as type information and MIR. Exposing things at the proc-macro level should not be too bad.

May 31 '19 08:05 michaelwoerister

But rust-analyzer might one day expand the scope of incremental compilation to the parsing stage, right?

Jun 01 '19 00:06 est31

That is already true for rust-analyzer, macro expansion stage runs incrementally. We do’t yet support line/column macro, but I think what we would do is that we’ll just expand them to a dummy value like 42. This probably won’t break macro (It could be this line indeed), but will allow us to avoid re-expanding macro after every keystroke

Jun 01 '19 05:06 matklad

What we could do is store for each macro invocation whether it has invoked any of the inspection APIs or not and then re-run the macro only if any of the span information it has requested has had any changes. Basically only adding the infos the requested spans to the hash (or only adding span info to the hash if there's been any request of span info). However, as this has to update state, it would make span inspection slower than it could be, especially if something inspects spans routinely. So not sure about the performance impacts of this.

Jun 18 '19 03:06 est31

@alexcrichton @matklad Any update on what blockers are for this issue? It doesn't seem like there should be anything holding this back.

Aug 19 '19 00:08 jhpratt

@jhpratt my question about 1-based vs 0-based is still not answered: https://github.com/rust-lang/rust/issues/54725#issuecomment-497246753

Aug 19 '19 00:08 est31

Hm, that's a fairly simple one. My intuition would certainly lean towards 1-based.

Aug 19 '19 00:08 jhpratt

Is this still blocked on questions with regard to the inconsistency between two base indices?

Dec 01 '19 21:12 syntacticsugarglider

I just thought I'd chime in here, cause I just ran into an issue where I would like to be able to map "place in code" to something else. Basically, I don't care about files or columns or indexes or anything, I just need something that's Eq + Hash, and that I can get 2 from (start and end) for a Span (also that it follows that any given token t has the same end as the start of the next token). This could just be a usize that's local to the current macro invocation and doesn't have to mean anything to the broader world, but it gives the ability to sort and map on "code points".

Dec 06 '19 20:12 Alxandr

I'd love to see this stabilized too. In particular, assert2 uses Span::source_text() to show the expression that caused an assertion failure:

On stable it falls back to regular stringification, but that certainly looks less pretty.

Jan 12 '20 16:01 de-vri-es

Note that Span::parent allows you to write 'completely unhygenic' macros, which can see identifiers in a way that would otherwise be impossible:

// src/lib.rs
#![feature(proc_macro_span)]

use proc_macro::TokenStream;

#[proc_macro]
pub fn no_hygiene(tokens: TokenStream) -> TokenStream {
    tokens.into_iter().map(|mut tree| {
        let mut span = tree.span();
        while let Some(parent) = span.parent() {
            span = parent;
        }
        tree.set_span(tree.span().resolved_at(span));
        tree
    }).collect()
}

use no_hygiene::no_hygiene;

macro_rules! weird {
    () => {
        println!("{}", no_hygiene!(my_ident));
    }
}

fn main() {
    let my_ident = "Hello";
    weird!();
}

Here, weird!() is able to access my_ident via no_hygiene!. This is already possible with just Span::resolved_at in some circumstances - if a single token has a Span with the desired hygiene, a proc-macro can construct arbitrary tokens with the desired hygiene. However, Span::parent allows hygiene to be completely bypassed - a proc-macro can see into a particular scope that it would otherwise be completely isolated from.

Using Span::parent may be useful for the purposes of emitting error messages, but I'm worried about macros generating working code that intentionally uses this. This seems like an easy way to end up with code that has confusing non-local effects.

Jun 07 '20 07:06 Aaron1011

We might want to gate 'read-only' APIs (e.g. Span::source_text, Span::start) under a separate feature, since they could be stabilized without increasing the power of proc macros w.r.t hygiene.

Jun 07 '20 07:06 Aaron1011

We might want to gate 'read-only' APIs (e.g. Span::source_text, Span::start) under a separate feature, since they could be stabilized without increasing the power of proc macros w.r.t hygiene.

Came here to advocate for just this. I have a use case where I require access to the start/end line/column information of spans. And it would indeed be nice if they could be decoupled from the wider feature for an easier path towards stabilization.

Jun 10 '20 15:06 udoprog

So I'm looking into what's needed to get the following stabilized in libproc_macro:

Span::start
Span::end
LineColumn

There are two outlined concerns that I've been able to digest so far. If someone has more, I'd love to hear them.

This question by @est31 whether the column should be switched to be 1-based, instead of 0-based to be consistent with how other APIs report columns.
This concern raised by @matklad about exposing absolute span information in general, since they could have an effect on the output of a macro.

1-based columns

The corresponding functions and LineColumn structure in proc_macro2 are already marked as stable with the same 0-column specification. But as pointed out in alexcrichton/proc-macro2#237, while they are available they only supply dummy values on stable right now. But if desired, I'd be happy to put in the work to modify the reported column information to be 1-based instead. I personally don't have a strong opinion either way. The manner in which I've used spans in genco (and why I now have a desire for them to be stabilized :)) only cares about relative spans.

reported spans affecting the output of proc-macros

Given that proc-macros today can execute arbitrary code, and this seems like something we just have to live with. I don't know how beneficial limiting the existing proc_macro API surface with the goal of making macro execution as reproducible as possible is at this point. I personally wouldn't want to block on this. But for my specific use-case, changing the spans to be reported in a manner which is local to the macros themselves would also give me what I need. So if desired, I'd be for implementing this change.

To clarify and IIUC, that means this assuming 0-based columns:

/// A comment
macro! { hello // start {column: 8, line: 2}
    world // start {column: 4, line: 3}
};

Would instead be reported in the proc-macro as this:

/// A comment
macro! { hello // start {column: 1, line: 1}
    world // start {column: 4, line: 2}
};

Jun 21 '20 15:06 udoprog

reported spans affecting the output of proc-macros I don't know how beneficial limiting the existing proc_macro API surface with the goal of making macro execution as reproducible as possible is at this point.

I don't think there are any concerns about reproducibility, there are concerns about incremental compilation which is something completely different. I think its important that we get opinion of some incremental compilation expert from t-compiler about this (cc @wesleywiser? not really sure whom to CC).

Are we ok with exposing absolute position information to macros? AFAIUI, that means that either:

we'll have to re-expand all proc-macros even for trivial changes like addition of blank line.
we'll have to record "has this macro looked at line/column?" bit of info, and still re-expand those specific macros for trivial changes.

Jun 21 '20 15:06 matklad

I don't think there are any concerns about reproducibility, there are concerns about incremental compilation which is something completely different.

So when I phrase it as the output of a macro being reproducible, that is what I'm referring to. Given a collection of inputs it would produce an identical output which would allow it to be cached during incremental compilation.

But to my wider point, a proc-macro can generate random numbers, access the filesystem, or connect to a database (the example I linked to). I'd be very curious to hear about plans for incremental compilation, but given these facts I have a hard time envisioning an approach that allows us to make decisions about re-expanding beyond some form of well understood best-effort heuristic.

Jun 21 '20 15:06 udoprog

But to my wider point, a proc-macro can generate random numbers, access the filesystem, or connect to a database (the example I linked to).

That's possible for now, but it's not a documented feature that proc macros are unsandboxed and it's likely that wasm based macros won't get that capability.

To the general question of incremental compilation see also https://github.com/rust-lang/rust/issues/54725#issuecomment-502932263

Jun 21 '20 17:06 est31

@matklad While I do know a bit about the incremental system, I wouldn't consider myself an expert. I believe @pnkfelix knows a fair bit more than I.

Jun 21 '20 23:06 wesleywiser

I'm specifically interested in Span::join that would allow for painless proc-macro error messages on stable. There's the hack that makes it already possible, but in order to exploit it, one has to carry around two spans: the span to the first token and the span to the last token.

If join were available, syn could do the joining in its ast.span() methods and the two-spans trick would become easier to pull off since the joined span had already covered the whole range.

Jun 24 '20 19:06 CreepySkeleton

I'm specifically interested in Span::join that would allow for painless proc-macro error messages on stable

Before Span::join is stabilized, we should decide how it should handle hygiene. If I write first.join(second) should the returned Span use Span::resolved_at(first) or Span::resolved_at(second)? Or should we change the return type to Result<Span, JoinErr>, and have enum JoinError { DifferentFile, DifferentHygiene }?

Jun 24 '20 21:06 Aaron1011

@Aaron1011

Note that Span::parent allows you to write 'completely unhygenic' macros, which can see identifiers in a way that would otherwise be impossible:

Not to advocate in either direction, but this is already possible today, without Span::parent():

// main.rs
mod submod {
    macro_rules! make_local_weird {
        ($local:item) => (
            #[macro_export]
            macro_rules! weird {
                () => (
                    println!("{}", export! { $local private })
                )
            }

            pub use weird;
        )
    }

    make_local_weird!(fn f() {});
}

pub fn main() {
    let private = "hi";
    submod::weird!();
}

// lib.rs
use proc_macro::TokenStream;

use syn::spanned::Spanned;
use syn::{parse_macro_input, Result};
use syn::parse::{Parse, ParseStream};
use quote::quote;

struct Export {
    item: syn::Item,
    ident: syn::Ident,
}

impl Parse for Export {
    fn parse(input: ParseStream) -> Result<Self> {
        Ok(Export {
            item: input.parse()?,
            ident: input.parse()?,
        })
    }
}

#[proc_macro]
pub fn export(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input as Export);
    let mut ident = input.ident;
    let span = ident.span().resolved_at(input.item.span());
    ident.set_span(span.into());
    quote!(#ident).into()
}

Aug 02 '20 01:08 SergioBenitez

Are we ok with exposing absolute position information to macros? AFAIUI, that means that either:

we'll have to re-expand all proc-macros even for trivial changes like addition of blank line.

we'll have to record "has this macro looked at line/column?" bit of info, and still re-expand those specific macros for trivial changes.

A third option would be to provide dummy information to proc macros from rust-analyzer. AFAIK, most proc macros use the data they obtain from these APIs only at runtime, and do not really make any decisions based on it at expansion time (and even then, they should work fine if line/column is always 1, unless they're being actively annoying).

Dec 01 '20 00:12 jonas-schievink

rust rust copied to clipboard

Tracking issue for `proc_macro::Span` inspection APIs

Public API

1-based columns

reported spans affecting the output of proc-macros

rust
rust copied to clipboard