libs-team
ACP: Expose rustc_lexer::unescape Functionality in the proc_macro Crate for Standardized Literal Parsing
Proposal
Problem Statement
Currently, proc-macros that handle string literals receive the literal's source text, with escape sequences left unprocessed and the surrounding quotes still attached. For example:
#[my_macro]
#[my_attr("\u{78} blabla")]
pub struct B;
In the my_attr proc-macro, the received value is "\u{78} blabla", including the escape sequence and quotes, instead of the parsed equivalent (x blabla). This makes working with string literals cumbersome, as proc-macro authors need to reimplement unescape logic that already exists within the Rust compiler.
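To make the gap concrete, here is a small std-only sketch contrasting the two forms, assuming the escape is written \u{78}. The constant names are made up for illustration. The first mirrors the text a macro gets when it stringifies the literal token, and the second is the value the compiler itself assigns to the same literal.

```rust
/// Source text of the literal as a proc-macro sees it today:
/// quotes and the unprocessed escape sequence are still present.
/// (Hypothetical name, for illustration only.)
pub const RAW_TOKEN_TEXT: &str = r#""\u{78} blabla""#;

/// Value the same literal denotes once the compiler has unescaped it.
/// (Hypothetical name, for illustration only.)
pub const PARSED_VALUE: &str = "\u{78} blabla";
```

The two strings differ in both length and content, which is the work every proc-macro currently has to redo by hand.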
Motivating Examples or Use Cases
- Simplifying the syn Library: Libraries like syn need to manually reimplement string literal unescaping. Having the unescape functionality available in the proc_macro crate would allow syn::LitStr::value() to use the standardized unescape function directly, leading to simpler and more reliable code.
- Consistency Across Tools: The Rust compiler already provides unescape functionality in rustc_lexer::unescape. Making this available publicly would ensure that tools and proc-macros handle escape sequences consistently.
- Reducing Code Duplication: Many proc-macro authors currently need to implement their own logic to handle escape sequences, resulting in duplicated code and potential inconsistencies. Exposing the compiler's unescape functionality would reduce redundancy.
Solution Sketch
- Expose Unescape Functionality in the proc_macro Crate: The unescape functionality from rustc_lexer::unescape should be exposed in the proc_macro crate, making it accessible for use in proc-macros.
- Public API for Literal Processing: A new API can be added to the proc_macro crate that allows developers to parse and unescape string literals in an ergonomic and standardized way. This would significantly simplify the handling of string literals in attributes and proc-macros.
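As a rough illustration of what such an API would have to do, here is a minimal std-only sketch. The names unescape_str_literal and UnescapeError are hypothetical and are not part of proc_macro or rustc_lexer. The sketch only covers non-raw string literals and is not a faithful copy of the compiler's rules (for example, it does not reject a stray unescaped quote inside the body).

```rust
/// Hypothetical error type for this sketch; not an existing API.
#[derive(Debug, PartialEq, Eq)]
pub enum UnescapeError {
    MissingQuotes,
    LoneBackslash,
    UnknownEscape(char),
    InvalidHexEscape,
    InvalidUnicodeEscape,
}

/// Unescapes the source text of a non-raw string literal, quotes included.
/// Hypothetical function, for illustration only.
pub fn unescape_str_literal(source: &str) -> Result<String, UnescapeError> {
    let inner = source
        .strip_prefix('"')
        .and_then(|s| s.strip_suffix('"'))
        .ok_or(UnescapeError::MissingQuotes)?;

    let mut out = String::with_capacity(inner.len());
    let mut chars = inner.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next().ok_or(UnescapeError::LoneBackslash)? {
            'n' => out.push('\n'),
            'r' => out.push('\r'),
            't' => out.push('\t'),
            '0' => out.push('\0'),
            '\\' => out.push('\\'),
            '\'' => out.push('\''),
            '"' => out.push('"'),
            'x' => {
                // Exactly two hex digits, limited to the ASCII range (00..=7F).
                let hi = chars.next().and_then(|d| d.to_digit(16));
                let lo = chars.next().and_then(|d| d.to_digit(16));
                match (hi, lo) {
                    (Some(hi), Some(lo)) if hi <= 7 => {
                        out.push(char::from((hi * 16 + lo) as u8))
                    }
                    _ => return Err(UnescapeError::InvalidHexEscape),
                }
            }
            'u' => {
                // `\u{...}` with one to six hex digits; `_` is allowed after the first digit.
                if chars.next() != Some('{') {
                    return Err(UnescapeError::InvalidUnicodeEscape);
                }
                let mut value: u32 = 0;
                let mut digits = 0;
                loop {
                    match chars.next() {
                        Some('}') => break,
                        Some('_') if digits > 0 => {}
                        Some(d) => {
                            let v = d
                                .to_digit(16)
                                .ok_or(UnescapeError::InvalidUnicodeEscape)?;
                            digits += 1;
                            if digits > 6 {
                                return Err(UnescapeError::InvalidUnicodeEscape);
                            }
                            value = value * 16 + v;
                        }
                        None => return Err(UnescapeError::InvalidUnicodeEscape),
                    }
                }
                if digits == 0 {
                    return Err(UnescapeError::InvalidUnicodeEscape);
                }
                // Rejects surrogates and values above 0x10FFFF.
                out.push(char::from_u32(value).ok_or(UnescapeError::InvalidUnicodeEscape)?);
            }
            '\n' => {
                // Line continuation: drop the newline and any ASCII whitespace after it.
                while chars
                    .next_if(|c| matches!(*c, ' ' | '\t' | '\n' | '\r'))
                    .is_some()
                {}
            }
            other => return Err(UnescapeError::UnknownEscape(other)),
        }
    }
    Ok(out)
}
```

With something of this shape available, a macro could pass the stringified literal token straight in and get x blabla back for the example above, rather than carrying its own copy of the escape rules.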
Alternatives
- Reimplement in Libraries: The current approach is for libraries like syn to reimplement the unescape logic. This is not ideal due to code duplication, maintenance burden, and the potential for inconsistencies.
- External Crate: Instead of adding the unescape functionality to the proc_macro crate, another option would be to create an external crate. However, considering that this functionality is tied to parsing Rust literals, adding it to the standard library seems more suitable.
- Leave as Is: Another alternative is to continue requiring proc-macro authors to implement their own unescape logic. However, this is not desirable due to the associated complexity and inconsistency.
Additional Considerations
- Extend to All Literals: Extending this functionality from unescaping strings to parsing the value of every literal type, such as C-strings, integers, and floats, would improve consistency across literal types and make parsing easier for proc-macro authors working with diverse literals.
- Refactoring to Work Outside Compiler: The proc_macro crate is being refactored to work even when run outside of the compiler. Therefore, the unescape functionality should be implemented in a way that does not depend on the compiler being available, so that it can be used independently of the compiler context.
- Library-First Approach: The unescape function can likely be developed as a standalone library routine to avoid code duplication. This suggests an opportunity to make it reusable and broadly available, without tight coupling to compiler internals.
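One way to picture a compiler-independent core is a callback-based routine that uses only basic types and leaves error handling to the caller. The sketch below is an assumption about how this could be structured, not the existing implementation. The names unescape_with, unescape_to_string, and EscapeError are hypothetical, and only a handful of escapes are handled to keep it short.

```rust
use std::ops::Range;

/// Hypothetical error type for this sketch; not an existing API.
#[derive(Debug, PartialEq, Eq)]
pub enum EscapeError {
    LoneBackslash,
    Unsupported(char),
}

/// Core routine: walks the literal body (quotes already removed) and reports
/// each decoded character together with the byte range it came from.
/// It allocates nothing and knows nothing about spans or diagnostics.
pub fn unescape_with<F>(body: &str, mut callback: F)
where
    F: FnMut(Range<usize>, Result<char, EscapeError>),
{
    let mut iter = body.char_indices();
    while let Some((start, c)) = iter.next() {
        let result = if c != '\\' {
            Ok(c)
        } else {
            match iter.next() {
                Some((_, 'n')) => Ok('\n'),
                Some((_, 't')) => Ok('\t'),
                Some((_, '\\')) => Ok('\\'),
                Some((_, '"')) => Ok('"'),
                // `\x..`, `\u{..}` and the remaining escapes are left out of this sketch.
                Some((_, other)) => Err(EscapeError::Unsupported(other)),
                None => Err(EscapeError::LoneBackslash),
            }
        };
        // Bytes consumed so far = total length minus what the iterator has left.
        let end = body.len() - iter.as_str().len();
        callback(start..end, result);
    }
}

/// Library-side adapter: collects into a String and keeps the first error.
pub fn unescape_to_string(body: &str) -> Result<String, (Range<usize>, EscapeError)> {
    let mut out = String::with_capacity(body.len());
    let mut first_error = None;
    unescape_with(body, |range, result| match result {
        Ok(c) => out.push(c),
        Err(e) => {
            if first_error.is_none() {
                first_error = Some((range, e));
            }
        }
    });
    match first_error {
        Some(err) => Err(err),
        None => Ok(out),
    }
}
```

Under this split, a compiler front end could map the reported byte ranges to spans for diagnostics, while a library such as syn would only need the small adapter, so both could share one core.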