rust-bindgen icon indicating copy to clipboard operation
rust-bindgen copied to clipboard

Functional C macros are not expanded.

Open nicokoch opened this issue 7 years ago • 27 comments

So I'm generating bindings to <linux/uinput>. This works pretty well, but two c macros in particular do not produce any bindings in the output. Reduced input below:

Input C/C++ Header

#define UI_DEV_CREATE		_IO(UINPUT_IOCTL_BASE, 1)
#define UI_DEV_DESTROY		_IO(UINPUT_IOCTL_BASE, 2)

Bindgen Invocation

let bindings = bindgen::Builder::default()
        .no_unstable_rust()
        .header("wrapper/wrapper.h")
        .expect("Unable to generate bindings").

wrapper.h looks like this:

#include <linux/uinput.h>

Actual Results

No constant UI_DEV_CREATE or UI_DEV_DESTROY generated in output.

Expected Results

Constants UI_DEV_CREATE and UI_DEV_DESTROY generated in output.

I'm not really that familiar with the kernel macro _IO and not sure if this is technically even possible, but I figured to report just to get a more professional opinion. If it's not possible, workarounds welcome.

nicokoch avatar Jun 16 '17 00:06 nicokoch

How is _IO defined? Grepping define _IO in my /usr/include/linux dir didn't yield anything.

emilio avatar Jun 16 '17 00:06 emilio

As in... If we don't know how _IO is defined, how could we generate a sensible constant?

emilio avatar Jun 16 '17 00:06 emilio

That being said, even with that it doesn't work (huh, I'd swear we knew how to deal with macro expansions):

#define _IO(a_) (a_)

#define UI_DEV_CREATE _IO(1)
#define UI_DEV_DESTROY _IO(2)

emilio avatar Jun 16 '17 00:06 emilio

This is probably a bug for rust-cexpr. Indeed, it's a dupe of https://github.com/jethrogb/rust-cexpr/issues/3.

emilio avatar Jun 16 '17 00:06 emilio

I guess I can try to add to libclang a way to get whether the macro definition belongs to a functional macro...

emilio avatar Jun 16 '17 00:06 emilio

If it helps any: _IO is defined in uapi/asm-generic/ioctl.h as: #define _IO(type,nr) _IOC(_IOC_NONE,(type),(nr),0) and _IOC in turn is defined as:

#define _IOC(dir,type,nr,size) \
      (((dir)  << _IOC_DIRSHIFT) | \
      ((type) << _IOC_TYPESHIFT) | \
      ((nr)   << _IOC_NRSHIFT) | \
      ((size) << _IOC_SIZESHIFT))

nicokoch avatar Jun 16 '17 00:06 nicokoch

Ok, so libclang provides a clang_Cursor_isMacroFunctionLike API. Perhaps we could use that.

emilio avatar Jun 16 '17 00:06 emilio

I've asked @jethrogb about what the best API for this would be. It shouldn't be a big deal to support it in bindgen once cexpr support is there.

emilio avatar Jun 16 '17 00:06 emilio

Meanwhile, I used the following in my wrapper.h to workaround the issue:

#include <linux/uinput.h>

const __u64 _UI_DEV_CREATE = UI_DEV_CREATE;
#undef UI_DEV_CREATE
const __u64 UI_DEV_CREATE = _UI_DEV_CREATE;

const __u64 _UI_DEV_DESTROY = UI_DEV_DESTROY;
#undef UI_DEV_DESTROY
const __u64 UI_DEV_DESTROY = _UI_DEV_DESTROY;

nicokoch avatar Jun 16 '17 00:06 nicokoch

Yeah, note that that would only work in libclang 3.9+ though.

emilio avatar Jun 16 '17 01:06 emilio

Current state: C preprocessor needed. @emilio can you add "help wanted" label?

jethrogb avatar Jul 26 '17 23:07 jethrogb

How about the progress of this? I'm generating the PostgreSQL header files using bindgen, but in the output files there're a lot of macros being lost.

clia avatar Oct 22 '18 00:10 clia

@clia The state is unchanged as of my latest comment https://github.com/rust-lang-nursery/rust-bindgen/issues/753#issuecomment-318214311. This issue is marked "help wanted" and waiting for someone (anyone) to implement this functionality.

jethrogb avatar Oct 22 '18 15:10 jethrogb

Another workaround preserving original names:

wrapper.h
#include <linux/ioctl.h>
#include <linux/usbdevice_fs.h>

#define MARK_FIX_753(req_name) const unsigned long int Fix753_##req_name = req_name;

MARK_FIX_753(USBDEVFS_CONTROL);
MARK_FIX_753(USBDEVFS_BULK);
build.rs
...
#[derive(Debug)]
pub struct Fix753 {}
impl bindgen::callbacks::ParseCallbacks for Fix753 {
    fn item_name(&self, original_item_name: &str) -> Option<String> {
        Some(original_item_name.trim_start_matches("Fix753_").to_owned())
    }
}

fn main() {
    let bindings = bindgen::Builder::default()
        .header("wrapper.h")
        .parse_callbacks(Box::new(Fix753 {}))
        .generate()
        .expect("Unable to generate bindings");
    ...
}
output bindings.rs
...
pub const USBDEVFS_CONTROL: ::std::os::raw::c_ulong = 3222820096;
pub const USBDEVFS_BULK: ::std::os::raw::c_ulong = 3222820098;

Seems like this workaround shouldn't conflict with future implementation.

iDawer avatar Feb 01 '19 20:02 iDawer

Not sure if this is the same problem, but I'm having issues getting at RTLD_DEFAULT from dlfcn.h.

It's defined thusly:

# define RTLD_DEFAULT   ((void *) 0)

Without hacks, it's totally absent from bindings.rs. Not sure why.

I've tried a few workarounds, including @iDawer's

wrapper.h:

#define _GNU_SOURCE                                                                                
#define __USE_GNU                                                                                  
#include <dlfcn.h>                                                                                 
#include <link.h>                                                                                  
                                                                                                                                                                                                                                        
// Import using another name, doesn't work.                                      
//const void *MY_RTLD_DEFAULT = RTLD_DEFAULT;                                                      
                                                                                                     
#define MARK_FIX_753(req_name) const void *Fix753_##req_name = req_name;                           
MARK_FIX_753(RTLD_DEFAULT);                                                                        
                                                                                                     
// Direct definition, works.                                                                       
//#define MY_RTLD_DEFAULT 0

build.rs:

extern crate bindgen;                                                                                
                                                                                                     
use std::{env, path::PathBuf};                                                                       
                                                                                                     
const BINDINGS_FILE: &'static str = "bindings.rs";                                                   
const WRAPPER_HEADER: &'static str = "wrapper.h";                                                    
                                                                                                     
#[derive(Debug)]                                                                                     
pub struct Fix753 {}                                                                                 
impl bindgen::callbacks::ParseCallbacks for Fix753 {                                                 
    fn item_name(&self, original_item_name: &str) -> Option<String> {                                
        Some(original_item_name.trim_start_matches("Fix753_").to_owned())                            
    }                                                                                                
}                                                                                                    
                                                                                                     
fn main() {                                                                                          
    // Rust target spec is needed for now so that auto-generated tests pass.                         
    // https://github.com/rust-lang-nursery/rust-bindgen/issues/1370#issuecomment-426597356          
    let bindings = bindgen::Builder::default()                                                     
        .header(WRAPPER_HEADER)                                                                    
        //.rust_target(bindgen::RustTarget::Stable_1_26)                                           
        .parse_callbacks(Box::new(Fix753 {}))                                                      
        .generate()                                                                                
        .expect("bindgen failed");                                                                 
                                                                                                   
    let out_path = PathBuf::from(env::var("OUT_DIR").unwrap());                                    
    bindings                                                                                       
        .write_to_file(out_path.join(BINDINGS_FILE))                                               
        .expect("Couldn't write bindings!");                                                       
}

bindings.rs:

extern "C" {                                                                                            
    #[link_name = "\u{1}Fix753_RTLD_DEFAULT"]                                                           
    pub static mut RTLD_DEFAULT: *const ::std::os::raw::c_void;                                         
}

Notice the value is not expanded. It's referenced as an extern :\

Needless to say, trying to use RTLD_DEFAULT gives a linker error:

...
  = note: /usr/bin/ld: /home/vext01/research/yorick/phdrs/target/debug/deps/phdrs-51a0fdcb223a7a9a.1sf375tf7jcp7kb.rcgu.o: in function `phdrs::find_symbol':
          /home/vext01/research/yorick/phdrs/src/lib.rs:(.text._ZN5phdrs11find_symbol17h0db50af030d9c91cE+0x6c): undefined reference to `Fix753_RTLD_DEFAULT'                                                    
          collect2: error: ld returned 1 exit status

Questions:

  • Is RTLD_DEFAULT not expanded due to the problems discussed earlier in this thread?
  • Can anyone think of a workaround?

This is bindgen-0.52.

vext01 avatar Apr 03 '20 16:04 vext01

I encountered both of these problems, which I believe to be two different ones, neither

#define SNDRV_PCM_STATE_OPEN ((snd_pcm_state_t) 0)

nor

#define SNDRV_CTL_IOCTL_ELEM_LIST	_IOWR('U', 0x10, struct snd_ctl_elem_list)

...shows up in the output at all. Errors from cexpr are swallowed by bindgen and results in no output, and that's what happening here. Anyhow, what I ended up with as a workaround was a regex to cover both these cases, as can be seen here.

The regexes are r"#define\s+(\w+)\s+\(\((\w+)\)\s*(\d+)\)" and r"#define\s+(\w+)\s+_IO([WR]*)\(([^,)]+),\s*([^,)]+),?\s*([^,)]*)\)" and the code handling these are good enough for me, but there might be special cases for other headers that it does not handle.

diwic avatar Feb 21 '21 16:02 diwic

@jethrogb @emilio I now have two projects that I'm a part of that depend on this so I'd willing to help. Where do you want me to start? Does the preprocessor need to be added first at the cexpr level?

jbaublitz avatar Jul 25 '22 14:07 jbaublitz

Any updates to this issue?

I'm trying to create rust bindings for raylib, but bindgen isn't generating rust bindings for the color macros:

/* raylib.h */

/* ... */

/* These aren't being generated in the bindgen output. */
#define WHITE      CLITERAL(Color){ 255, 255, 255, 255 }   // White
#define BLACK      CLITERAL(Color){ 0, 0, 0, 255 }         // Black
#define BLANK      CLITERAL(Color){ 0, 0, 0, 0 }           // Blank (Transparent)
#define MAGENTA    CLITERAL(Color){ 255, 0, 255, 255 }     // Magenta
#define RAYWHITE   CLITERAL(Color){ 245, 245, 245, 255 }   // My own White (raylib logo)

/* ... */

eliaxelang007 avatar Apr 02 '23 06:04 eliaxelang007

Is there a possible solution in the near feature?

I'm trying to convert macros for pin definitions of the harmony framework

#define MyGPIO_Get()               (((PORT_REGS->GROUP[2].PORT_IN >> 0U)) & 0x01U)
#define MyGPIO_PIN                  PORT_PIN_PC00

pustekuchen91 avatar Aug 11 '23 10:08 pustekuchen91

@emilio I'd like to propose a temporary and longer term solution that will take a while due to it requiring LLVM.

In the short term, @ojeda found a workaround with clang where we should be able to use temporary intermediate files with the macros surrounded by parentheses to evaluate these sorts of expressions to a constant. This may slow down compile time, but it should have the benefit of allowing this particular case to be handled using current clang.

In the longer term, @ojeda talked with the clang maintainers and they are open to having someone add functionality to LLVM and clang to essentially keep C preprocessor macro information available for a given parsed file. As you probably already know, currently, a lot of the cpp information is dropped by the time you've parsed the file. The clang maintainer is open to changes that would effectively allow using a header file as a context from which you could evaluate arbitrary macro expressions as I understand it (correct me if I'm wrong, @ojeda, since I wasn't part of this conversation). This would provide bindgen with the ability to have much more robust macro evaluate through clang.

For now, would you be willing to accept the shorter term solution if I put up a PR and would you be interested in something like the longer term solution later on?

jbaublitz avatar Aug 11 '23 13:08 jbaublitz

@jbaublitz could you expand a bit on how this workaround with intermediate files would work?

pvdrz avatar Aug 14 '23 19:08 pvdrz

@pvdrz Sure! Just as a heads up, I think @ojeda and I have come up with two similar but slightly different solutions so we're emailing back and forth to get on the same page. However, I've had success in our test case around macros in cryptsetup that use function-like macros in their expansion to define constants using my solution.

I've looked at the bindgen code and I feel like we kind of have two routes here. Currently, it looks like we have cexpr doing a lot of the heavy lifting in the case of parsing #defines. I'll outline the two possibilities I see with my solution and the tradeoffs and hopefully one of them will be acceptable for bindgen while we work on the longer term work of LLVM API changes.

The first option is a more localized change and could possibly be used as a backup for cexpr if it fails even, but involves more temporary files and therefore potential impacts to compile time for a large number of function-like macro definitions. In this case, if cexpr fails to parse the expansion of the macro (for example if it uses function-like macros in the expansion), it is possible to write out a temporary file containing a use of the macro. Clang can read in this temporary file and parse the macro invocation and provide an integer literal as the output in cases like the one I liked above. The drawback is that there would like need to be a temporary file for each failure to parse the macro expansion. This may not be acceptable but I wanted to put it out there as an option.

The other option would be a larger change but would only require one temporary file per header being passed to bindgen. I've had success with the clang crate generating a list of macro definitions in the header file, putting them all in a temporary file, and then evaluating each one, mapping the evaluation to the original macro definition name, returning all of that information in a HashMap. I'm not entirely sure how this would work with cexpr, but it is definitely the more performant option, and I have example code using the clang crate that demonstrates that it would work.

Do either of these sound acceptable or at least like something we can iterate on to find a solution that would be acceptable for bindgen? I understand that both of them are definitely workarounds and not the ideal solution we hope to get to with LLVM changes, but I think it could provide some quality of life improvements for right now. For example, when we built our Rust bindings for libcryptsetup against a new version that used function-like macros in their #defines after previously not doing so, we had a build failure of our high level bindings because those constants suddenly disappeared from libcryptsetup-rs-sys. Either of these solutions would at least prevent that from happening in the future.

jbaublitz avatar Aug 16 '23 14:08 jbaublitz

In the short term, @ojeda found a workaround with clang where we should be able to use temporary intermediate files with the macros surrounded by parentheses to evaluate these sorts of expressions to a constant. This may slow down compile time, but it should have the benefit of allowing this particular case to be handled using current clang.

Yeah, to give an example:

void f(void) {
    (M);
}

Then one can query with a cursor placed at (, and Clang will resolve the expression. It seems to always give Kind: Int though, at least through c-index-test -- we should test corner cases up to 64-bit and see if it works for our purposes.

Another way is initializing a variable:

int m = M;

If one requests m, that seems to return the value after type conversion.

The clang maintainer is open to changes that would effectively allow using a header file as a context from which you could evaluate arbitrary macro expressions as I understand it (correct me if I'm wrong, @ojeda, since I wasn't part of this conversation).

@AaronBallman told me that the C indexing APIs (include/clang-c/) work by building an AST, but that AST does not keep the macro information. If we implemented the change, then Aaron offered to review the changes and so on.

The change would involve keeping enough information to be able to evaluate the macros (or perhaps directly saving a map of evaluations, similar to what you suggest later, but on Clang's side).

(Thanks a lot @jbaublitz for looking into this)

ojeda avatar Aug 16 '23 18:08 ojeda

FWIW, Clang's C indexing APIs have a TU-level flag named CXTranslationUnit_DetailedPreprocessingRecord. Passing that causes the indexing APIs to behave as if -Xclang -detailed-preprocessing-record was passed to the compiler which might get you what you need already. If it doesn't get you what you need, then it's quite likely in the correct area for where you'd make changes to expose the macro information you're after. (I'm not intimately familiar with this part of the code, but you may need to call clang_annotateTokens to get cursors that visit macros.)

AaronBallman avatar Aug 16 '23 18:08 AaronBallman

Yeah, in my case I just tested via c-index-test, which from a quick look it seems to set CXTranslationUnit_DetailedPreprocessingRecord for -evaluate-cursor-at (and no reparsing). But if it happens to e.g. work via the API without changes to libclang, that would be great. Thanks a lot @AaronBallman for taking a look!

ojeda avatar Aug 16 '23 19:08 ojeda

@pvdrz Just a few more considerations:

  • Evaluating all macros in a single file could result in failure to parse any of the macros if some represent expansions that result in a syntax failure when prepended with a semicolon. If we take that approach, we might want to have a fallback mode if the single intermediary file fails to compile to evaluate each macro in its own intermediary file.
  • We could always make the intermediary file approach opt-in only. That way, we're not having anyone use it by default, but they can opt in if they find this issue and want to use this feature before Clang supports this kind of thing directly.
  • Corner cases where this macro evaluation doesn't work as well: char-cast integers will show up as integers still, the macro evaluation doesn't seem to work very well for 128-bit integers even when cast as (__uint128_t) and they appear to be truncated still.

jbaublitz avatar Aug 17 '23 20:08 jbaublitz

@pvdrz Would you prefer to just have the support in LLVM as opposed to the workaround I proposed?

jbaublitz avatar Nov 15 '23 16:11 jbaublitz