
improve compilation database creation

Open rizsotto opened this issue 5 years ago • 3 comments

Hi there,

I've read this article about how good C2Rust is. It's a nice post, and I like it very much. I am the author of the Bear and scan-build tools, and I would like to learn how these tools could be more useful to this project. I want to understand the use cases for C2Rust.

  • I have decided to improve Bear so that it works on OSX, so a single tool could do better than two.
  • I would be happy to implement the link command capture. I just need to know what an entry would look like.
  • I am interested to know how you treat the compilation entries. Does it matter to keep the full path of the compiler, or is that replaced by one of the tools you developed?
  • What would be the ideal install process for it? (Is cmake more difficult than pip? Would cargo be better?)
  • What would be the ideal usage of a compilation database generator tool?
  • Is it desirable to speed up the process by not calling the real compiler (but faking the output to keep the build system happy)?
  • Is it desirable to generate entries for include files separately?

Thanks in advance!

rizsotto avatar Jan 14 '20 07:01 rizsotto

Hi @rizsotto, great to hear from you. We have used bear and scan-build in C2Rust to produce compilation databases for some C code bases, and they worked really well.

You wrote a long list of questions, and maybe other people would like to respond to some of them, so I'll just stick to the ones I have thoughts on.

I would be happy to implement the link command capture. I just need to know what an entry would look like.

I can give you some details on that. The link commands produced by our wrappers look basically like compile commands, with two main differences (the design is still open to discussion; I just picked whatever got things working the quickest):

  • Link commands usually have multiple inputs, but the compilation database only has a single file entry for the input file. It would be great if we replaced that with an inputs list, but clang refused to accept compilation databases with unknown entries. I got around this limitation by storing a bencoded dictionary containing all the additional information as a string directly in the file field, as seen here. It's an ugly hack, and it would be great to find a cleaner way to do this. This additional information includes things like:
    • inputs: the input object files
    • libs and lib_dirs: the library dependencies and where to find them
    • type: whether this link command builds a shared library or an executable
  • Each compilation entry also has a type field specifying whether that command was invoked as cc (the compiler) or ld (the linker). I originally thought this would be useful, but I'm not so sure anymore. Many C code bases invoke the compiler binary even for linking instead of invoking the linker directly, so those link commands would show up as cc.
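To make the encoding described above concrete, here is a minimal sketch in Python of how such a file string could be produced. It is not the code used by the C2Rust wrappers: the names `LINK_PREFIX`, `bencode`, and `make_link_file_field` are hypothetical, the prefix is only inferred from the ioq3 example in this thread, and the sketch assumes ASCII paths so that character counts equal byte counts.

```python
# Hypothetical sketch: pack link metadata into the "file" field as a
# bencoded dictionary. Not the actual C2Rust wrapper code.

# Prefix inferred from the example entry in this thread (an assumption).
LINK_PREFIX = "/c2rust/link/"


def bencode(value):
    """Encode ints, strings, lists, and dicts using bencode rules.

    Assumes ASCII strings, so len(value) is also the byte length.
    """
    if isinstance(value, int):
        return "i%de" % value
    if isinstance(value, str):
        return "%d:%s" % (len(value), value)
    if isinstance(value, list):
        return "l" + "".join(bencode(item) for item in value) + "e"
    if isinstance(value, dict):
        # Bencode requires dictionary keys to be emitted in sorted order.
        body = "".join(bencode(key) + bencode(value[key]) for key in sorted(value))
        return "d" + body + "e"
    raise TypeError("cannot bencode value of type %s" % type(value).__name__)


def make_link_file_field(inputs, libs, lib_dirs, link_type):
    """Build the string stored in the "file" field of a link entry."""
    info = {
        "inputs": inputs,
        "libs": libs,
        "lib_dirs": lib_dirs,
        "type": link_type,
    }
    return LINK_PREFIX + bencode(info)


if __name__ == "__main__":
    print(make_link_file_field(["a.o", "bc.o"], [], [], "shared"))
```

Because the result is an ordinary string in an existing field, a tool that rejects unknown keys still accepts the database, which is the point of the workaround.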

Here's an example from ioq3 (sorry for the length, this is one of the shortest ones):

  {
    "directory": "/mnt/ssd1/ahomescu/development/immunant/ioq3",
    "arguments": [
      "cc",
      "-Wall",
      "-fno-strict-aliasing",
      "-Wimplicit",
      "-Wstrict-prototypes",
      "-pipe",
      "-DUSE_ICON",
      "-DARCH_STRING=\"x86_64\"",
      "-DNO_GZIP",
      "-Icode/zlib",
      "-DUSE_INTERNAL_JPEG",
      "-Icode/jpeg-8c",
      "-DUSE_LOCAL_HEADERS",
      "-DPRODUCT_VERSION=\"1.36_GIT_d0fe4462-2020-01-10\"",
      "-Wformat=2",
      "-Wno-format-zero-length",
      "-Wformat-security",
      "-Wno-format-nonliteral",
      "-Wstrict-aliasing=2",
      "-Wmissing-format-attribute",
      "-Wdisabled-optimization",
      "-Werror-implicit-function-declaration",
      "-MMD",
      "-o",
      "build/release-linux-x86_64/baseq3/qagamex86_64.so"
    ],
    "file": "/c2rust/link/d6:inputsl47:build/release-linux-x86_64/baseq3/game/g_main.o48:build/release-linux-x86_64/baseq3/game/ai_chat.o47:build/release-linux-x86_64/baseq3/game/ai_cmd.o49:build/release-linux-x86_64/baseq3/game/ai_dmnet.o48:build/release-linux-x86_64/baseq3/game/ai_dmq3.o48:build/release-linux-x86_64/baseq3/game/ai_main.o48:build/release-linux-x86_64/baseq3/game/ai_team.o48:build/release-linux-x86_64/baseq3/game/ai_vcmd.o48:build/release-linux-x86_64/baseq3/game/bg_misc.o49:build/release-linux-x86_64/baseq3/game/bg_pmove.o53:build/release-linux-x86_64/baseq3/game/bg_slidemove.o47:build/release-linux-x86_64/baseq3/game/bg_lib.o49:build/release-linux-x86_64/baseq3/game/g_active.o49:build/release-linux-x86_64/baseq3/game/g_arenas.o46:build/release-linux-x86_64/baseq3/game/g_bot.o49:build/release-linux-x86_64/baseq3/game/g_client.o47:build/release-linux-x86_64/baseq3/game/g_cmds.o49:build/release-linux-x86_64/baseq3/game/g_combat.o48:build/release-linux-x86_64/baseq3/game/g_items.o46:build/release-linux-x86_64/baseq3/game/g_mem.o47:build/release-linux-x86_64/baseq3/game/g_misc.o50:build/release-linux-x86_64/baseq3/game/g_missile.o48:build/release-linux-x86_64/baseq3/game/g_mover.o50:build/release-linux-x86_64/baseq3/game/g_session.o48:build/release-linux-x86_64/baseq3/game/g_spawn.o49:build/release-linux-x86_64/baseq3/game/g_svcmds.o49:build/release-linux-x86_64/baseq3/game/g_target.o47:build/release-linux-x86_64/baseq3/game/g_team.o50:build/release-linux-x86_64/baseq3/game/g_trigger.o48:build/release-linux-x86_64/baseq3/game/g_utils.o49:build/release-linux-x86_64/baseq3/game/g_weapon.o50:build/release-linux-x86_64/baseq3/qcommon/q_math.o52:build/release-linux-x86_64/baseq3/qcommon/q_shared.o51:build/release-linux-x86_64/baseq3/game/g_syscalls.oe8:lib_dirsle4:libsle4:type6:sharede",
    "output": "build/release-linux-x86_64/baseq3/qagamex86_64.so"
  },
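For anyone who wants to read the file field of an entry like the one above, the following Python sketch decodes it back into a dictionary. The function names `bdecode` and `parse_link_entry` and the `LINK_PREFIX` constant are hypothetical, chosen for illustration; the prefix check is an assumption based on this one example, and the decoder assumes ASCII content so that bencode byte lengths match character counts.

```python
# Hypothetical sketch: recover link metadata from a compilation database
# entry whose "file" field holds a bencoded dictionary.

# Prefix inferred from the example entry in this thread (an assumption).
LINK_PREFIX = "/c2rust/link/"


def bdecode(data, pos=0):
    """Decode one bencoded value from data at pos; return (value, next_pos)."""
    ch = data[pos]
    if ch == "i":  # integer: i<digits>e
        end = data.index("e", pos)
        return int(data[pos + 1:end]), end + 1
    if ch == "l":  # list: l<items>e
        pos += 1
        items = []
        while data[pos] != "e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if ch == "d":  # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos] != "e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if ch.isdigit():  # string: <length>:<contents>
        colon = data.index(":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("unexpected character %r at position %d" % (ch, pos))


def parse_link_entry(entry):
    """Return the decoded link info, or None for an ordinary compile entry."""
    file_field = entry["file"]
    if not file_field.startswith(LINK_PREFIX):
        return None
    info, _ = bdecode(file_field[len(LINK_PREFIX):])
    return info


if __name__ == "__main__":
    entry = {"file": "/c2rust/link/d6:inputsl3:a.o4:bc.oe8:lib_dirsle4:libsle4:type6:sharede"}
    print(parse_link_entry(entry))
```

Applied to the ioq3 entry, a decoder along these lines should yield an inputs list of object files, empty libs and lib_dirs lists, and type set to shared.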

I am interested to know how you treat the compilation entries. Does it matter to keep the full path of the compiler, or is that replaced by one of the tools you developed?

We take the compilation database as an input and pass it to clang (and also parse it ourselves separately), so the full C compiler path is fine.

Is it desirable to speed up the process by not calling the real compiler (but faking the output to keep the build system happy)?

I think you'd still need to invoke the compiler if you want to get far enough to also see the link commands. Otherwise, the build tool, e.g. make, will just stop before linking because none of the object files exist.

ahomescu avatar Jan 14 '20 21:01 ahomescu

Hi @rizsotto, I'm excited that you reached out; awesome that you're willing to improve bear for our use case!

I have decided to improve Bear so that it works on OSX, so a single tool could do better than two.

I think my colleague @ahomescu covered the major points. I just wanted to say that I'm excited about the prospect of having macOS support.

What would be the ideal install process for it? (Is cmake more difficult than pip? Would cargo be better?)

On macOS, bear is already installable from homebrew (e.g. brew install bear) so I think that's the ideal provisioning method for Mac devs and CI workflows alike.

Is it desirable to generate entries for include files separately?

I don't think we have a use case for that. Other than adding the linker commands, I think there's value in staying as close to the established format for compile_commands.json as possible.

thedataking avatar Jan 14 '20 23:01 thedataking

Thank you guys for your input. I am glad that you took the time to answer my questions.

I concluded that the biggest improvement for you would be the linking support. :smile: It was planned, but I had not found a strong use case for it, and it was not clear which attributes would be needed. (Thanks @ahomescu for sharing the details.) Here is the ticket to follow up on it: rizsotto/Bear#276

rizsotto avatar Jan 15 '20 03:01 rizsotto