rellic Refine types based on debug metadata

Solves #190

Oct 22 '21 15:10 frabert

Issues with this right now:

- As mentioned in https://github.com/lifting-bits/rellic/issues/190#issuecomment-949694419, `const` causes problems and is ignored right now
- ~ASTBuilder does not implement outputting typedefs yet, so they're left out~
- ~Globals do not enjoy the same treatment as locals do, which is an issue that needed to be addressed in #180 probably. But that ship has sailed.~
- Current implementation is messy, but I'm still not sure if a separate successive pass can do a clean job either

Oct 22 '21 15:10 frabert

Regarding the overall method: yes; thinking about it, the current way of trying to match IR and debug metadata is not very robust. The "correct" solution will probably involve some kind of more thorough, "holistic" approach. Question: where should it fit? Could it be that it would be easier to do that before trying to lift the bitcode, as a preprocessing step?

Oct 26 '21 16:10 frabert

> Question: where should it fit? Could it be that it would be easier to do that before trying to lift the bitcode, as a preprocessing step?

I think doing so before might be a good place to start, but I don't know what that looks like. What is evident is that right now, in the middle of doing one thing, we're trying to reverse engineer debug info types, and integrate that info. I think that attempts to integrate more "smarts" into that process are going to lead to issues in trying to manage the complexity of what's going on. Some kind of pass, or multiple passes, that interprets bitcode values, types, and debug info locations/types ahead-of-time seems prudent.

Perhaps we can formulate this problem as the type of info that we think we should be able to present. For example, at each LLVM instruction, what logical source variables are "live" and where are their values, and what are their types? The "where are their values" is tricky, because their values may be embedded in other values (e.g. high bits, low bits, mid-bits [for the case of a bitfield], in a structure value, in a vector value, at some byte offset of an alloca). I think it would be prudent to work toward the ability to output this information, as a proof-of-capability for getting it, and a way of forcing it into a coherent API.
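To make the shape of that output concrete, here is a minimal sketch of what a per-instruction record might look like. Everything in it is hypothetical: `LocKind`, `VarLocation`, `LiveVarsAt`, and `ExtractBits` are illustrative names, not rellic or LLVM APIs, and plain integer ids stand in for `llvm::Value*` and `llvm::Instruction*`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical: how a source variable's value is embedded in an IR value.
enum class LocKind { WholeValue, BitRange, StructField, AllocaByteOffset };

// Hypothetical: one logical source variable and where its value lives.
struct VarLocation {
  std::string var_name;   // logical source variable name from debug info
  std::string type_name;  // source-level type name from debug info
  unsigned value_id;      // stand-in for an llvm::Value*
  LocKind kind;           // how the variable is embedded in that value
  unsigned offset;        // first bit, field index, or byte offset, per kind
  unsigned width;         // width in bits; only meaningful for BitRange
};

// Hypothetical query result: the variables that are live at one instruction.
struct LiveVarsAt {
  unsigned inst_id;  // stand-in for an llvm::Instruction*
  std::vector<VarLocation> live;
};

// Recover the variable's bits from the raw contents of the containing value.
// Covers the high-bits, low-bits, and mid-bits (bitfield) cases.
inline std::uint64_t ExtractBits(const VarLocation &loc, std::uint64_t raw) {
  assert(loc.kind == LocKind::BitRange);
  assert(loc.width > 0 && loc.offset + loc.width <= 64);
  const std::uint64_t mask =
      loc.width == 64 ? ~std::uint64_t{0}
                      : ((std::uint64_t{1} << loc.width) - 1);
  return (raw >> loc.offset) & mask;
}
```

A 3-bit bitfield starting at bit 4 of value 7 would be described as `VarLocation{"flags", "unsigned", 7, LocKind::BitRange, 4, 3}`. Keeping the embedding kind explicit is one way to force the "where are their values" question into a single queryable record.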

Oct 26 '21 16:10 pgoodman

This might open up opportunities. For example, if the debug info "tells" us that two LLVM values represent the same local variable, and if the two values have the same LLVM type, then we might be able to keep track of this as saying: these two values are in a "storage equivalence class."
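One possible way to track such classes is a union-find keyed on the (variable, LLVM type) pair. This is only a sketch under stated assumptions: `StorageClasses`, `Observe`, and `SameStorage` are made-up names, values are small integer ids rather than `llvm::Value*`, and variables and types are compared by name rather than by debug-info node.

```cpp
#include <cassert>
#include <map>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tracker for "storage equivalence classes" of IR values.
class StorageClasses {
 public:
  explicit StorageClasses(unsigned num_values) : parent_(num_values) {
    // Every value starts out alone in its own class.
    std::iota(parent_.begin(), parent_.end(), 0u);
  }

  // Record that debug info maps `value`, of IR type `ir_type`, to `var`.
  // Values sharing both the variable and the IR type are merged.
  void Observe(unsigned value, const std::string &var,
               const std::string &ir_type) {
    assert(value < parent_.size());
    const auto key = std::make_pair(var, ir_type);
    const auto it = first_seen_.find(key);
    if (it == first_seen_.end()) {
      first_seen_.emplace(key, value);
      return;
    }
    const unsigned root_new = Find(value);
    const unsigned root_old = Find(it->second);
    parent_[root_new] = root_old;
  }

  // Representative of the class containing `v`.
  unsigned Find(unsigned v) {
    assert(v < parent_.size());
    while (parent_[v] != v) {
      parent_[v] = parent_[parent_[v]];  // path halving
      v = parent_[v];
    }
    return v;
  }

  bool SameStorage(unsigned a, unsigned b) { return Find(a) == Find(b); }

 private:
  std::vector<unsigned> parent_;
  // First value observed for each (variable, IR type) pair.
  std::map<std::pair<std::string, std::string>, unsigned> first_seen_;
};
```

With this sketch, two values observed for `x` with type `i32` end up in one class, while a value observed for `x` with type `i64` stays separate, matching the "same LLVM type" condition above.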

Oct 26 '21 16:10 pgoodman

Regarding this specific PR: due to the way the QualTypes work, I can't think of a good way to factor the new code out of IRToASTVisitor without essentially duplicating all of GetQualType.

Also, most tests need additional debug info for function prototypes, and I still haven't figured out a way to convince clang to consistently emit info for those. Even `-O1` seemed to do the trick, but it doesn't always work.

Oct 28 '21 11:10 frabert
