mambo icon indicating copy to clipboard operation
mambo copied to clipboard

Suggested enhancements to v2

Open umarcor opened this issue 4 years ago • 6 comments

The main purpose of this PR is to contribute a plugin that allows to replace functions in unmodified applications. Apart from that, a related test is added and some other enhancements are proposed:

  • Section "Plugin API" of the README is updated:
    • ~Setting PLUGINS_NEW explicitly is not required, as long as PLUGINS is used: https://github.com/beehive-lab/mambo/blob/master/makefile#L52-L54. (#50)~
    • Mention PLUGIN_ARGS. (#54)
  • Fix a couple of minor compiler warnings in plugins/symbol_example.c (#51)
  • ~Update the README of plugins/memcheck: (#52)~
    • ~I think it is not required to checkout a branch anymore.~
    • ~# is used for headers, instead of the rst-alike syntax.~
  • ~Currently, git is required/used in the makefile. When git is not available, GIT_VERSION is empty. With this PR, <nogit> is written instead. (#53)~
  • A GitHub Actions workflow is added, in order to test compilation and execution in Continuous Integration. All the tests are built and executed (not all the plugins, tho).
  • A plugin named function_replacement.c is added, along with a test application and replacement for rand from stdlib. This plugin is built and tested along with existing symbol_example.c.

@lgeek, please, let me know whether you'd like some of these modifications to be handled in separate PRs.

Notes about Continuous Integration:

  • dbhi/qus is used to register the qemu-user-static binary to execute aarch64 applications on amd64, and the binary is loaded in memory.
  • A container image maintained in dbhi/docker is used to build and run the tests. Image aptman/dbhi:bionic-mambo-arm64 is based on arm64v8/ubuntu:bionic and it includes the dependencies required by MAMBO: https://github.com/dbhi/docker/blob/master/mambo#L5-L9.
  • Naturally, it is possible to achieve the same result with other procedures (e.g. installing qemu-user on the host, or using qemu-system instead). I'm ok with modifying the workflow, should you not want to rely on external projects.
  • Using qemu-user has known caveats (see #19). Nonetheless, building MAMBO and executing some of the tests do work, including the one about function replacement.
    • For example this is the execution of the symbols test, and this is the same on MAMBO (with plugins symbol_example and function_replacement).
    • Find further details in GuillermoCallaghan/mambo#1.

Notes about test symbols:

  • The main application allocates a buffer of four integers through malloc and calls rand four times. It prints the content of the array.
  • The function replacement plugin replaces rand with rand_replacement.
  • rand_replacement is defined in test/symbols_rand.c. It contains a global counter that is increased each time the function is executed. The function returns the value of the counter, instead of a random value.

/cc @GuillermoCallaghan @iordanouki

umarcor avatar May 06 '20 05:05 umarcor

Hi, thanks for the patches. I can cherry pick individual patches or you can open separate PRs, at your choice.

a69d8de looks good

ci: 0) we should test both aarch32 and aarch64, 1) it would be nice to avoid qemu-user-static with its incomplete support and 2) other projects in beehive-lab regularly use up 100% of the included services - I'm not sure it would be that useful to have CI that regularly stops working and it also means we should avoid inefficient solutions like pulling in mambo-vm which would have been an easy fix. Self-hosted runners seem to be recommended against for public repositories because of security concerns. Any ideas?

function_replacement: plugins should only use the public interfaces defined in the headers in api/ when possible. There's a much easier way to replace functions using the API, I'll rewrite the plugin - edit: see my comment on its source file.

symbol_example: Thanks for fixing the warnings, I think I forgot to clean up that code before committing it. Please see my inline comment.

1e10360: Good idea, please see my inline comment.

memcheck readme: Good point about not having to check out the separate branch. Underlined headings are standard markdown, I find them more readable in a plain text editor. I'd rather not change that. Similarly, I don't think sh syntax highlighting for the output text adds anything meaningful (in fact it's a bit confusing that it highlights the word in in the text). I prefer the 4 space indentation syntax because it's cleaner in a text editor, but let me know if you think there are other benefits to the 3 backtick syntax.

d231ddd: Good catch, but maybe we should instead use a pre-commit hook to embed the revision number in the source code. I'll do that if you don't have the time.

2f07c8b: Maybe I'm missing something, but I think rand_replacement() should be implemented in plugins/function_replacement.c. And maybe rename test/symbols to test/function_replacement or something more specific like that. It also might be worth adding a comment explaining that test is only meaningful when ran under the function_replacement plugin.

Thanks

lgeek avatar May 06 '20 13:05 lgeek

Regarding CI, it looks like Travis CI supports ARM natively and they have a free plan for open source projects. It might work better for us.

lgeek avatar May 06 '20 13:05 lgeek

First off, thanks a lot for your so quick response!

I think that you can first cherry-pick the commits you are ok with. Then, I will rebase and optionally split. I'd prefer not to fix and force-push now, so that the references in your comment above (and here) are not broken.

  • a69d8de: pick.
  • 3122103, 1e10360: pick. However, I cannot see the inline comments. Maybe you have a some "pending review" to be published?
  • d231ddd: please go ahead. I have never configured a pre-commit hook...
  • e5093d7, 93d8914, 0ea6943, 2f07c8b: discussion below.

Regarding CI (e5093d7), since all your comments are both accurate and pertinent, I will open a separate issue to discuss it.

[93d8914] function_replacement: plugins should only use the public interfaces defined in the headers in api/ when possible. There's a much easier way to replace functions using the API, I'll rewrite the plugin - edit: see my comment on its source file.

I believe there might be a pending review... since I cannot see those comments.

[2f07c8b] Maybe I'm missing something, but I think rand_replacement() should be implemented in plugins/function_replacement.c. And maybe rename test/symbols to test/function_replacement or something more specific like that. It also might be worth adding a comment explaining that test is only meaningful when ran under the function_replacement plugin.

The idea is that any user willing to replace a function needs to:

  • Pick plugins/function_replacement.c and modify the defines.
  • Provide their own replacement function, in this case test/symbols_rand.c.
  • Build MAMBO with both of them.

Users are expected to manually write the equivalent to symbols_rand.c, but they are expected to reuse function_replacement.c as is (just replacing the defines with sed). That's why I put the former in subdir test.

Shall I create test/symbols/, and rename

  • test/symbols.c -> test/symbols/main.c
  • test/symbols_rand.c -> test/symbols/replacement.c.

along with a test/symbols/README.md?

Furthermore, I want to try building symbols_rand.c (the user-provided replacement body) as a shared library and either:

  • Link the shared library when MAMBO is built.
  • Or, make the plugin load the library at runtime through dlopen.

This last use case is an alternative|complement|workaround to #37. The main motivation is to avoid building multiple MAMBO binaries, each with a different plugin. AFAIU, it is currently not possible to have multiple swappable implementations of a replacement function, because registering multiple callbacks for the same function would either crash or execute all the callbacks (I don't know whether this is tested or documented yet). In #37, I proposed adding some built-in features in MAMBO to allow enabling multiple (potentially conflicting) plugins selectively. However, since it can be a non-trivial enhancement, I'd like to explore handling that complexity by using MAMBO to "expose the internal function of an unmodified binary as a function that is loaded from a shared library". Then, we could use regular LD_LIBRRAY_PATH to override which lib is used at runtime. Not the cleanest solution, but I hope it works.

FTR, DynamoRIO does NOT allow to register multiple replacements for the same function. IIRC, only the first/last callback/hook is preserved. However, in DynamoRIO all plugins (clients) are shared libraries, by default. Hence, building several (potentially conflicting) plugins and loading some of them only is supported by the default usage. Pin tools cannot load shared libraries from replacement function's body.

[0ea6943] memcheck readme: Good point about not having to check out the separate branch. Underlined headings are standard markdown, I find them more readable in a plain text editor. I'd rather not change that.

This is a fair point. Honestly, I don't mind which style to use. For some reason, I assumed that # was used in the main README, and then I saw one header that can be fixed. Nonetheless, it makes much more sense to fix that single header than to change all the style. Since it is such a small change, will you do it on your own?

Similarly, I don't think sh syntax highlighting for the output text adds anything meaningful (in fact it's a bit confusing that it highlights the word in in the text). I prefer the 4 space indentation syntax because it's cleaner in a text editor, but let me know if you think there are other benefits to the 3 backtick syntax.

I agree that syntax highlighting is not meaningful in these cases. However, my main motivation to add backticks instead of using 4 space indentation is to preserve the blocks that have blank lines in-between. I kept others. The motivation for this is that text editors which remove trailing spaces (and which are not configured to handle Markdown) might remove those. It should still be rendered properly, but I think it is better to make it explicit. Hence, are you ok with keeping backticks, but removing sh?

umarcor avatar May 07 '20 08:05 umarcor

Sorry, I had forgotten to submit the comments.

I now understand your thinking for function_replacement.c. Sounds good, but maybe the defines should just be set in PLUGIN_ARGS at compile time rather than in the source file.

Furthermore, I want to try building symbols_rand.c (the user-provided replacement body) as a shared library

In practice, last time I tried it relatively recently, MAMBO itself compiled and ran if linked dynamically. But 1) that isn't tested much and it could be creating some library transparency issues, 2) it breaks the conditions we need to shape the virtual memory layout and efficiently implement shadow memory for plugins such as memcheck, 3) it would create dependencies on specific versions of the libraries it is linked against, while at the moment you can take the statically linked executable and use it on pretty much any Arm/Linux system including Android, etc.

dlopen

Unfortunately, using dlopen in a statically linked executable comes with pretty much all the disadvantages of dynamic linking. If an user really needs to do it in a plugin, it will work with the limitations mentioned above, but it should be avoided if possible (and I think it very much is in this case).

The main motivation is to avoid building multiple MAMBO binaries, each with a different plugin.

This is the same approach taken by Valgrind for similar reasons. To help with this, using the current makefile, you can now set the OUTPUT_FILE environment variable to control the name of the executable generated. Also see the makefile for how to add multiple build targets for separate plugins (cachesim, memcheck). If this is about the compile time and if it's annoyingly long on your system, we could switch to building separate .o files for each source file to avoid recompiling the same sources for each plugin.

FTR, DynamoRIO does NOT allow to register multiple replacements for the same function.

Function replacement isn't an API feature at this time, but you can register multiple callbacks for the same function. Of course, there are a multitude of ways in which the modifications could interfere with each other if they're not designed to work together.

I assumed that # was used in the main README, and then I saw one header that can be fixed

Any number of underlining = and - are allowed.

The motivation for this is that text editors which remove trailing spaces (and which are not configured to handle Markdown) might remove those.

That sounds like an editor bug, I don't think they should mess with text on lines you've not actively modified. If you or someone else reports being affected by this we can change it, but otherwise I really don't like the backtick syntax.

lgeek avatar May 07 '20 10:05 lgeek

Sorry, I had forgotten to submit the comments.

It's ok.

I now understand your thinking for function_replacement.c. Sounds good, but maybe the defines should just be set in PLUGIN_ARGS at compile time rather than in the source file.

Will do.

Furthermore, I want to try building symbols_rand.c (the user-provided replacement body) as a shared library

In practice, last time I tried it relatively recently, MAMBO itself compiled and ran if linked dynamically. But 1) that isn't tested much and it could be creating some library transparency issues, 2) it breaks the conditions we need to shape the virtual memory layout and efficiently implement shadow memory for plugins such as memcheck, 3) it would create dependencies on specific versions of the libraries it is linked against, while at the moment you can take the statically linked executable and use it on pretty much any Arm/Linux system including Android, etc.

I know, and I appreciate that you put it so clearly. However, in the context of dbhi, this is both a requirement and a negligible issue. The use case is to benefit from MAMBO to replace functions in unmodified binaries with existing arbitrary code. The specific replacements that we are using are simulation of hardware models which depend on shared libraries for execution. We are trying to provide an alternative to capturing traces from a program, saving them to some file and then starting a separate simulation that reads the file. AFAIU, to pass C pointers from the original application to the simulation we are forced to link MAMBO dynamically. Hence, currently we use PLUGIN_ARGS="-ldl -lz -lm" to link MAMBO dynamically (see https://github.com/ghdl/ghdl/issues/640#issuecomment-428464725).

dlopen

Unfortunately, using dlopen in a statically linked executable comes with pretty much all the disadvantages of dynamic linking. If an user really needs to do it in a plugin, it will work with the limitations mentioned above, but it should be avoided if possible (and I think it very much is in this case).

May I ask you to clarify what do you mean with "very much (...) in this case"? Do you mean that it can have performance/stability issues? Or is it a matter of the inconvenience that it is not a "true" static binary anymore?

The main motivation is to avoid building multiple MAMBO binaries, each with a different plugin.

This is the same approach taken by Valgrind for similar reasons. To help with this, using the current makefile, you can now set the OUTPUT_FILE environment variable to control the name of the executable generated. Also see the makefile for how to add multiple build targets for separate plugins (cachesim, memcheck). If this is about the compile time and if it's annoyingly long on your system,

It is not about compile time... c'mon! Building MAMBO can hardly be faster! 😄

It's because of build dependencies. I'd like to build a single MAMBO that replaces a given function, and copy it to all the target boards. Then, I can write one or multiple replacement "plugin bodies" in any other environment with GCC/Clang and with the dependencies of my arbitrary code only; without Ruby, without PIE, without MAMBO, without even being aware that my function replacement is going to be used for that purpose. I want to completely decouple three agents in the workflow:

  • Who writes the original application and distributes it as an executable binary.
  • Who writes a function that matches the prototype of a function in the original application and distributes it as a shared library.
  • Who takes the first unmodified application and replaces it with the second unmodified function.

The last one is the only one that would need to know about MAMBO, to install the deps and to build it.

we could switch to building separate .o files for each source file to avoid recompiling the same sources for each plugin.

This might be interesting during development, when MAMBO is compiled again and again. If it does not make "regular builds" much slower, I think it would be desirable.

FTR, DynamoRIO does NOT allow to register multiple replacements for the same function.

Function replacement isn't an API feature at this time, but you can register multiple callbacks for the same function. Of course, there are a multitude of ways in which the modifications could interfere with each other if they're not designed to work together.

Do you mean that DynamoRIO's API improved since I last checked? Or are you describing the current status of MAMBO?

The motivation for this is that text editors which remove trailing spaces (and which are not configured to handle Markdown) might remove those.

That sounds like an editor bug, I don't think they should mess with text on lines you've not actively modified. If you or someone else reports being affected by this we can change it, but otherwise I really don't like the backtick syntax.

VSCode does it (it's an option). It removes the trailing spaces. However, it is still properly displayed: https://github.com/umarcor/mambo/blob/db5bc98caa81820bfed218ecad313c5caf6f8a2b/plugins/memcheck/README.md. Hence, I will remove backticks.

umarcor avatar May 07 '20 11:05 umarcor

I pushed fixes/changes to this branch, preserving previous commits. However, there is a rebased and squashed version in https://github.com/umarcor/mambo/commits/cleanup, so you can cherry-pick them cleanly.

I moved the function replacement plugin, as well as the example replacement function, to subdir plugins/funcrepl. I added a README too.

umarcor avatar May 07 '20 18:05 umarcor