rcl ASAN leaks in test codes related to mock_utils::patch_and

Bug report

Required Info:

Operating System:
- Ubuntu 20.04
Installation type:
- source
Version or commit hash:
- c83119c826c75fbaf6e741658927a953318832ad
DDS implementation:
- rmw_cyclone_dds
Client library (if applicable):
- rcl

Steps to reproduce issue

Build with ASAN and run test_publisher__rmw_cyclonedds_cpp . I got the following error.

24: ==16400==ERROR: LeakSanitizer: detected memory leaks
24: 
24: Direct leak of 72 byte(s) in 1 object(s) allocated from:
24:     #0 0x7f384f9f0bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
24:     #1 0x7f384f7b72e6 in __default_allocate /src/ros2_rolling_asan/src/ros2/rcutils/src/allocator.c:37
24:     #2 0x7f384f7d0fa7 in rcutils_string_map_init /src/ros2_rolling_asan/src/ros2/rcutils/src/string_map.c:62
24:     #3 0x7f384f8832fd in rcl_resolve_name /src/ros2_rolling_asan/src/ros2/rcl/rcl/src/rcl/node_resolve_name.c:47
24:     #4 0x7f384f883eeb in rcl_node_resolve_name /src/ros2_rolling_asan/src/ros2/rcl/rcl/src/rcl/node_resolve_name.c:152
24:     #5 0x7f384f87d42a in rcl_publisher_init /src/ros2_rolling_asan/src/ros2/rcl/rcl/src/rcl/publisher.c:81
24:     #6 0x55a3171caf58 in TestPublisherFixture__rmw_cyclonedds_cpp_test_mocks_fail_publisher_init_Test::TestBody() /src/ros2_rolling_asan/src/ros2/rcl/rcl/test/rcl/test_publisher.cpp:787

Expected behavior

no leak

Actual behavior

(may be) leak.

As this test case looks an error case test, so I'm sorry if this leak is an expected behavior.

Additional information

This happens at the following block.

// test_publisher.cpp
TEST_F(CLASSNAME(TestPublisherFixture, RMW_IMPLEMENTATION), test_mocks_fail_publisher_init) {
  // snip

  {
    // Internal failure when fini rcutils_string_map returns error, targets substitution_map fini
    auto mock = mocking_utils::patch_and_return(
    "lib:rcl", rcutils_string_map_fini, RCUTILS_RET_ERROR);

    ret = rcl_publisher_init(&publisher, this->node_ptr, ts, topic_name, &publisher_options);
    EXPECT_EQ(RCL_RET_ERROR, ret) << rcl_get_error_string().str;
    rcl_reset_error();
  }

  // snip
}

It looks:

rcutils_string_map_fini is patched and only returns RCUTILS_RET_ERROR.
so, memories allocated by rcutils_string_map_init are not deallocated.

By using inject_on_return this error message disappeared. Additionally, I think we can check the behavior when rcutils_string_map_fini returns an error.

    auto mock = mocking_utils::inject_on_return(
      "lib:rcl", rcutils_string_map_fini, RCUTILS_RET_ERROR);

I want to note that there are some leaks related to mock test not only in test_publisher but also in other tests. If this fix is correct, I'm glad to fix all tihs kind of leaks and send PR.

Jan 29 '21 03:01 y-okumura-isp

By using inject_on_return this error message disappeared. Additionally, I think we can check the behavior when rcutils_string_map_fini returns an error.

For the most part, this would be the right way to go. There is one big problem with using inject_on_return; it is not supported on RHEL/CentOS. While that isn't currently one of our supported platforms, it is a platform we are trying to get working.

I'm not quite sure of the way forward. One thing we could try is to figure out how to skip these type of tests specifically on RHEL/CentOS. Then we could switch these to use inject_on_return, as they would just be skipped there. What do you think?

Jan 29 '21 14:01 clalancette

Thank you for comment.

Hmm, I rather want to run inject_on_return on RHEL/CentOS 😅 BTW, does inject_on_return work correctly on MacOS or Windows?

Feb 02 '21 05:02 y-okumura-isp

Hmm, I rather want to run inject_on_return on RHEL/CentOS sweat_smile

The other way forward is to dig into mimick and try and figure out a way to make inject_on_return work on RHEL. I don't know all of the details, but @hidmic does and may be able to provide you some information.

BTW, does inject_on_return work correctly on MacOS or Windows?

As far as I know, yes, with some caveats. But again, @hidmic may be able to give you better information.

Feb 02 '21 13:02 clalancette

+1 👍 for using inject_on_return

Feb 15 '21 08:02 fujitatomoya

As @clalancette, inject_on_return works everywhere but on RHEL/CentOS. For reasons that remain obscure to me, unlike other Linux distributions, binaries in CentOS yield .plt trampoline addresses in lieu of that of functions and somehow manage to not break ODR. An optimization? A global .got.plt table? Documentation down there is very scarce, so I wasn't able to find a way.

Mar 04 '21 18:03 hidmic

I was thinking a way of fixing inject_or_return, which currently does: stores the original function pointer, poisons the trampoline table with a function that calls the original function and then returns something else.

That doesn't work if the function pointer is pointing to the trampoline (you get infinite recursion).

On the other hand, that function could do one of the following:

Read the current trampoline and get a pointer to the original function from it (I'm not completely sure if this is possible).
Read the current trampoline, poison the table with a new function that: restores the old trampoline, calls the old function (now that the trampoline was restored), poisons the trampolines table again. Note: this one is a problem if the function being mocked is recursive, but that's not usually the case for inject_on_return (used mainly for *_fini functions). This implementation could be renamed inject_on_return_once, just in case.

Both ideas are tricky and I don't know much about mimick internals, but I think they should work.

Mar 04 '21 21:03 ivanpauno

I don't know much about it, so I'm sorry if I'm wrong.

A global .got.plt table?

I have heard it in the security context. Is it related with RELRO(RELocation ReadOnly)? According to the literature, .got.plt table is used when RELRO is not Full RELRO. (But sorry, I can only give Japanese article: https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_uri&item_id=187361&file_id=1&file_no=1)

As I don't use CentOS, so I build rolling on Fedora 31(https://docs.ros.org/en/rolling/Installation/Fedora-Development-Setup.html). We can see security setting by https://github.com/slimm609/checksec.sh.

Here is the result.

# ubuntu
$ ../test/checksec.sh/checksec --file=build-asan/rcl/test/test_publisher__rmw_cyclonedds_cpp 
RELRO           STACK CANARY      NX            PIE             RPATH      RUNPATH      Symbols         FORTIFY Fortified       Fortifiable     FILE
Full RELRO      Canary found      NX disabled   PIE enabled     No RPATH   RW-RUNPATH   5872) Symbols     No    0               10              build-asan/rcl/test/test_publisher__rmw_cyclonedds_cpp

# Fedora
# ./checksec.sh/checksec --file=build/rcl/test/test_publisher__rmw_cyclonedds_cpp 
RELRO           STACK CANARY      NX            PIE             RPATH      RUNPATH	Symbols		FORTIFY	Fortified	Fortifiable	FILE
Partial RELRO   No canary found   NX disabled   No PIE          RW-RPATH   No RUNPATH   5803) Symbols	  No	0		10		build/rcl/test/test_publisher__rmw_cyclonedds_cpp

In Fedora, Partial RELRO is used. It looks we can use full RELRO by gcc -Wl,-z,norelr for example.

Mar 09 '21 04:03 y-okumura-isp

I hope we can run inject_on_return on Fedora 31 according to this gist.

This is the summary:

Summary

By building the execution file as Full RELRO and PIE, the warning mocking_utils::inject_on_return() cannot forward call looks disapear. UT runs and the behavior changes.
- Ubuntu and CentOS have different default settings about these.
As I don't know mimick or mocking_utils well, so I want someone to judge this change is right or not.
Additionally, if right, I'm happy if someone writes PR (I'm sorry, but we need to prioritize other tasks).

Thank you.

Mar 12 '21 05:03 y-okumura-isp

Thank you for the investigation @y-okumura-isp. I'll try to come back to this later on. If you're right, perhaps we can export these flags from mimick targets directly.

Apr 09 '21 16:04 hidmic

ASAN leaks in test codes related to mock_utils::patch_and_return

Bug report

Steps to reproduce issue

Expected behavior

Actual behavior

Additional information

Summary