
Thread leak in ffmpeg ops

Open kyamagu opened this issue 3 years ago • 11 comments

Following #1410, I was checking the ffmpeg ops with a TSAN build on macOS.

I observe two warnings from the following test:

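import tensorflow as tf
import tensorflow_io as tfio

# NOTE: video_path (the path to a sample video) is defined elsewhere
# in the test module.
def test_ffmpeg_decode_video_thread():
    """test_ffmpeg_decode_video_thread"""
    def _decode(video_path):
        # Read the raw file and decode video stream 0.
        content = tf.io.read_file(video_path)
        video = tfio.experimental.ffmpeg.decode_video(content, 0)
        return video

    # Decode the same file 256 times with 16 parallel calls.
    dataset = tf.data.Dataset.from_tensor_slices([video_path])
    dataset = dataset.repeat(256)
    dataset = dataset.map(_decode, num_parallel_calls=16)
    for video in dataset:
        pass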

Warning summaries:

SUMMARY: ThreadSanitizer: data race (libtensorflow_io_ffmpeg_4.2.so:x86_64+0xca44) in tensorflow::data::FFmpegInit()
SUMMARY: ThreadSanitizer: thread leak (libtensorflow_framework.2.dylib:x86_64+0x9ff38d) in tensorflow::(anonymous namespace)::PosixEnv::StartThread(tensorflow::ThreadOptions const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::function<void ()>)
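
When a sanitizer log grows long, the `SUMMARY:` lines can be tallied with a short script. The following is a minimal sketch and not part of tensorflow-io. The function name `summarize_sanitizer_log` is hypothetical, and the regular expression is an assumption inferred only from the two summary lines above, so other sanitizer reports may need a different pattern.

```python
import re
from collections import Counter

# Pattern inferred from the two SUMMARY lines in this issue (an assumption,
# not an official sanitizer log grammar):
#   SUMMARY: <tool>: <kind> (<module+offset>) in <function>
SUMMARY_RE = re.compile(
    r"^SUMMARY: (?P<tool>\w+): (?P<kind>[^(]+?) "
    r"\((?P<location>[^)]*)\) in (?P<function>.+)$"
)


def summarize_sanitizer_log(lines):
    """Count sanitizer warnings by (tool, kind, function)."""
    counts = Counter()
    for line in lines:
        match = SUMMARY_RE.match(line.strip())
        if match:
            counts[(match["tool"], match["kind"], match["function"])] += 1
    return counts


sample = [
    "SUMMARY: ThreadSanitizer: data race "
    "(libtensorflow_io_ffmpeg_4.2.so:x86_64+0xca44) "
    "in tensorflow::data::FFmpegInit()",
    "unrelated log line",
]
for key, count in summarize_sanitizer_log(sample).items():
    print(count, *key)
```

To run it on a saved log, pass an open file instead of the sample list, for example `summarize_sanitizer_log(open("test.log"))`.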

The full test.log.

https://github.com/tensorflow/io/blob/41e998771ac921dca5ed9cad284c4b982d2582e6/tensorflow_io/core/kernels/ffmpeg_kernels.cc#L44

kyamagu avatar May 12 '21 02:05 kyamagu

This doesn't look harmful, though. There might be other reasons for the segfault in the macOS CI environment.

kyamagu avatar May 12 '21 06:05 kyamagu

@kyamagu How do you run the sanitizer with Python and the shared library? Or do we have to write sample C++ code for it? Thank you!

vnghia avatar May 12 '21 07:05 vnghia

@vnvo2409 I build CPython with sanitizers. In my env, I use pyenv and pyenv-alias to build CPython like this:

CC=clang \
MACOSX_DEPLOYMENT_TARGET=10.14 \
SDKROOT=/Library/Developer/CommandLineTools/SDKs/MacOSX11.1.sdk \
CFLAGS="-fsanitize=thread" \
LDFLAGS="-fsanitize=thread" \
VERSION_ALIAS="3.8.9_tsan" \
pyenv install 3.8.9

After that, I just build the extension with sanitizer flags in bazel.

kyamagu avatar May 12 '21 09:05 kyamagu

Also note that I had to set OMP_NUM_THREADS=1 when importing tensorflow to avoid a weird deadlock in OpenBLAS.
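
One way to apply this setting from inside a Python script is to set the variable before the first import that loads OpenBLAS. This is a minimal sketch, assuming OpenBLAS reads `OMP_NUM_THREADS` when the library is loaded; the helper name `limit_openmp_threads` is hypothetical.

```python
import os


def limit_openmp_threads(count=1):
    """Set OMP_NUM_THREADS for this process and any child processes."""
    os.environ["OMP_NUM_THREADS"] = str(count)


# Call this before importing numpy or tensorflow; setting the variable
# after the native libraries are loaded is assumed to have no effect.
limit_openmp_threads(1)

# import tensorflow as tf  # import only after the variable is set
```

Exporting the variable in the shell before starting Python has the same effect and does not depend on import order.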

kyamagu avatar May 12 '21 09:05 kyamagu

@kyamagu Thank you! I will try to set up a CI with the sanitizer.

vnghia avatar May 12 '21 09:05 vnghia

Now that #1425 fixes the macOS tests, I will close this issue.

kyamagu avatar May 18 '21 09:05 kyamagu

@kyamagu There is still one test that is failing on macOS: https://github.com/tensorflow/io/blob/master/tests/test_ffmpeg.py#L89

It might still be related to the thread safety issue, though.

yongtang avatar May 18 '21 09:05 yongtang

@yongtang Okay, then I will keep this issue open.

kyamagu avatar May 18 '21 09:05 kyamagu

@yongtang @kyamagu I spun up a CI run for asan on Linux: https://github.com/vnvo2409/io/actions/runs/860162724 (the test-log artifact contains the asan log for each test). I hope you can find some useful information in it. Some observations:

  • Caching with GitHub Actions seems faster and more stable.
  • asan is too sensitive; we might need to add a suppression if we want to add this to our CI.

@yongtang Any chance we could add this build to our CI?

vnghia avatar May 19 '21 19:05 vnghia

Thanks, but the asan logs don't seem to show any leak relevant to the ffmpeg crash on mac.

kyamagu avatar May 20 '21 13:05 kyamagu

Any chance we could add this build to our CI?

@vnvo2409 I think adding asan will be very helpful to CI. At least it might be able to catch some future memory issues.

yongtang avatar May 20 '21 16:05 yongtang