RedisAI

Multiple test failures on macOS

Open rafie opened this issue 5 years ago • 8 comments

When running make -C opt test on macOS 10.14 (for the --use-slaves rounds):

        [ERROR]
        Unhandled exception: cannot get tensor from empty key

for the following tests:

        basic_tests:test_run_onnx_model
        basic_tests:test_run_onnxml_model
        basic_tests:test_run_script
        basic_tests:test_run_tf_model
        basic_tests:test_run_tflite_model
        basic_tests:test_run_torch_model

When running make -C opt test TEST=basic_tests:test_run_script TEST_ARGS="-s":

basic_tests:test_run_script
...
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 135 (0x1096a59e7 in libc10.dylib)
frame #1: torch::jit::GraphExecutorImplBase::run(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >&) + 907 (0x117d5775b in libtorch.dylib)
frame #2: (anonymous namespace)::torchRunModule((anonymous namespace)::ModuleContext*, char const*, long, DLManagedTensor**, long, DLManagedTensor**) + 1848 (0x109663db8 in redisai_torch.so)
frame #3: torchRunScript + 24 (0x109663608 in redisai_torch.so)
frame #4: RAI_ScriptRunTorch + 917 (0x109660b25 in redisai_torch.so)
frame #5: RedisAI_RunSession + 96 (0x1096475b0 in redisai.so)
frame #6: RedisAI_Run_ThreadMain + 92 (0x10964610c in redisai.so)
frame #7: _pthread_body + 126 (0x7fff60ece33d in libsystem_pthread.dylib)
frame #8: _pthread_start + 70 (0x7fff60ed12a7 in libsystem_pthread.dyli
27079:M 30 Jan 2020 18:52:48.633 # <ai> ERR expected 2 inputs, but got only 1 (run at ../torch/csrc/jit/graph_executor.cpp:473)

rafie, Jan 30 '20 17:01

Actually, when you run make -C test TEST=basic_tests:test_run_script GEN=1 SLAVES=0 AOF=0, the test fails too, but RLTest fails to report it.

rafie, Jan 30 '20 18:01

Thank you @rafie, I'll take a look today.

I'm developing on macOS 10.15. When I run

python -m RLTest --module install-cpu/redisai.so --test test/basic_tests.py

from my build directory, all tests pass. However, when I start redis-server --loadmodule install-cpu/redisai.so and run make -C opt test, I get the same (or very similar) failures. Will dig deeper.

lantiga, Feb 03 '20 12:02

PS: make -C opt test should start its own redis-server via RLTest.

rafie, Feb 03 '20 13:02

Hi @rafie, sorry. I was building through CMake directly and not through make -C opt build.

I started clean with

make -C opt fetch
make -C opt build
make -C opt test

and all tests passed on my system (including aof and slaves).

lantiga, Feb 04 '20 11:02

I'm wondering if there are stale files somehow (in the test data or dependencies).

lantiga, Feb 04 '20 11:02

It is possible to diagnose this on the CircleCI machine, where it fails consistently. You can get an SSH connection that is kept alive for a few hours.

rafie, Feb 05 '20 08:02

Hi @rafie, right, I did that.

I triggered a rebuild with SSH enabled (so, still from within CircleCI, not after connecting over SSH), and the build failed with

basic_tests:test_run_mobilenet_multiproc
	[ERROR]
	Unhandled exception: cannot get tensor from empty key
Traceback (most recent call last):
  File "/Users/distiller/Library/Python/3.7/lib/python/site-packages/RLTest/__main__.py", line 482, in _runTest
    fn()
  File "/Users/distiller/Library/Python/3.7/lib/python/site-packages/RLTest/__main__.py", line 470, in <lambda>
    fn = lambda: test.target(env)
  File "/Users/distiller/project/test/basic_tests.py", line 765, in test_run_mobilenet_multiproc
    dtype, shape, data = con.execute_command('AI.TENSORGET', 'output', 'BLOB')
  File "/Users/distiller/Library/Python/3.7/lib/python/site-packages/redis/client.py", line 755, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "/Users/distiller/Library/Python/3.7/lib/python/site-packages/redis/client.py", line 768, in parse_response
    response = connection.read_response()
  File "/Users/distiller/Library/Python/3.7/lib/python/site-packages/redis/connection.py", line 638, in read_response
    raise response
redis.exceptions.ResponseError: cannot get tensor from empty key

But once I SSH'd into the machine and ran make -C opt test, all tests passed.

Do you have any clue? I don't really have one at the moment.

lantiga, Feb 06 '20 15:02

@lantiga and @rafie, this should be solved by PR #296. It is probably related to the master-replica sync, which is addressed by https://github.com/RedisAI/RedisAI/blob/test.refactor/test/includes.py#L27
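
For readers unfamiliar with this kind of race: a read sent to a replica before the master's write has propagated would hit an empty key, which matches the "cannot get tensor from empty key" error above. The following is only a minimal sketch of one way a test could wait for replication, not the code in the linked includes.py. The helper name `wait_for_replica_sync` and its default values are hypothetical; the only thing assumed about `con` is a redis-py style `execute_command` method, and the only Redis feature used is the standard `WAIT numreplicas timeout` command.

```python
def wait_for_replica_sync(con, num_replicas=1, timeout_ms=1000):
    """Return True once `num_replicas` replicas have acknowledged prior writes.

    Hypothetical helper for illustration. `con` is any object exposing a
    redis-py style `execute_command(*args)` method, connected to the master.
    """
    # WAIT blocks until the given number of replicas acknowledge the writes
    # issued on this connection, or until the timeout (in milliseconds)
    # expires. It replies with the number of replicas that acknowledged.
    acked = con.execute_command('WAIT', num_replicas, timeout_ms)
    return int(acked) >= num_replicas


# Intended use in a test (assumes `master` and `replica` connections exist):
#   master.execute_command('AI.TENSORSET', ...)
#   assert wait_for_replica_sync(master)
#   replica.execute_command('AI.TENSORGET', 'output', 'BLOB')
```

A timeout of 0 makes `WAIT` block indefinitely, so a finite timeout is probably the safer choice in a test suite.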

filipecosta90, Feb 23 '20 20:02