RedisAI
Multiple test failures on macOS
When running make -C opt test on macOS 10.14 (for the --use-slaves rounds):
[ERROR]
Unhandled exception: cannot get tensor from empty key
for the following tests:
basic_tests:test_run_onnx_model
basic_tests:test_run_onnxml_model
basic_tests:test_run_script
basic_tests:test_run_tf_model
basic_tests:test_run_tflite_model
basic_tests:test_run_torch_model
When running make -C opt test TEST=basic_tests:test_run_script TEST_ARGS="-s":
basic_tests:test_run_script
...
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 135 (0x1096a59e7 in libc10.dylib)
frame #1: torch::jit::GraphExecutorImplBase::run(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >&) + 907 (0x117d5775b in libtorch.dylib)
frame #2: (anonymous namespace)::torchRunModule((anonymous namespace)::ModuleContext*, char const*, long, DLManagedTensor**, long, DLManagedTensor**) + 1848 (0x109663db8 in redisai_torch.so)
frame #3: torchRunScript + 24 (0x109663608 in redisai_torch.so)
frame #4: RAI_ScriptRunTorch + 917 (0x109660b25 in redisai_torch.so)
frame #5: RedisAI_RunSession + 96 (0x1096475b0 in redisai.so)
frame #6: RedisAI_Run_ThreadMain + 92 (0x10964610c in redisai.so)
frame #7: _pthread_body + 126 (0x7fff60ece33d in libsystem_pthread.dylib)
frame #8: _pthread_start + 70 (0x7fff60ed12a7 in libsystem_pthread.dyli
27079:M 30 Jan 2020 18:52:48.633 # <ai> ERR expected 2 inputs, but got only 1 (run at ../torch/csrc/jit/graph_executor.cpp:473)
Actually, when you run make -C test TEST=basic_tests:test_run_script GEN=1 SLAVES=0 AOF=0, the test fails too, but RLTest fails to report it.
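If the two errors above share a cause (an input tensor key that is not present when the script or model runs), one cheap way to localize the failure is to check that every input key exists before running. This is a minimal sketch, assuming a redis-py connection; `missing_keys` and `assert_inputs_present` are hypothetical helpers, not part of RLTest or the RedisAI test suite:

```python
from typing import Iterable, List


def missing_keys(con, keys: Iterable[str]) -> List[str]:
    """Return the keys that do not exist on this connection.

    `con` is assumed to be a redis-py client; only exists() is used,
    which returns the number of the given keys that exist.
    """
    return [k for k in keys if con.exists(k) == 0]


def assert_inputs_present(con, keys: Iterable[str]) -> None:
    """Hypothetical test helper: fail with a descriptive message before
    running a script/model if any input tensor key is missing."""
    missing = missing_keys(con, keys)
    if missing:
        raise AssertionError(f"input tensor keys missing before run: {missing}")
```

Calling `assert_inputs_present(con, ['a', 'b'])` right before issuing AI.SCRIPTRUN would turn a Torch "expected 2 inputs" stack trace into a plain assertion naming the missing key.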
Thank you @rafie, I'll take a look today.
I'm developing on macOS 10.15. When I run
python -m RLTest --module install-cpu/redisai.so --test test/basic_tests.py
from my build directory, all tests pass. However, when I start redis-server --loadmodule install-cpu/redisai.so and run make -C opt test, I get the same (or very similar) failures. Will dig deeper.
P.S. make -C opt test should start its own redis-server via RLTest.
Hi @rafie, sorry. I was building through CMake directly and not through make -C opt build.
I started clean with
make -C opt fetch
make -C opt build
make -C opt test
and all tests passed on my system (including aof and slaves).
I'm wondering if there are stale files somehow (in the test data or dependencies).
It is possible to diagnose this on the CircleCI machine, where it fails consistently: you can get an SSH connection that is kept alive for a few hours.
Hi @rafie, right, I did that.
I triggered a rebuild with SSH enabled (so the tests still ran from within CircleCI, not after connecting with SSH), and the build failed with
basic_tests:test_run_mobilenet_multiproc
[ERROR]
Unhandled exception: cannot get tensor from empty key
Traceback (most recent call last):
File "/Users/distiller/Library/Python/3.7/lib/python/site-packages/RLTest/__main__.py", line 482, in _runTest
fn()
File "/Users/distiller/Library/Python/3.7/lib/python/site-packages/RLTest/__main__.py", line 470, in <lambda>
fn = lambda: test.target(env)
File "/Users/distiller/project/test/basic_tests.py", line 765, in test_run_mobilenet_multiproc
dtype, shape, data = con.execute_command('AI.TENSORGET', 'output', 'BLOB')
File "/Users/distiller/Library/Python/3.7/lib/python/site-packages/redis/client.py", line 755, in execute_command
return self.parse_response(connection, command_name, **options)
File "/Users/distiller/Library/Python/3.7/lib/python/site-packages/redis/client.py", line 768, in parse_response
response = connection.read_response()
File "/Users/distiller/Library/Python/3.7/lib/python/site-packages/redis/connection.py", line 638, in read_response
raise response
redis.exceptions.ResponseError: cannot get tensor from empty key
But once I then ssh'd into the machine and ran make -C opt test, all tests passed.
Do you have any idea what's going on? I don't at the moment.
@lantiga and @rafie, this should be solved by PR #296. The failure is probably related to master-replica sync, which is addressed by https://github.com/RedisAI/RedisAI/blob/test.refactor/test/includes.py#L27
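If the failures do come from a replica that has not yet received keys written on the master, one generic mitigation is to wait for the key before reading it. The sketch below is an assumption, not necessarily what includes.py#L27 does: `wait_for_key` is a hypothetical helper that polls with redis-py's `exists()` until a timeout. Redis's built-in `WAIT numreplicas timeout` command (for example `con.execute_command('WAIT', 1, 1000)` on the master connection) is an alternative that blocks until the given number of replicas acknowledge the connection's earlier writes, or the timeout in milliseconds expires.

```python
import time


def wait_for_key(con, key, timeout=5.0, interval=0.05):
    """Poll `con` until `key` exists or `timeout` seconds elapse.

    Hypothetical helper. `con` is assumed to be a redis-py client;
    only exists() is used. Returns True if the key appeared, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if con.exists(key):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Polling keeps the check local to the connection being read, which suits a --use-slaves round where the reader may be a replica; WAIT instead makes the writer block, which avoids guessing which key to poll for.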