dragonfly icon indicating copy to clipboard operation
dragonfly copied to clipboard

Invalid command inside multi cause a crash

Open boazsade opened this issue 2 years ago • 1 comments

Describe the bug This issue is related to locking. When we running a pipeline of commands (multi..exec sequence with the CLI), there is not why to do a "deep" test for the command validation, the only validation is that there is at least the correct minimum number of args. When the command passes this initial test it would be saved to the list of the commands that we would later execute inside "dfly_cntx->conn_state.exec_info.body.push_back". Later when the execute arrive, it would start executing these commands. The execution will invoke each of these using the call to "scmd.descr->Invoke(cmd_arg_list, cntx);" Note that this call do not return a result. The last part is to release any lock that was taken during this processing. So the call is made here "cntx->transaction->UnlockMulti();" This assumes that each command was executed successfully. And so it passes on every key that was processes during the steps above (see transaction.cc UnlockMulti), the releases the locks. The problem that this trigger is that if one of the commands failed because it was invalid args (either the type of the args or some other invalid value), it would never acquire a lock, and so the lock release function "UnlockMultiShardCb" that is calling DbTable::Release, will fail because the command that was not actually executed never actually got a lock, and the release function is not expecting inconsistency. One option for this is to assume that we may trying to release locks that never happened. This has the potential to not releasing locks, or to hide some other issue. Other option is to try and make sure that we have knowledge about which keys were actually locked and not to try to release those keys that never actually took a lock.

To Reproduce Steps to reproduce the behavior:

  1. MULTI
  2. SET foo bar EX EX PX oiuqwer
  3. EXEC
  4. See error

Expected behavior This should not crash it should not report an error about the invalid argument to the set command.

Environment (please complete the following information):

  • OS: ubuntu 20.04
  • Kernel: Linux dfly2 5.15.0-52-generic #58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Bare Metal
  • Dragonfly Version: 0.10.0

Reproducible Code Snippet

#!/usr/bin/env python3

import asyncio
import aioredis
import async_timeout

DB_INDEX = 1


async def run_pipeline_mode(pool, messages):
    try:
        conn = aioredis.Redis(connection_pool=pool)
        pipe = conn.pipeline()
        for key, val in messages.items():
            pipe.set(key, val)
   #     pipe.get(key)
        pipe.set("foo", "bar", "EX", "oiuqwer") 
        result = await pipe.execute()

        print(f"got result from the pipeline of {result} with len = {len(result)}")
        if len(result) != len(messages):
            return False, f"number of results from pipe {len(result)} != expected {len(messages)}"
        elif False in result:
            return False, "expecting to successfully get all result good, but some failed"
        else:
            return True, "all command processed successfully"
    except Exception as e:
        print(f"failed to run command: error: {e}")
        return False, str(e)


def test_pipeline_support():
    def generate(max):
        for i in range(max):
            yield f"key{i}", f"value={i}"

    messages = {a: b for a, b in generate(1)}
    loop = asyncio.new_event_loop()
    async_pool = aioredis.ConnectionPool(host="localhost", port=6379,
                                         db=DB_INDEX, decode_responses=True, max_connections=16)
    success, message = loop.run_until_complete(
        run_pipeline_mode(async_pool, messages))
    #assert success, message
    return success


if __name__ == "__main__":
    print("starting the test")
    state = test_pipeline_support()
    print(f"finish successfully - {state}")

boazsade avatar Nov 08 '22 14:11 boazsade

There is new branch for this issue issue-468 was created. Please don't sync this with the main branch as we would like to make sure that we can replication this. You have a unit test call "MultiIssue468" under dragonfly_test.cc that will be able to trigger this issue, There is a script call "fail_multi_invalid_command.py" under tests/integration that can trigger this issue as well.

boazsade avatar Nov 08 '22 14:11 boazsade