strange behaviors in kvell

Open • anoyiuhu opened this issue 4 years ago • 1 comment

Hi,

  1. According to scripts/run-aws.sh, YCSB is run several times. On the first run, KVell generates a database (e.g. 100 GB) and then runs the YCSB workload. On the second run, KVell can reuse the database from the previous run and recover it. However, I found that sometimes, after recovering the database, KVell stops suddenly, which is very confusing. Do you know why this happens?
  (Screenshot attached: Screen Shot 2020-09-03 at 10 34 48 AM)
  2. During my test, using 2 disks, 4 workers per disk, and a queue depth of 1, I found that the reported latency and throughput do not match. For example, for ycsb-uniform, the latency is 116 us and the throughput is 409838 req/s. Theoretically, the ideal throughput is (1/116) * (2 * 4) * 10^6 = 68965 req/s, which is much smaller than 409838. Can you explain this phenomenon?

Looking forward to your reply. Best regards

anoyiuhu, Sep 03 '20 02:09

Hi,

  1. Never happened to me. If it happens again, you can maybe find some info using gdb? I'd be interested in the debug info.

  2. (Edited because I missed that you use a QD of 1)

    If I remember correctly, the latency is computed from the moment a query is inserted in a worker's queue. So there is some degree of "batching" even with a QD of 1 because the queue can contain multiple items. You can try to set MAX_NB_PENDING_CALLBACKS_PER_WORKER to 1. Make sure NEVER_EXCEED_QUEUE_DEPTH is equal to 1 too.

    Because the latency also includes the time spent waiting in the queue, it complicates the maximum bandwidth computation too. (Intuitively, if a worker processes 1 request at a time and there is always 1 pending request in the queue, then your throughput is 2x what you would compute based on latency.)

    The latency is also computed on the first 10M queries (see MAX_STATS in stats.c), so the average might be wrong if your test is long.

    If you want to use the formula thp = (1 / avg latency) * batch size * number of workers, then modify the following line: https://github.com/BLepers/KVell/blob/master/slabworker.c#L218 , replacing 2 by 0 (this will reset the latency measurement at that point in time).
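
A rough way to sanity-check the numbers from the question is Little's law (throughput = requests in flight / average latency). The sketch below is only an illustration, assuming the reported latency covers both queue time and service time as described above; the function names are made up for this example and are not part of KVell.

```python
# Little's law: throughput = requests in flight / average latency.
# Figures are taken from the question; function names are illustrative only.

def ideal_throughput(latency_us: float, in_flight: float) -> float:
    """Requests per second sustained with `in_flight` outstanding requests."""
    return in_flight / (latency_us * 1e-6)


def implied_in_flight(throughput_rps: float, latency_us: float) -> float:
    """Average number of outstanding requests implied by the measurements."""
    return throughput_rps * latency_us * 1e-6


workers = 2 * 4            # 2 disks x 4 workers per disk
latency_us = 116.0         # reported average latency
measured_rps = 409_838.0   # reported throughput

# Exactly one request per worker (the estimate from the question).
print(round(ideal_throughput(latency_us, workers)))
# One request in service plus one waiting in each worker's queue: 2x the above.
print(round(ideal_throughput(latency_us, 2 * workers)))
# Average requests in flight per worker implied by the measured values.
print(round(implied_in_flight(measured_rps, latency_us) / workers, 2))
```

With the reported values, the last line comes out to roughly 6 requests in flight per worker, which would be consistent with several requests sitting in each worker's queue even at a device queue depth of 1.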

BLepers, Sep 03 '20 05:09