NeuroVault: Create a benchmark for image search

Features:

[ ] time it takes for a new map to be indexed
[ ] time it takes to retrieve search results
[ ] vary the size of the database for both operations, so that performance can be extrapolated (using a linear or exponential fit) as the database grows
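The three items above can be combined in one harness: time an operation at several database sizes, then fit a linear model and an exponential model (the latter as a straight line through the log of the timings) to extrapolate. This is a minimal, stdlib-only Python sketch; `grow_db_to` and `operation` are hypothetical callables standing in for uploading maps and for indexing or searching, and the demo uses a plain list as a synthetic database rather than NeuroVault itself.

```python
import math
import time
from typing import Callable, List, Sequence, Tuple


def time_call(operation: Callable[[], object]) -> float:
    """Wall-clock seconds taken by a single call to `operation`."""
    start = time.perf_counter()
    operation()
    return time.perf_counter() - start


def benchmark(sizes: Sequence[int],
              grow_db_to: Callable[[int], None],
              operation: Callable[[], object]) -> List[float]:
    """Grow the database to each size in turn (ascending) and time one operation there."""
    timings = []
    for size in sizes:
        grow_db_to(size)  # hypothetical: e.g. upload maps until the DB holds `size` maps
        timings.append(time_call(operation))
    return timings


def least_squares(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_linear(sizes, timings) -> Callable[[float], float]:
    """Model t(n) = a + b * n."""
    a, b = least_squares(sizes, timings)
    return lambda n: a + b * n


def fit_exponential(sizes, timings) -> Callable[[float], float]:
    """Model t(n) = A * exp(k * n), fitted as a line through log(t); timings must be > 0."""
    log_a, k = least_squares(sizes, [math.log(t) for t in timings])
    return lambda n: math.exp(log_a + k * n)


if __name__ == "__main__":
    # Synthetic stand-in for the real database: a list that a linear scan reads.
    db: List[int] = []
    sizes = [500, 1000, 1500, 2000]
    timings = benchmark(sizes, lambda n: db.extend(range(len(db), n)), lambda: sum(db))
    print("linear extrapolation to 10000:", fit_linear(sizes, timings)(10_000))
```

Fitting the exponential model in log space keeps both fits in ordinary least squares, but it weights relative rather than absolute errors, which is worth keeping in mind when comparing the two extrapolations.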

Nov 05 '15 00:11 chrisgorgo

You can use this snapshot for data: https://purl.stanford.edu/qd938sk1220

May 19 '16 15:05 chrisgorgo

I've benchmarked the current implementation locally. Timings will vary from one machine to another. I decided to build it using the Command framework because the Test approach was a bit slower, and because writing a benchmark as a test doesn't make much sense to me (unless it is a small one). Anyway, I also wrote the Test version in case you want to take a look :)

The upper limit was the 5015 unthresholded maps currently in NeuroVault. I ran two benchmarks. The first uploaded the images one by one and computed the index after each upload; this was taking forever, so for the second I uploaded all of them and benchmarked every 500 images (out of curiosity: the second benchmark ran while the server was busy). I then reran the first benchmark with the server idle and all 5015 images loaded: it takes 77 seconds to make the 5014 comparisons.

In addition, about 200,000 Comparison objects were created when the database held only about 650 images, which works out to about 12.5 million objects for 5000 images. I don't know how much space each object uses, but the small transformations take about 225 KB each, which is around 1 GB for 5000 images.
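The object counts quoted above are consistent with storing one Comparison per unordered pair of maps, n(n - 1)/2. A quick back-of-the-envelope check in Python, assuming that pairing and the ~225 KB per image quoted above; `index_seconds` is a hypothetical helper that additionally assumes indexing one map compares it against every other map (as the 5014 = 5015 - 1 count suggests) and that every comparison costs the same, which the size-varying benchmark would need to confirm.

```python
def pairwise_comparisons(n_images: int) -> int:
    """Unordered pairs among n images: n * (n - 1) / 2."""
    return n_images * (n_images - 1) // 2


def storage_gb(n_images: int, kb_per_image: float = 225.0) -> float:
    """Space for the per-image transformations in decimal GB (1 GB = 10**6 KB)."""
    return n_images * kb_per_image / 1e6


# Measured above: 77 s for 5014 comparisons with all 5015 maps loaded.
SECONDS_PER_COMPARISON = 77 / 5014  # about 0.0154 s


def index_seconds(n_existing: int) -> float:
    """Time to index one new map, ASSUMING a constant cost per comparison."""
    return n_existing * SECONDS_PER_COMPARISON


print(pairwise_comparisons(650))   # → 210925
print(pairwise_comparisons(5000))  # → 12497500
print(storage_gb(5000))            # → 1.125
```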

[Image: bench_neurovault]

What do you think? Next I'm going to do the search benchmark.

PS: With 5000 images in the database, Redis consumes all my memory, so I had to remove it from the docker-compose file. Is this normal?

May 31 '16 10:05 erramuzpe

In asynchronous mode, Redis stores information about tasks (such as creating glass brain images or calculating pairwise comparisons). Workers query Redis for available tasks and perform them in the background. The fact that Redis filled up with tasks and you had to turn it off makes me worried that something went wrong. To avoid using Redis and force the benchmark to run synchronously, you can use CELERY_ALWAYS_EAGER mode (as we do for tests: https://github.com/NeuroVault/NeuroVault/blob/4e4f363fdefa0656ae06531bbbf318a253265aa0/neurovault/settings.py#L354).
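A minimal sketch of a benchmark-only settings override, assuming a Celery 3.x-era Django settings module like the one linked above (the exact contents of NeuroVault's settings are not reproduced here). With eager mode on, `delay()`/`apply_async()` run the task immediately in the calling process, so nothing is queued in Redis; the second setting, an assumption about what is useful for a benchmark, makes exceptions raised inside eagerly executed tasks propagate to the caller. In Celery 4 and later the lowercase equivalents are `task_always_eager` and `task_eager_propagates`.

```python
# Hypothetical benchmark settings override (e.g. appended to settings.py).
# Run Celery tasks synchronously in the calling process instead of queueing them in Redis.
CELERY_ALWAYS_EAGER = True
# Re-raise exceptions from eagerly executed tasks so a failing task stops the benchmark.
CELERY_EAGER_PROPAGATES_EXCEPTIONS = True
```

With this in place, the measured indexing time includes the task work (such as the pairwise comparisons) directly, rather than only the time to enqueue it.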

I agree that Command is a better option than Test. Could you submit a Work In Progress pull request with your benchmark?

May 31 '16 14:05 chrisgorgo

Ah, cool, thanks for the tip. :+1: I will open the PR. benchmark2 (server not busy) and benchmark3 (server busy, but hopefully not anymore with this tip) are the ones I'm using.

May 31 '16 16:05 erramuzpe