Performance testing point-in-polygon
Basic benchmarks show that the point-in-polygon API takes between 0 and 1 millisecond to execute.
We don't fully understand what the performance is like:
- under heavy load
- on a cold start vs. when the Linux filesystem cache has paged all/most of the DB
- single core vs. multi-core
- when it hits the max QPS for a machine
- with a small DB vs a large DB
- at various levels of 'max shard complexity' (a tunable config value).
This ticket is to figure out how to generate benchmarks which return more than simply vanity metrics.
It would be ideal if we can automate this process to measure performance over time, as new features are added.
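As a starting point, here is a minimal sketch of what such an automated benchmark could look like (not an existing script; it assumes Node 18+ for the global `fetch`, a spatial service listening locally on port 3000, and a placeholder query point):

```js
// Minimal latency benchmark sketch: hit the PIP endpoint N times and report
// percentiles rather than a single average, so the numbers are more than
// vanity metrics. The URL and coordinates below are placeholders.
const URL = 'http://localhost:3000/query/pip/_view/pelias/2.3522/48.8566';
const N = 1000;

// Nearest-rank percentile over a sorted array of samples.
function percentile(sorted, p) {
  const idx = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
  return sorted[idx];
}

(async () => {
  const samples = [];
  for (let i = 0; i < N; i++) {
    const start = process.hrtime.bigint();
    await fetch(URL);
    samples.push(Number(process.hrtime.bigint() - start) / 1e6); // ns -> ms
  }
  samples.sort((a, b) => a - b);
  for (const p of [50, 95, 99]) {
    console.log(`p${p}=${percentile(samples, p).toFixed(2)}ms`);
  }
})();
```

Running something like this before and after each feature lands would give a simple latency-over-time signal; the stress tests below cover throughput.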
Hi there, I ran some stress tests on spatial so that we can get an idea of its performance.
I used Gatling with the pip-service scenario. The service and the injector are on different machines.
Spec
Service:
OS: Debian
CPU: 4 threads (2.3 GHz)
RAM: 60 GB
Containers: pelias/spatial:master and pelias/pip-service:master
Database: Admin France extract from Geocode Earth
Injector:
OS: Debian
CPU: 8 threads (2.3 GHz)
RAM: 30 GB
Containers: jawg/pelias-server-streess
Scenario
We use a set of regions, pick a random point within a region, and make a PIP request to the endpoint /query/pip/_view/pelias/:lon/:lat for spatial, or /:lon/:lat for pip-service.
The seeds were generated only once to have the same scenario each time.
In this scenario, we have a total of 75,000 users arriving over 60 seconds. Each user makes a single request. The goal is a 95th percentile below 750 ms; Gatling will inject 1,250 req/s. A sketch of the per-request point generation follows the region list below.

Regions (name, min lat, max lat, min lon, max lon):
```
AUVERGNE-RHONE-ALPES,44.1154,46.804,2.0629,7.1859
BOURGOGNE-FRANCHE-COMTE,46.1559,48.4001,2.8452,7.1435
BRETAGNE,47.278,48.9008,-5.1413,-1.0158
CENTRE-VAL DE LOIRE,46.3471,48.9411,0.053,3.1286
CORSE,41.3336,43.0277,8.5347,9.56
GRAND EST,47.4202,50.1692,3.3833,8.2333
HAUTS-DE-FRANCE,48.8372,51.089,1.3797,4.2557
ILE-DE-FRANCE,48.1205,49.2413,1.4465,3.5587
NORMANDIE,48.1799,50.0722,-1.9485,1.8027
NOUVELLE-AQUITAINE,42.7775,47.1758,-1.7909,2.6116
OCCITANIE,42.3331,45.0467,-0.3272,4.8456
PAYS DE LA LOIRE,46.2664,48.568,-2.6245,0.9167
PROVENCE-ALPES-COTE D'AZUR,42.9818,45.1268,4.2303,7.7188
```
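For reference, the per-request point generation can be sketched like this (illustrative JavaScript rather than the actual Gatling feeder; the base URL is a placeholder and the bounds are taken from the list above):

```js
// Pick a random point inside a region's bounding box and build the PIP URL.
// Bounding boxes are in the same order as the list above:
// [minLat, maxLat, minLon, maxLon].
const BASE = 'http://localhost:3000'; // placeholder for the host under test

const regions = {
  'BRETAGNE': [47.278, 48.9008, -5.1413, -1.0158],
  'CORSE': [41.3336, 43.0277, 8.5347, 9.56],
};

function randomPipUrl(bbox) {
  const [minLat, maxLat, minLon, maxLon] = bbox;
  const lat = minLat + Math.random() * (maxLat - minLat);
  const lon = minLon + Math.random() * (maxLon - minLon);
  // spatial expects lon before lat; pip-service would use `/${lon}/${lat}` directly.
  return `${BASE}/query/pip/_view/pelias/${lon.toFixed(6)}/${lat.toFixed(6)}`;
}

console.log(randomPipUrl(regions['BRETAGNE']));
```

Note that a random point inside a bounding box can fall outside the region polygon itself (or in the sea); that still exercises the index, but some requests will return no admin areas.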
Results
I launched the scenario 3 times.
- cold start on spatial service, without Linux file system cache. View online or 75k-spatial-without-cache.pdf

- hot start on spatial service, with Linux file system cache. View online or 75k-spatial-with-cache.pdf

- on pip-service. View online or 75k-pip-service.pdf

Conclusion
Without the Linux cache, spatial can't handle this scenario: the number of active users keeps growing until the end (815.217 req/s). The CPU chart shows the bottleneck: iowait. With the cache, the 95th percentile is at 869 ms, just above the 750 ms target (1209.677 req/s). Unsurprisingly, pip-service is blazing fast, with a 95th percentile at 594 ms (1229.508 req/s).
Since we can't control the Linux cache, I'd say 800 req/s is the first limit for spatial.
Next tests will be without multi-core and with fewer users.
Nice benchmarks :+1:, I had a quick look at the query generation for PIP and there are definitely some 'quick wins' to reduce latency.
Recently I added https://github.com/pelias/spatial/pull/65, which probably means we can delete a bunch of the query logic for finding the default names.
There are probably other things which can be improved too.
If possible could you please keep your benchmarking scripts around so we can run a comparison once this feature lands?
I did some similar benchmarking in the past and found that reducing the number of users greatly improved the performance; I think in a real-world scenario we're going to have <5 'users' connected (i.e. open HTTP streams).
I'd be interested to see what difference it makes to reduce the user count, assuming that Connection: keep-alive is used?
One of the really nice things about using SQLite is that it's so easy (and cheap!) to scale this compared to something which is memory-bound.
So I'm more interested in throughput than latency, although we should still make it run as efficiently as possible 😄
If we can run several high-CPU instances (or threads) of this service it'll be capable of PIP-ing many thousands per-second and can theoretically scale linearly as more servers are added.
One interesting thing to note is that the mmap filesystem cache is shared between all processes on the same machine, so a 64-core machine would be able to do 64x this benchmark while only requiring one copy of the disk pages in RAM.
And! (and this is the interesting bit) this is also true of Docker, so you can run multiple containers/pods on the same physical machine using mmap and they will also share the same filesystem cache from the host machine 🧙‍♂️
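For illustration only (this is not how spatial ships; a sketch assuming a Node service and that every process opens the same mmap-ed SQLite file), Node's standard cluster module is one way to run one process per core while all workers share the same page cache:

```js
// Sketch: fork one worker per core; each worker opens/mmaps the same SQLite
// file, so the OS populates the page cache once and shares it between them.
import cluster from 'node:cluster';
import { cpus } from 'node:os';
import http from 'node:http';

if (cluster.isPrimary) {
  for (let i = 0; i < cpus().length; i++) cluster.fork();
} else {
  // In the real service this handler would run the PIP query against SQLite.
  http.createServer((req, res) => res.end('ok'))
    .listen(3000); // the cluster module shares the listening port across workers
}
```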
Okay so https://github.com/pelias/spatial/pull/67 should hopefully improve these numbers! ~10x? 🤞
> If possible could you please keep your benchmarking scripts around so we can run a comparison once this feature lands?
Okay :+1: I wrote down all the info I need in my comment in case I have to redo the same benchmark :smile: The result PDFs will still be present; I should remove the online versions when we release spatial or close this issue.
> I did some similar benchmarking in the past and found that reducing the number of users greatly improved the performance; I think in a real-world scenario we're going to have <5 'users' connected (i.e. open HTTP streams).
Yes, for me, the target should be at least 500 req/s with a 95th percentile at 750 ms without the Linux cache, and I think that's achievable. :rainbow:
IDK if Gatling's `shareConnections` uses Keep-Alive or not :thinking:
Let's try with #67 now!
Any reason you are testing with the Linux cache (mmap mode) disabled? I was assuming we would always leave that on since it prevents a lot of I/O.
FYI https://github.com/pelias/interpolation/pull/243
The Linux cache is not disabled; I flush the cache before the stress test to simulate a cold start. In reality, that won't happen every day, but it gives me a worst-case scenario (after a machine reboot or a database update). That's why I run the stress test twice :smile:
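(For reference, a common way to do this kind of flush on Linux, assuming root, is `sync && echo 3 > /proc/sys/vm/drop_caches`; the exact commands used for this benchmark aren't stated, so treat that as an illustration.)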
Guess what?
A little suspense...
So, here's a new benchmark with #67, with and without the Linux cache. Same scenario as before.
Results
- First run, just after a Linux cache flush. View online or 75k-spatial-without-cache-67.pdf

- Second run after the first one. View online or 75k-spatial-with-cache-67.pdf

With a cache flush, the 95th percentile is at 43,889 ms with no timeouts, which is better! And with the Linux cache... the 95th percentile is at 31 ms, which is better than pip-service :scream: The CPU is OK, so we could increase the number of requests... but we already have 1,229.508 req/s, which I think is more than decent!
BOOM 💥
Dang, that's some great performance. I guess we need to get serious about integrating it into Pelias :)
Yeah I'm really happy with that because I put a lot of faith in this architecture and it's nice to know it's bearing fruit.
One other thing which recently happened and worked in our favour: https://github.com/JoshuaWise/better-sqlite3/commit/758665a3db2df0e4b94af795df2f5dfc8e78f12d#diff-6f4c547489674c10529650f5632f129f changed the threading mode in better-sqlite3. IIRC, before that change multi-core was actually making it slightly slower, and now it's hopefully working as expected.
> Dang, that's some great performance. I guess we need to get serious about integrating it into Pelias :)
:+1:! And now we have proof that this project performs better than the current stack, which should please our customers.

I just ran k6, another load-testing util, for a comparison on my dev server (16 threads @ 3.6 GHz) and it flew through it:
This is actually not a great test since it used the same lat/lon for each request.
```
$ k6 run --vus 20 --iterations 100000 test.js
iteration_duration.........: avg=5.59ms min=1.41ms med=4.4ms max=42.88ms p(90)=8.61ms p(95)=10.04ms
iterations.................: 100000 3565.959785/s
```
$ cat test.js

```js
// k6 script: every virtual user repeatedly issues the same PIP request.
import http from 'k6/http';

// NZ coordinates; see the note below on why this point was chosen.
const url = 'http://localhost:3000/query/pip/_view/pelias/174.766843/-41.288788';

export default function () {
  http.get(url);
}
```
I'm using 174.766843/-41.288788 (a location in New Zealand) since the NZ country polygon is large and complex; it's a good 'worst-case scenario' for PIP 😆
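To address the same-point caveat above, a variant that randomizes the query point per request might look like this (an untested sketch; the bounding box is an illustrative rectangle over New Zealand, not taken from this thread):

```js
// test-random.js — sketch: jitter the query point so each iteration hits
// different polygons and disk pages instead of a single cached lookup.
import http from 'k6/http';

const BBOX = { minLat: -46.6, maxLat: -34.4, minLon: 166.4, maxLon: 178.6 };

export default function () {
  const lat = BBOX.minLat + Math.random() * (BBOX.maxLat - BBOX.minLat);
  const lon = BBOX.minLon + Math.random() * (BBOX.maxLon - BBOX.minLon);
  http.get(`http://localhost:3000/query/pip/_view/pelias/${lon.toFixed(6)}/${lat.toFixed(6)}`);
}
```

(Many of the random points would fall in the sea around NZ, which skews toward cheap misses, so a tighter land-only bbox would be fairer.)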