Performance of executeDatabaseContextExpr
For context, I'm currently assessing whether I can use project-m36 as the persistence layer for a web application (I'm hoping that I will be able to!).
I'm currently struggling with the performance of executeDatabaseContextExpr. My test case runs using the NoPersistence strategy (so nothing is being written to the file system or anything).
My test first inserts 100k records in one big batch and then tries inserting records one by one. The simplest case defines no constraints on the data at all. The second case defines just a single unique key (inclusion dependency) for the relvar.
No constraints, single insert: ~1s
Single unique key inclusion dependency: ~7s
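For concreteness, the expressions involved look roughly like this (a sketch using ProjectM36.Client names rather than the actual test code; exact signatures, in particular the return type of executeDatabaseContextExpr, vary between Project:M36 versions):

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Illustrative sketch only, not the actual test code; constructor names
-- follow ProjectM36.Client/ProjectM36.Base but details differ by version.
import ProjectM36.Client
import ProjectM36.Base
import ProjectM36.Relation (mkRelationFromList)
import ProjectM36.Attribute (attributesFromList)

-- Case 1: a bare relvar with no constraints at all.
defineExpr :: DatabaseContextExpr
defineExpr = Define "person" [NakedAttributeExpr (Attribute "name" TextAtomType)]

-- Case 2: the same relvar plus a single unique-key inclusion dependency.
uniqueKeyExpr :: DatabaseContextExpr
uniqueKeyExpr = databaseContextExprForUniqueKey "person" ["name"]

-- The single-row insert whose latency is quoted above (~1s / ~7s).
-- (Older Project:M36 versions return Maybe RelationalError instead.)
singleInsert :: SessionId -> Connection -> Atom -> IO (Either RelationalError ())
singleInsert sess conn nameAtom = do
  let rel = either (error . show) id
              (mkRelationFromList (attributesFromList [Attribute "name" TextAtomType])
                                  [[nameAtom]])
  executeDatabaseContextExpr sess conn (Insert "person" (ExistingRelation rel))
```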
My target response time for POST requests in a web app would be something like 200ms total, so at the moment, even without any constraints and with only a single relvar, the performance is an order of magnitude away from usability. Scaled up to a full application model (say 20 relvars, each with 10k-1m records, with multiple unique and foreign key constraints), I'd presumably be looking at insert times of around 60-300s.
I understand that a new persistence layer with constant-time transaction commits is in the works, which is definitely necessary, but, if anything, executeDatabaseContextExpr seems to be a much bigger performance problem.
Excellent info! Would you be able to perform the performance analysis with the GHC profiler?
Currently, our constraints are quite naively implemented so each constraint check likely costs a relation scan. I suspect there is much low-hanging fruit to be found in optimizations, but there may be even lower-hanging fruit in protocol overhead. Are you using the websocket or native Haskell interface?
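To spell out what a per-commit relation scan means here, a purely conceptual sketch (invented types, not Project:M36's implementation): an inclusion dependency asserts that one relational expression is a subset of another, and checking that naively means materializing and walking both sides on every commit, however small the insert.

```haskell
-- Conceptual illustration only; these types are invented for the example
-- and are not Project:M36's internals.
import qualified Data.Set as S

-- Stand-in for a materialized relation: a set of tuples.
type Tuple = [String]

-- An inclusion dependency asserts sub ⊆ super. Checked naively, both
-- sides are fully evaluated and scanned, so the cost tracks the size of
-- the relvar rather than the size of the single inserted tuple.
inclusionHolds :: S.Set Tuple -> S.Set Tuple -> Bool
inclusionHolds sub super = sub `S.isSubsetOf` super
```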
I will try to do that - are there any particular settings you'd be looking for with the profiler?
I'm actually less concerned about the constraint checking (though that obviously needs to be addressed) than about the no-constraint case. For a pure in-memory insert I was expecting something more in the range of 1ms.
By the way, is this in line with what you would expect, performance-wise? It may be that I am doing something wrong in the way I am inserting.
If you provide a sample driver script or snippet, I'll take a look. I would also like to include such a script in our performance test suite to catch regressions as well.
There is probably low-hanging fruit to be found in strictness annotations, so this issue might be quick to fix.
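As a generic illustration of the kind of strictness annotation meant here (the types below are invented for the example and are not Project:M36 internals): lazy accumulator fields build up thunks on every update, whereas strict fields and a strict left fold force the work as it happens.

```haskell
-- Invented example types; not Project:M36 code.
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Strict fields (the bangs) prevent thunk build-up on repeated updates.
data Stats = Stats
  { statTotal  :: !Int
  , statCounts :: !(M.Map String Int)
  }

bump :: Stats -> String -> Stats
bump (Stats t m) k = Stats (t + 1) (M.insertWith (+) k 1 m)

-- foldl' keeps the accumulator evaluated across a large batch of updates.
bumpAll :: [String] -> Stats
bumpAll = foldl' bump (Stats 0 M.empty)
```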
Apologies for the delay. I've put up a very simple benchmark here: https://github.com/matchwood/project-m36-typed/blob/develop/benchmark/Benchmark.hs . With that code I currently get a benchmarked time of 300ms per insert (I have switched to a more powerful machine since running my last benchmark) - one relvar, no constraints, and no persistence. To use Project M36 in production, I'd probably be looking for insert times of around 1ms max, with safe persistence, multiple constraints, and 50-plus relvars.
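The rough shape of such a benchmark (a sketch under the same assumptions as the snippet earlier in this thread, not the actual code in that repository) is a criterion run that times one executeDatabaseContextExpr insert per iteration against an in-process NoPersistence connection:

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only; InProcessConnectionInfo's argument list and some return
-- types differ between Project:M36 versions, so this may need adjusting.
import Criterion.Main
import ProjectM36.Client
import ProjectM36.Base
import ProjectM36.Relation (mkRelationFromList)
import ProjectM36.Attribute (attributesFromList)

main :: IO ()
main = do
  eConn <- connectProjectM36 (InProcessConnectionInfo NoPersistence emptyNotificationCallback [])
  conn  <- either (error . show) pure eConn
  eSess <- createSessionAtHead conn "master"
  sess  <- either (error . show) pure eSess
  -- one relvar, no constraints, no persistence, as described above
  _ <- executeDatabaseContextExpr sess conn
         (Define "person" [NakedAttributeExpr (Attribute "name" TextAtomType)])
  rel <- either (error . show) pure
           (mkRelationFromList (attributesFromList [Attribute "name" TextAtomType])
                               [[TextAtom "Alice"]])
  let insertExpr = Insert "person" (ExistingRelation rel)
  -- criterion measures many iterations of the same single-tuple insert
  defaultMain [bench "insertRecord" (whnfIO (executeDatabaseContextExpr sess conn insertExpr))]
```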
@matchwood I ran your benchmark with the patch applied (that I just sent) and I get
    benchmarking insertRecord
    time                 7.068 ms   (5.479 ms .. 8.400 ms)
                         0.840 R²   (0.791 R² .. 0.898 R²)
    mean                 3.796 ms   (3.066 ms .. 4.762 ms)
    std dev              2.105 ms   (1.621 ms .. 2.785 ms)
    variance introduced by outliers: 98% (severely inflated)
    Benchmark project-m36-benchmark: FINISH
Have I measured correctly? Is it the 8.4 ms you'd like to get down below 1 ms, assuming it was changed to have persistence, more relvars and constraints?
I am also working on improving these performance numbers and am integrating matchwood's benchmarks into Project:M36 proper so that we can track these numbers over time. There is definitely low-hanging performance fruit to find and address.