[DISCUSS] Upgrade Snappy to 1.2.2
Hi,
this thread is for discussing the upgrade to the latest version of Snappy (1.2.2).
I prepared the snappy upgrade here. To test this Snappy version in CouchDB, I provide an additional branch with PR #5567.
What I have tested so far:
- Install latest CouchDB (https://github.com/apache/couchdb/commit/91bd2edb6406ddd39d735548774a384b0ec3a46f)
- Create a "shared data" directory for the dev runs and use a preconfigured ini file
[vendor]
name = The Apache Software Foundation
[couchdb]
uuid = fake_uuid_for_dev
database_dir = /path/to/shared/dir/data
view_index_dir = /path/to/shared/dir/data
file_compression = snappy
[smoosh]
state_dir = /path/to/shared/dir/data
- Run this server with
./dev/run -aa:a -n1 -l snappy.ini
- Create a db with sample documents, do some changes, deletions, create views, compaction, ...
- Shutdown CouchDB
- Clone a second CouchDB checkout with the branch from PR #5567 above
- Start this instance and try to load the database, views, documents
CouchDB should read the db and documents created by the older version of snappy.
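For anyone who wants to automate this check, here is a minimal sketch: dump all documents while the old instance is running, dump them again from the PR instance, and compare the two dumps. The base URL, the database name and the a:a credentials passed in are assumptions for illustration; only the `_all_docs?include_docs=true` endpoint is standard CouchDB. The helper names `fetch_docs` and `diff_dumps` are made up for this example.

```python
import base64
import json
import urllib.request


def fetch_docs(base_url, db, user="a", password="a"):
    """Return {doc_id: doc} for every document in `db` via _all_docs."""
    url = f"{base_url}/{db}/_all_docs?include_docs=true"
    # Basic auth header; a:a matches the -aa:a admin used with dev/run above
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req) as resp:
        rows = json.load(resp)["rows"]
    return {row["id"]: row["doc"] for row in rows}


def diff_dumps(before, after):
    """Compare two {doc_id: doc} dumps; list ids that are missing, extra or changed."""
    missing = sorted(set(before) - set(after))
    extra = sorted(set(after) - set(before))
    changed = sorted(i for i in set(before) & set(after) if before[i] != after[i])
    return {"missing": missing, "extra": extra, "changed": changed}
```

Call `fetch_docs` once against the old instance before shutting it down and once against the PR instance (for a dev node the address is probably something like http://127.0.0.1:15984, but check your setup). If the new Snappy reads the old files correctly, `diff_dumps(before, after)` should return three empty lists. This only covers document bodies; views and attachments would need their own checks.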
Please test this and comment if this works for you.
My first impressions on this.
TLDR, without any modifications and with a simple benchmark script, it seems Snappy v1.2.2 is a little bit slower.
Hardware: iMac 11,2 (mid 2011) macOS Ventura (13.7.6)
For testing, I used datamaker and couchimport from @glynnbird.
First, I created a sample dataset with 100k documents with this template:
datamaker -t ./template.json -f json -i 100000 > sample100k.json
Afterwards I started the configured CouchDBs (with "old" and "new" Snappy) and ran a simple test script:
Test results:
Average doc size: ~1.2 KB, doc count: 100,000
| # run | Old Snappy, writing docs | Old Snappy, reading docs | New Snappy, writing docs | New Snappy, reading docs |
| 1st run | 44.198s | 16.202s | 48.151s | 16.749s |
| 2nd run | 45.432s | 15.705s | 48.816s | 16.609s |
| 3rd run | 45.573s | 15.572s | 47.300s | 16.434s |
| mean | 45.068s | 15.826s | 48.089s | 16.597s |
Next test result: (larger doc size, sample size 10,000)
Average doc size: ~95.4 KB, doc count: 10,000
| # run | Old Snappy, writing docs | Old Snappy, reading docs | New Snappy, writing docs | New Snappy, reading docs |
| 1st run | 36.747s | 14.027s | 37.170s | 15.610s |
| 2nd run | 35.936s | 13.502s | 36.493s | 14.638s |
| 3rd run | 35.627s | 13.748s | 36.021s | 14.447s |
| mean | 36.103s | 13.759s | 36.561s | 14.898s |
Next test result: (larger doc size, sample size 5,000)
Average doc size: ~920.4 KB, doc count: 5,000
| # run | Old Snappy, writing docs | Old Snappy, reading docs | New Snappy, writing docs | New Snappy, reading docs |
| 1st run | 2m51.502s | 0m45.327s | 2m53.885s | 0m49.884s |
| 2nd run | 3m3.018s | 0m44.444s | 2m52.068s | 0m49.519s |
| 3rd run | 2m53.412s | 0m44.589s | 2m53.392s | 0m49.734s |
| mean | 2m55.977s | 0m44.787s | 2m53.115s | 0m49.712s |
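To make the three tables easier to compare, the mean rows can be turned into relative differences. The snippet below is only a small sketch of that arithmetic; the helper names `to_seconds` and `percent_change` are made up here, and the values are copied from the mean rows above.

```python
import re


def to_seconds(text):
    """Parse a duration such as '48.089s' or '2m53.115s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + float(seconds)


def percent_change(old, new):
    """Change of `new` relative to `old` in percent; positive means slower."""
    return (to_seconds(new) - to_seconds(old)) / to_seconds(old) * 100


# (dataset, old write, old read, new write, new read), taken from the mean rows
means = [
    ("~1.2 KB x 100,000", "45.068s", "15.826s", "48.089s", "16.597s"),
    ("~95.4 KB x 10,000", "36.103s", "13.759s", "36.561s", "14.898s"),
    ("~920.4 KB x 5,000", "2m55.977s", "0m44.787s", "2m53.115s", "0m49.712s"),
]

for label, old_w, old_r, new_w, new_r in means:
    print(f"{label}: write {percent_change(old_w, new_w):+.1f}%, "
          f"read {percent_change(old_r, new_r):+.1f}%")
```

On these means, reads with the new Snappy come out roughly 5%, 8% and 11% slower as the documents get larger, while writes go from about 7% slower to about 2% faster. With only three runs per configuration, the smaller differences may well be within noise.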
Thanks for taking a look at it @big-r81!
That is a very interesting result, especially since the Snappy releases since 1.0.9 advertise multiple performance improvements.
On my Ubuntu / Intel laptop, running the built-in fabric_bench:go() benchmark also shows a performance regression, especially in the "Get random doc" part: out of 3 runs on main I get 3000, 3000, 3000 (Hz), while with the new snappy I get 2600, 2600, 2600 (Hz), so fairly consistently lower.
Total runtime (shorter is better) is also lower on main, so the current code is faster: 215, 214, 219 (sec) on main versus 224, 222, 223 (sec) with the new snappy.
However, running with #{doc_size => large}, with doc bodies of about 128 KB, I see an improvement with the new snappy:
Get random doc: 990, 980, 980 (Hz) on main vs 1200, 1200, 1200 (Hz) with the new snappy
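Expressed as relative changes, the fabric_bench numbers quoted above work out as follows. This is just a quick sketch of the arithmetic; the function and variable names are made up for illustration, and note that for the Hz figures higher is better while for runtime lower is better.

```python
from statistics import mean


def relative_change(main_runs, new_runs):
    """Percent change of the mean of new_runs relative to the mean of main_runs."""
    base = mean(main_runs)
    return (mean(new_runs) - base) / base * 100


# Values as quoted in this thread, three runs each
get_random_doc_default = relative_change([3000, 3000, 3000], [2600, 2600, 2600])
total_runtime_default = relative_change([215, 214, 219], [224, 222, 223])
get_random_doc_large = relative_change([990, 980, 980], [1200, 1200, 1200])

print(f"Get random doc, default run (Hz): {get_random_doc_default:+.1f}%")
print(f"Total runtime, default run (sec): {total_runtime_default:+.1f}%")
print(f"Get random doc, large docs (Hz):  {get_random_doc_large:+.1f}%")
```

That is about 13% lower read throughput and about 3% longer total runtime in the default run, against about 22% higher read throughput with large documents.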