
[DISCUSS] Upgrade Snappy to 1.2.2

Open big-r81 opened this issue 6 months ago • 6 comments

Hi,

this thread is for discussing the upgrade to the latest version of Snappy (1.2.2).

I have prepared the Snappy upgrade here. To test this Snappy version in CouchDB, I have provided an additional branch with PR #5567.

What I have tested so far:

  1. Install the latest CouchDB (https://github.com/apache/couchdb/commit/91bd2edb6406ddd39d735548774a384b0ec3a46f)
  2. Create a "shared data" directory for the dev runs and use a preconfigured ini file:

     [vendor]
     name = The Apache Software Foundation

     [couchdb]
     uuid = fake_uuid_for_dev
     database_dir = /path/to/shared/dir/data
     view_index_dir = /path/to/shared/dir/data
     file_compression = snappy

     [smoosh]
     state_dir = /path/to/shared/dir/data

  3. Run this server with ./dev/run -a a:a -n1 -l snappy.ini
  4. Create a db with sample documents, make some changes and deletions, create views, run compaction, etc. (see the curl sketch after this list)
  5. Shut down CouchDB
  6. Check out another CouchDB with the PR from above (#5567)
  7. Start this instance and try to load the database, views, and documents
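A minimal sketch of the kind of activity meant in step 4, using curl against a single dev node (the database name, the document contents, and the node1 port 15984 are assumptions for a default single-node ./dev/run setup; the a:a credentials match the admin flag above):

```sh
DB=http://a:a@127.0.0.1:15984/snappy_test

curl -X PUT "$DB"                                            # create the database
curl -X POST "$DB/_bulk_docs" -H 'Content-Type: application/json' \
     -d '{"docs":[{"_id":"d1","v":1},{"_id":"d2","v":2}]}'   # write some sample docs
curl -X PUT "$DB/_design/test" -H 'Content-Type: application/json' \
     -d '{"views":{"by_v":{"map":"function(doc){ emit(doc.v, null); }"}}}'
curl "$DB/_design/test/_view/by_v"                           # build and read the view
REV=$(curl -s "$DB/d2" | sed -E 's/.*"_rev":"([^"]+)".*/\1/')
curl -X DELETE "$DB/d2?rev=$REV"                             # delete a doc
curl -X POST "$DB/_compact" -H 'Content-Type: application/json'   # trigger compaction
```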

CouchDB should be able to read the db and documents created with the older version of Snappy.

Please test this and comment if this works for you.

big-r81 avatar Jun 15 '25 16:06 big-r81

My first impressions on this.

TL;DR: without any modifications and with a simple benchmark script, Snappy v1.2.2 seems to be a little slower.

Hardware: iMac 11,2 (mid 2011) macOS Ventura (13.7.6)

For testing, I used datamaker and couchimport from @glynnbird.

First, I created a sample dataset of 100k documents from a template (template.json):

datamaker -t ./template.json -f json -i 100000 > sample100k.json
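For illustration, a datamaker template of roughly this shape generates documents in the ~1 KB range; the fields and tags below are made up for this example and are not the actual template used:

```json
{
  "_id": "{{uuid}}",
  "name": "{{name}}",
  "email": "{{email}}",
  "address": "{{street}}, {{town}}, {{postcode}}",
  "joined": "{{date 2015-01-01}}",
  "notes": "{{words 150}}"
}
```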

Afterwards, I started the configured CouchDB instances (with the "old" and the "new" Snappy) and ran a simple test script, sketched below:
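A rough sketch of what such a write/read timing script can look like (the couchimport flags, the COUCH_URL value, and the use of _all_docs?include_docs=true for the read phase are assumptions, not necessarily what was actually run):

```sh
#!/bin/sh
# Assumes a dev node on port 15984 with admin a:a and the 100k-doc file from above.
export COUCH_URL=http://a:a@127.0.0.1:15984
DB=snappy_test

curl -sX PUT "$COUCH_URL/$DB" > /dev/null          # make sure the target db exists

# writing docs: bulk-load the generated sample via couchimport
time couchimport --database "$DB" --type jsonl < sample100k.json

# reading docs: fetch everything back in one request
time curl -s "$COUCH_URL/$DB/_all_docs?include_docs=true" > /dev/null
```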

Test results:

Average Doc Size: ~1.2 KB, Doc Count: 100,000

| Run     | Old Snappy: writing docs | Old Snappy: reading docs | New Snappy: writing docs | New Snappy: reading docs |
|---------|--------------------------|--------------------------|--------------------------|--------------------------|
| 1st run | 44.198s                  | 16.202s                  | 48.151s                  | 16.749s                  |
| 2nd run | 45.432s                  | 15.705s                  | 48.816s                  | 16.609s                  |
| 3rd run | 45.573s                  | 15.572s                  | 47.300s                  | 16.434s                  |
| mean    | 45.068s                  | 15.826s                  | 48.089s                  | 16.597s                  |

big-r81 avatar Jun 17 '25 06:06 big-r81

Next test result (larger doc size, sample size 10,000):

Average Doc Size: ~95.4 KB, Doc Count: 10,000

| Run     | Old Snappy: writing docs | Old Snappy: reading docs | New Snappy: writing docs | New Snappy: reading docs |
|---------|--------------------------|--------------------------|--------------------------|--------------------------|
| 1st run | 36.747s                  | 14.027s                  | 37.170s                  | 15.610s                  |
| 2nd run | 35.936s                  | 13.502s                  | 36.493s                  | 14.638s                  |
| 3rd run | 35.627s                  | 13.748s                  | 36.021s                  | 14.447s                  |
| mean    | 36.103s                  | 13.759s                  | 36.561s                  | 14.898s                  |

big-r81 avatar Jun 17 '25 09:06 big-r81

Next test result (larger doc size, sample size 5,000):

Average Doc Size: ~920.4 KB, Doc Count: 5,000

| Run     | Old Snappy: writing docs | Old Snappy: reading docs | New Snappy: writing docs | New Snappy: reading docs |
|---------|--------------------------|--------------------------|--------------------------|--------------------------|
| 1st run | 2m51.502s                | 0m45.327s                | 2m53.885s                | 0m49.884s                |
| 2nd run | 3m3.018s                 | 0m44.444s                | 2m52.068s                | 0m49.519s                |
| 3rd run | 2m53.412s                | 0m44.589s                | 2m53.392s                | 0m49.734s                |
| mean    | 2m55.977s                | 0m44.787s                | 2m53.115s                | 0m49.712s                |

big-r81 avatar Jun 17 '25 12:06 big-r81

Thanks for taking a look at it @big-r81!

That is a very interesting result, especially in light of Snappy advertising multiple performance improvements since 1.0.9.

nickva avatar Jun 19 '25 16:06 nickva

On my Ubuntu / Intel laptop, running the built-in fabric_bench:go() benchmark also shows a performance regression, especially for the "Get random doc" part: out of 3 runs on main I get 3000, 3000, 3000 (Hz), while with the new Snappy I get 2600, 2600, 2600, so fairly consistently lower.

Total runtime (shorter is better) is also lower on main, so the code there is faster: 215, 214, 219 (sec) on main versus 224, 222, 223 (sec) with the new Snappy.
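For anyone who wants to reproduce these numbers, the benchmark can be run from an Erlang shell attached to a dev node, roughly like this (the node name and cookie are the usual ./dev/run defaults and may differ in other setups):

```sh
# Attach a remote shell to the running dev node started with ./dev/run -n1
erl -name bench@127.0.0.1 -remsh node1@127.0.0.1 -setcookie monster -hidden

# Then, inside the Erlang shell:
#   fabric_bench:go().                      % run with the default options
#   fabric_bench:go(#{doc_size => large}).  % doc bodies of roughly 128 KB
```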

nickva avatar Jun 20 '25 18:06 nickva

However, running with #{doc_size => large}, which uses doc bodies of about 128 KB, I see an improvement with the new Snappy:

Get random doc: 990, 980, 980 (Hz) on main vs 1200, 1200, 1200 (Hz) with the new Snappy

nickva avatar Jun 20 '25 19:06 nickva