couchdb [DISCUSS] Upgrade Snappy to 1.2.2

Hi,

This thread is for discussing the upgrade to the latest version of Snappy (1.2.2).

I prepared the Snappy upgrade here. To test this Snappy version in CouchDB, I have provided an additional branch in PR #5567.

What I have tested so far:

Install the latest CouchDB (https://github.com/apache/couchdb/commit/91bd2edb6406ddd39d735548774a384b0ec3a46f)
Create a "shared data" directory for the dev runs and use a preconfigured ini file:

[vendor]
name = The Apache Software Foundation

[couchdb]
uuid = fake_uuid_for_dev
database_dir = /path/to/shared/dir/data
view_index_dir = /path/to/shared/dir/data
file_compression = snappy

[smoosh]
state_dir = /path/to/shared/dir/data

Run this server with ./dev/run -aa:a -n1 -l snappy.ini
Create a database with sample documents, then make some updates and deletions, create views, run compaction, etc.
Shut down CouchDB
Clone a second CouchDB checkout with the branch from PR #5567 above
Start this instance against the same shared data directory and try to load the databases, views, and documents
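If you repeat the comparison several times, it can help to regenerate the snappy.ini shown above from a single shared directory, so both checkouts are guaranteed to use identical settings. The following is a convenience sketch, not part of CouchDB: the helper names are hypothetical, and "/path/to/shared/dir" stays a placeholder for your own directory.

```python
import configparser
from pathlib import Path


def build_snappy_config(shared_dir: str) -> configparser.ConfigParser:
    """Build the dev-run ini from the steps above, rooted at `shared_dir` (hypothetical helper)."""
    data_dir = str(Path(shared_dir) / "data")
    cfg = configparser.ConfigParser()
    cfg["vendor"] = {"name": "The Apache Software Foundation"}
    cfg["couchdb"] = {
        "uuid": "fake_uuid_for_dev",
        "database_dir": data_dir,
        "view_index_dir": data_dir,
        "file_compression": "snappy",
    }
    # smoosh keeps its state next to the databases, as in the original ini
    cfg["smoosh"] = {"state_dir": data_dir}
    return cfg


def write_snappy_ini(shared_dir: str, out_path: str = "snappy.ini") -> None:
    """Write the config to `out_path` in `key = value` form."""
    with open(out_path, "w", encoding="utf-8") as fh:
        build_snappy_config(shared_dir).write(fh)


# Usage: run once in each checkout so both point at the same data directory.
#   write_snappy_ini("/path/to/shared/dir")
```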

The new CouchDB should be able to read the databases and documents written with the older Snappy version.
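One way to make this check mechanical is to snapshot every document from the old instance before shutting it down, then diff the snapshot against what the new instance returns. Below is a minimal sketch using only the Python standard library and CouchDB's GET /{db}/_all_docs?include_docs=true endpoint. The base URL, port, database name, and the a:a admin credentials (taken from the -aa:a flag above) are assumptions, and the function names are hypothetical. It only covers documents; views still need a separate check.

```python
import base64
import json
import urllib.request


def fetch_all_docs(base_url, db, user="a", password="a"):
    """Return {doc_id: doc} for every live doc in `db`, design docs included."""
    url = f"{base_url}/{db}/_all_docs?include_docs=true"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return {row["id"]: row["doc"] for row in body["rows"]}


def diff_snapshots(before, after):
    """Compare two {id: doc} maps; _rev is part of each doc, so revisions are compared too."""
    missing = sorted(before.keys() - after.keys())
    extra = sorted(after.keys() - before.keys())
    changed = sorted(i for i in before.keys() & after.keys() if before[i] != after[i])
    return missing, extra, changed


# Usage (hypothetical URL and db name):
#   old = fetch_all_docs("http://127.0.0.1:15984", "snappy_test")  # old build, before shutdown
#   with open("snapshot.json", "w") as fh: json.dump(old, fh)
#   new = fetch_all_docs("http://127.0.0.1:15984", "snappy_test")  # PR build, after restart
#   with open("snapshot.json") as fh: print(diff_snapshots(json.load(fh), new))
```

An empty result for all three lists means every document and revision survived the switch.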

Please test this and comment on whether it works for you.

Jun 15 '25 16:06 big-r81

My first impressions on this.

TL;DR: without any modifications and with a simple benchmark script, Snappy v1.2.2 seems to be slightly slower.

Hardware: iMac 11,2 (mid 2011) macOS Ventura (13.7.6)

For testing, I used datamaker and couchimport from @glynnbird.

First, I created a sample dataset of 100k documents using this template:

datamaker -t ./template.json -f json -i 100000 > sample100k.json

Afterwards, I started both configured CouchDB instances (with "old" and "new" Snappy) and ran a simple test script:
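The test script itself is not included in the post, and the numbers below come from the author's own script, not from this code. Purely to illustrate the kind of write-then-read timing loop being described, here is a hypothetical stand-in built on CouchDB's PUT /{db}, POST /{db}/_bulk_docs and GET /{db}/{docid} endpoints. The credentials, batch size, and function names are assumptions; reading docs one by one keeps memory bounded for the ~920 KB case.

```python
import base64
import json
import time
import urllib.parse
import urllib.request


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def _call(method, url, auth, payload=None):
    """Send one JSON request with basic auth and return the decoded response."""
    data = None if payload is None else json.dumps(payload).encode()
    token = base64.b64encode(f"{auth[0]}:{auth[1]}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "Content-Type": "application/json"}
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def time_write_then_read(base_url, db, docs, auth=("a", "a"), batch_size=500):
    """Create `db` (must not exist yet), bulk-insert `docs`, read each back; return (write_s, read_s)."""
    _call("PUT", f"{base_url}/{db}", auth)
    ids = []
    start = time.perf_counter()
    for chunk in batches(docs, batch_size):
        results = _call("POST", f"{base_url}/{db}/_bulk_docs", auth, {"docs": chunk})
        ids.extend(r["id"] for r in results if r.get("ok"))
    write_s = time.perf_counter() - start
    start = time.perf_counter()
    for doc_id in ids:
        _call("GET", f"{base_url}/{db}/{urllib.parse.quote(doc_id, safe='')}", auth)
    read_s = time.perf_counter() - start
    return write_s, read_s
```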

Test results:

Average doc size: ~1.2 KB, doc count: 100,000

# run	Old Snappy: writing docs	Old Snappy: reading docs	New Snappy: writing docs	New Snappy: reading docs
1st run	44.198s	16.202s	48.151s	16.749s
2nd run	45.432s	15.705s	48.816s	16.609s
3rd run	45.573s	15.572s	47.300s	16.434s
mean	45.068s	15.826s	48.089s	16.597s
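To put these means in relative terms, the sketch below recomputes them from the per-run values and expresses each difference as a percentage of the old mean. The helper is illustrative and not part of the original benchmark.

```python
from statistics import mean

# Per-run wall-clock seconds from the 100k-doc table above.
runs = {
    "old_write": [44.198, 45.432, 45.573],
    "old_read": [16.202, 15.705, 15.572],
    "new_write": [48.151, 48.816, 47.300],
    "new_read": [16.749, 16.609, 16.434],
}


def pct_change(old, new):
    """Relative change of the mean runtime in percent (positive = new Snappy is slower)."""
    return (mean(new) - mean(old)) / mean(old) * 100


for op in ("write", "read"):
    old, new = runs[f"old_{op}"], runs[f"new_{op}"]
    print(f"{op}: {mean(old):.3f}s -> {mean(new):.3f}s ({pct_change(old, new):+.1f}%)")
```

For this dataset that works out to roughly +6.7% on writes and +4.9% on reads.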

Jun 17 '25 06:06 big-r81

Next test result (larger doc size, 10,000 documents):

Average doc size: ~95.4 KB, doc count: 10,000

# run	Old Snappy: writing docs	Old Snappy: reading docs	New Snappy: writing docs	New Snappy: reading docs
1st run	36.747s	14.027s	37.170s	15.610s
2nd run	35.936s	13.502s	36.493s	14.638s
3rd run	35.627s	13.748s	36.021s	14.447s
mean	36.103s	13.759s	36.561s	14.898s
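With only three runs per side, it is worth checking how large each gap is relative to the run-to-run noise. The sketch below computes Welch's t-statistic, (mean_new - mean_old) / sqrt(s_old²/n_old + s_new²/n_new), for this table. With n = 3 the degrees of freedom are tiny, so treat the result as a rough signal, not a significance test.

```python
from math import sqrt
from statistics import mean, stdev

# Per-run seconds from the ~95.4 KB / 10,000-doc table above.
old_write = [36.747, 35.936, 35.627]
new_write = [37.170, 36.493, 36.021]
old_read = [14.027, 13.502, 13.748]
new_read = [15.610, 14.638, 14.447]


def welch_t(a, b):
    """Welch's t-statistic for mean(b) - mean(a), using sample standard deviations."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(b) - mean(a)) / se


print(f"write t = {welch_t(old_write, new_write):.2f}")
print(f"read  t = {welch_t(old_read, new_read):.2f}")
```

Here the write gap is about one standard error (t ≈ 0.97), while the read gap is close to three (t ≈ 2.92), so the read slowdown stands out more clearly from the noise.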

Jun 17 '25 09:06 big-r81

Next test result (even larger doc size, 5,000 documents):

Average doc size: ~920.4 KB, doc count: 5,000

# run	Old Snappy: writing docs	Old Snappy: reading docs	New Snappy: writing docs	New Snappy: reading docs
1st run	2m51.502s	0m45.327s	2m53.885s	0m49.884s
2nd run	3m3.018s	0m44.444s	2m52.068s	0m49.519s
3rd run	2m53.412s	0m44.589s	2m53.392s	0m49.734s
mean	2m55.977s	0m44.787s	2m53.115s	0m49.712s
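The times in this table use the `XmY.YYYs` style printed by the shell's `time` builtin, which makes them awkward to average by hand. The sketch below parses that format, recomputes the write means, and also reports medians, because the old 2nd run (3m3.018s) looks like an outlier. The helper names are hypothetical.

```python
import re
from statistics import mean, median

_TIME = re.compile(r"^(?:(\d+)m)?(\d+(?:\.\d+)?)s$")


def to_seconds(text):
    """Parse a value such as '2m51.502s' or '45.327s' into seconds."""
    m = _TIME.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised time: {text!r}")
    return int(m.group(1) or 0) * 60 + float(m.group(2))


def to_mmss(seconds):
    """Format seconds as 'XmSS.SSSs' (seconds zero-padded to two digits)."""
    minutes, rest = divmod(seconds, 60)
    return f"{int(minutes)}m{rest:06.3f}s"


old_write = [to_seconds(t) for t in ("2m51.502s", "3m3.018s", "2m53.412s")]
new_write = [to_seconds(t) for t in ("2m53.885s", "2m52.068s", "2m53.392s")]

print("mean  ", to_mmss(mean(old_write)), "->", to_mmss(mean(new_write)))
print("median", to_mmss(median(old_write)), "->", to_mmss(median(new_write)))
```

The means suggest writes got about 1.6% faster, but the medians (2m53.412s vs 2m53.392s) are practically identical, so the write difference at this doc size is within noise. Reads, by contrast, are about 11% slower by mean.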

Jun 17 '25 12:06 big-r81

Thanks for taking a look at it @big-r81!

That is a very interesting result, especially in light of Snappy advertising multiple performance improvements since 1.0.9.

Jun 19 '25 16:06 nickva

On my Ubuntu/Intel laptop, running the built-in fabric_bench:go() benchmark also shows a performance regression, especially in the "Get random doc" part. Out of 3 runs on main I get 3000, 3000, 3000 (Hz); with the new Snappy I get 2600, 2600, 2600, so it is consistently lower.

Total runtime (shorter is better) is also lower on main, so main is faster: 215, 214, 219 (sec) on main vs. 224, 222, 223 (sec) with the new Snappy.

Jun 20 '25 18:06 nickva

However, running with #{doc_size => large} (doc bodies of about 128 KB), I see an improvement with the new Snappy:

Get random doc: 990, 980, 980 (Hz) on main vs. 1200, 1200, 1200 (Hz) with the new Snappy
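Converting the reported rates into time per operation makes the crossover between small and large docs easier to compare. This sketch assumes "Hz" means operations per second. 1/rate equals per-request latency only if the benchmark issues requests one at a time; with concurrent workers it is just a throughput-derived average.

```python
from statistics import mean


def us_per_op(rate_hz):
    """Mean time per operation in microseconds, for a rate in operations per second."""
    return 1_000_000 / rate_hz


# "Get random doc" rates reported above (Hz), three runs each.
results = {
    "default docs": {"main": [3000, 3000, 3000], "new snappy": [2600, 2600, 2600]},
    "large docs (~128 KB)": {"main": [990, 980, 980], "new snappy": [1200, 1200, 1200]},
}

for label, by_build in results.items():
    main_us = us_per_op(mean(by_build["main"]))
    new_us = us_per_op(mean(by_build["new snappy"]))
    print(f"{label}: {main_us:.0f} us -> {new_us:.0f} us per op ({new_us - main_us:+.0f} us)")
```

By this measure the new Snappy adds about 51 us per small-doc read but saves about 184 us per large-doc read.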

Jun 20 '25 19:06 nickva