
pIndex files generated on disk are close to 10x the size of the bucket in memory

Open abhi-bit opened this issue 10 years ago • 6 comments

➜  data  du -sch webnutshell_21809c13557138e3_* | grep total
592M    total
➜  data  /opt/couchbase/bin/cbstats 0:11210 -b webnutshell all | grep -w mem_used
 mem_used:                           66951456
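For anyone who wants to script the same disk-side measurement, here is a minimal sketch. It assumes the pIndex directories match a glob like the one passed to `du` above. The function name `total_size_bytes` is hypothetical, not part of cbft. It sums apparent file sizes, so the result can differ somewhat from what `du` reports, since `du` counts allocated disk blocks.

```python
import glob
import os


def total_size_bytes(pattern):
    """Sum the apparent sizes of all regular files under every path matching pattern."""
    total = 0
    for path in glob.glob(pattern):
        if os.path.isfile(path) and not os.path.islink(path):
            # A plain file matched the glob directly.
            total += os.path.getsize(path)
            continue
        # Otherwise walk the directory tree (yields nothing for non-directories).
        for root, _dirs, files in os.walk(path):
            for name in files:
                file_path = os.path.join(root, name)
                if not os.path.islink(file_path):  # skip symlinks to avoid double counting
                    total += os.path.getsize(file_path)
    return total


if __name__ == "__main__":
    # Pattern taken from the du command above; run it from the cbft data directory.
    size = total_size_bytes("webnutshell_21809c13557138e3_*")
    print(f"{size / (1024 * 1024):.0f}M total")
```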

This bucket (webnutshell) holds 3 types of documents; some confidential customer info has been redacted:

cluster_blob: https://gist.github.com/abhi-bit/6bbbcac3ff75d20b0e00
node_blob: https://gist.github.com/abhi-bit/a8892159fd684c510fb6
customer_blob: https://gist.github.com/abhi-bit/62882eae79602bcda77c

Also, I've noticed that the ratio of cbft index file size to bucket mem_used grows as the bucket's dataset grows. In an earlier deployment, a bucket using ~1G in memory produced indexes of 190GB on disk. I kick-started indexing against that bucket a couple of days ago and will share numbers once indexing completes there.

abhi-bit avatar Aug 06 '15 04:08 abhi-bit

After I switched to goleveldb, index size on disk dropped to 2x - 3x bucket mem_used. Indexing is also very noticeably faster with goleveldb than with the default boltdb option. It might make sense to make goleveldb the default kvstore. (Note: I haven't tested anything else besides boltdb.)

abhi-bit avatar Aug 06 '15 08:08 abhi-bit

OK, I don't see any arrays. How big is the index? Is it possible to share it with me somehow?

mschoch avatar Aug 06 '15 17:08 mschoch

BoltDB-based indexes were 592MB in size and LevelDB-based ones are 134MB (bucket mem_used is 64MB). Are you asking for the raw index files from disk, or for the bucket data?
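As a quick check of the ratios implied by the sizes in this thread (592M and 134M on disk against a mem_used of 66951456 bytes), the arithmetic can be sketched as below. It assumes the `M` suffix printed by `du` means MiB (1024 * 1024 bytes). The helper name `disk_to_mem_ratio` is just for illustration.

```python
MIB = 1024 * 1024  # assumption: du's "M" suffix is mebibytes


def disk_to_mem_ratio(disk_mib, mem_used_bytes):
    """Return on-disk index size divided by bucket mem_used."""
    return disk_mib * MIB / mem_used_bytes


MEM_USED = 66951456  # bytes, from the cbstats output in this issue

boltdb_ratio = disk_to_mem_ratio(592, MEM_USED)
goleveldb_ratio = disk_to_mem_ratio(134, MEM_USED)

print(f"boltdb:    {boltdb_ratio:.1f}x")     # prints "boltdb:    9.3x"
print(f"goleveldb: {goleveldb_ratio:.1f}x")  # prints "goleveldb: 2.1x"
```

That matches the "close to 10x" in the title for boltdb and the low end of the 2x - 3x range reported for goleveldb.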

abhi-bit avatar Aug 06 '15 17:08 abhi-bit

Well, with the Bleve index I should be able to reproduce the error and figure out which field in which document it was trying to highlight. I understand it contains some customer-sensitive data, so if there is some secure way for me to download it, that would be ideal.

mschoch avatar Aug 06 '15 17:08 mschoch

Passed details over mail

abhi-bit avatar Aug 06 '15 17:08 abhi-bit

Figured this might be a good bug to cross-link, as it has some (admittedly old) advice: https://github.com/couchbaselabs/cbft/issues/11

steveyen avatar Aug 12 '15 04:08 steveyen