scylla-tools-java

sstablemetadata is extremely slow

Open dyasny opened this issue 5 years ago • 1 comments

I am scanning a moderate number of sstable files from a backup (~3500 files with every 9th file sampled) using two methods. One is a Python script that reads the binary data directly, and the other is sstablemetadata.

Scanning the entire set takes ~45 minutes with sstablemetadata and ~5 seconds with the Python script.

Of course, the script only looks for and returns the token ranges, but this is still a huge difference.
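
For anyone who wants to reproduce the comparison, here is a minimal sketch of the sstablemetadata side. It is not the script used for the numbers above. If the sampled set is roughly 3500 / 9 ≈ 389 files, 45 minutes works out to about 7 seconds per file, so one guess is that a large share of the time is per-invocation startup rather than per-file work. The sketch assumes the installed sstablemetadata accepts several file names in one invocation, and batches the files to test that guess. The helper names (`sample_every_nth`, `batched`, `build_command`, `scan`) and the batch size are hypothetical.

```python
import subprocess
from itertools import islice


def sample_every_nth(paths, n=9):
    """Return every n-th path from a sorted listing, starting with the first."""
    return sorted(paths)[::n]


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def build_command(files, tool="sstablemetadata"):
    """Build one command line covering several files.

    Assumption: the tool accepts multiple file names per invocation.
    """
    return [tool, *map(str, files)]


def scan(files, batch_size=50):
    """Run the tool once per batch and yield (batch, raw stdout).

    Parsing the output is left out on purpose; the point is only to
    compare wall-clock time against one invocation per file.
    """
    for chunk in batched(files, batch_size):
        result = subprocess.run(
            build_command(chunk),
            capture_output=True,
            text=True,
            check=False,  # keep going even if one batch fails
        )
        yield chunk, result.stdout
```

Timing `scan(files, batch_size=1)` against `scan(files, batch_size=50)` on the same sample would show whether startup cost or the metadata work itself dominates.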

dyasny avatar Jun 16 '20 16:06 dyasny

Doesn't sstablemetadata do tons of work, like tombstone calculation? See https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/tools/toolsSStables/toolsSSTableMetadata.html

You can suggest an enhancement to add a flag that makes it emit only the ranges.

dorlaor avatar Jun 16 '20 17:06 dorlaor