YCSB
Quick question: why use a byte array to populate the usertable collection?
Hello guys.
I had a question about the data that is generated for the database.
Why does YCSB fill the usertable collection in the DB with a HashMap whose values are ByteIterators? Why not use a String, for example, or some other type of random value?
Is there any specific reason, @busbey?
Here's the javadoc from the ByteIterator class, which gives a rationale:
YCSB-specific buffer class. ByteIterators are designed to support efficient field generation, and to allow backend drivers that can stream fields (instead of materializing them in RAM) to do so.

YCSB originally used String objects to represent field values. This led to two performance issues.

First, it leads to unnecessary conversions between UTF-16 and UTF-8, both during field generation, and when passing data to byte-based backend drivers.

Second, Java strings are represented internally using UTF-16, and are built by appending to a growable array type (StringBuilder or StringBuffer), then calling a toString() method. This leads to a 4x memory overhead as field values are being built, which prevented YCSB from driving large object stores.
Mind you, this is from 10+ years ago.
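To make the rationale above concrete, here is a minimal Java sketch of the idea. It is not YCSB's actual ByteIterator class: `ByteSource`, `LazyRandomBytes`, and `streamField` are hypothetical names, the field contents are assumed to be printable ASCII, and a `ByteArrayOutputStream` stands in for a network or disk stream. Each field value is generated one byte at a time as the consumer pulls it, so a byte-oriented backend can write it straight out without building a String, re-encoding UTF-16 to UTF-8, or holding a full copy of the value in RAM.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.SplittableRandom;

public class ByteIteratorSketch {

  /** Hypothetical minimal byte source: yields a field's bytes on demand. */
  interface ByteSource {
    boolean hasNext();
    byte nextByte();
    long bytesLeft();
  }

  /** Hypothetical generator: produces `length` random printable ASCII bytes lazily. */
  static final class LazyRandomBytes implements ByteSource {
    private final SplittableRandom rng;
    private long remaining;

    LazyRandomBytes(long length, long seed) {
      this.remaining = length;
      this.rng = new SplittableRandom(seed);
    }

    @Override
    public boolean hasNext() {
      return remaining > 0;
    }

    @Override
    public byte nextByte() {
      if (remaining <= 0) {
        throw new NoSuchElementException();
      }
      remaining--;
      // ' ' (32) through '~' (126): 95 printable ASCII characters, one byte each.
      return (byte) (' ' + rng.nextInt(95));
    }

    @Override
    public long bytesLeft() {
      return remaining;
    }
  }

  /** A byte-oriented "driver" that streams a field to its destination as it is generated. */
  static long streamField(ByteSource field, OutputStream out) throws IOException {
    long written = 0;
    while (field.hasNext()) {
      out.write(field.nextByte());
      written++;
    }
    return written;
  }

  public static void main(String[] args) throws IOException {
    // Shape of a YCSB-style record: field name -> lazily generated field value.
    Map<String, ByteSource> record = new HashMap<>();
    for (int i = 0; i < 10; i++) {
      record.put("field" + i, new LazyRandomBytes(100, i));
    }

    // Stand-in for a network or disk stream.
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    long total = 0;
    for (Map.Entry<String, ByteSource> e : record.entrySet()) {
      total += streamField(e.getValue(), sink);
    }
    System.out.println("bytes streamed: " + total); // prints "bytes streamed: 1000"
  }
}
```

In the String-based design the javadoc describes, each value would instead be appended into a StringBuilder, turned into a String with toString(), and then re-encoded (for example with getBytes(StandardCharsets.UTF_8)) before a byte-based driver could send it.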