Quick question about why use a byte array to populate the usertable collection.

Open morcelicaio opened this issue 2 years ago • 1 comments

Hello guys.

I had a question about the data that is generated for the database.

Why does YCSB use a HashMap whose values are ByteIterators to fill in the data for the usertable collection in the DB? Why not use a String, for example, or another type of random value?

Is there any specific reason @busbey ?

morcelicaio, May 06 '22 11:05

Here's the javadoc from the ByteIterator class, which gives a rationale:

  YCSB-specific buffer class. ByteIterators are designed to support
  efficient field generation, and to allow backend drivers that can stream
  fields (instead of materializing them in RAM) to do so.

  YCSB originally used String objects to represent field values. This led to
  two performance issues.

  First, it leads to unnecessary conversions between UTF-16 and UTF-8, both
  during field generation, and when passing data to byte-based backend
  drivers.

  Second, Java strings are represented internally using UTF-16, and are
  built by appending to a growable array type (StringBuilder or
  StringBuffer), then calling a toString() method. This leads to a 4x memory
  overhead as field values are being built, which prevented YCSB from
  driving large object stores.
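
To make the two routes in that rationale concrete, here is a minimal sketch, not YCSB's actual code. `FieldSketch`, `RandomByteStream`, `viaString`, and `drain` are all hypothetical names made up for illustration; the real `ByteIterator` class has its own API. It contrasts building a field as a String and then encoding it with streaming the same bytes straight to a sink.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.NoSuchElementException;
import java.util.Random;

public class FieldSketch {

  /** Hypothetical stand-in for a streaming field value; NOT the YCSB ByteIterator API. */
  static final class RandomByteStream {
    private final Random rng;
    private long remaining;

    RandomByteStream(long length, long seed) {
      this.rng = new Random(seed);
      this.remaining = length;
    }

    boolean hasNext() {
      return remaining > 0;
    }

    /** Produces one printable ASCII byte on demand; nothing is buffered. */
    byte nextByte() {
      if (remaining <= 0) {
        throw new NoSuchElementException();
      }
      remaining--;
      return (byte) (' ' + rng.nextInt(95));
    }
  }

  /** String route: the whole value is built as chars, then encoded to UTF-8. */
  static byte[] viaString(int length, long seed) {
    Random rng = new Random(seed);
    StringBuilder sb = new StringBuilder(length);
    for (int i = 0; i < length; i++) {
      sb.append((char) (' ' + rng.nextInt(95)));
    }
    // The builder, the String, and the encoded array all exist around this point.
    return sb.toString().getBytes(StandardCharsets.UTF_8);
  }

  /** Streaming route: bytes go directly to the sink, one at a time. */
  static long drain(RandomByteStream in, OutputStream out) {
    long written = 0;
    try {
      while (in.hasNext()) {
        out.write(in.nextByte());
        written++;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return written;
  }

  public static void main(String[] args) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    long n = drain(new RandomByteStream(16, 42L), sink);
    System.out.println(n + " bytes streamed");
    // Same seed, same generator calls: both routes yield identical bytes.
    System.out.println(Arrays.equals(sink.toByteArray(), viaString(16, 42L)));
  }
}
```

Both routes produce the same bytes for the same seed. The difference is that the String route holds the full value in memory in several forms, while the streaming route needs only the generator state, which is the property the javadoc says matters for large objects.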

Mind you, this is from 10+ years ago.

busbey, May 06 '22 14:05