New APIs: ByteString.toIndex() and ByteString.toFraction()

Open swankjesse opened this issue 5 years ago • 2 comments

The first one may be useful with hashing to put byte strings in partitioning buckets for scaling. For example, to divide a dataset into 32 partitions, hash the key, then use toIndex(32) to map the key to its partition.

The second one may be useful with dynamic experiments and A/B tests. For example, to assign 5% of customers to a control group, hash the customer key, then check if toFraction() is less than 0.05.
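A minimal sketch of how the two proposed functions might behave, written against plain byte arrays rather than Okio's ByteString since neither API exists yet. The class name ByteStringBuckets, the choice of SHA-256, the sample key, and the rule of reading the leading 8 bytes as a big-endian binary fraction are all assumptions made for illustration, not the design from this issue.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class ByteStringBuckets {
  /**
   * Hypothetical toFraction(): reads the first 8 bytes as a big-endian
   * binary fraction and returns a value in [0.0, 1.0).
   */
  public static double toFraction(byte[] bytes) {
    long bits = 0L;
    for (int i = 0; i < 8; i++) {
      // Missing trailing bytes are treated as zero.
      int b = i < bytes.length ? (bytes[i] & 0xff) : 0;
      bits = (bits << 8) | b;
    }
    // Keep the top 53 bits so the value is exactly representable as a double.
    return (bits >>> 11) * 0x1.0p-53;
  }

  /** Hypothetical toIndex(size): maps the bytes to an index in [0, size). */
  public static int toIndex(byte[] bytes, int size) {
    if (size <= 0) throw new IllegalArgumentException("size <= 0: " + size);
    return (int) (toFraction(bytes) * size);
  }

  /** Hashes a key so that its bytes are roughly uniformly distributed. */
  public static byte[] sha256(String key) {
    try {
      return MessageDigest.getInstance("SHA-256")
          .digest(key.getBytes(StandardCharsets.UTF_8));
    } catch (NoSuchAlgorithmException e) {
      throw new AssertionError(e); // SHA-256 is required on every Java platform.
    }
  }

  public static void main(String[] args) {
    byte[] hash = sha256("customer-1234"); // Made-up key.
    int partition = toIndex(hash, 32); // One of 32 partitions.
    boolean inControlGroup = toFraction(hash) < 0.05; // Roughly 5% of keys.
    System.out.println("partition=" + partition + " control=" + inControlGroup);
  }
}
```

Because the fraction depends only on the hashed key, the same customer always lands in the same partition and the same experiment group.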

swankjesse avatar Jun 01 '20 23:06 swankjesse

Recording some out-of-band discussion points: we're looking for a word to replace "to" here, since the index and fraction concepts are not intrinsic to how you think of a byte string. Jesse had a fun analogy where it's like having String.toIndex(), which returned the absolute position of a string in a dictionary of all words. It's not perfect, of course, since strings can be arbitrary characters that aren't words. It is, however, a useful mental model, since String.toIndex() would be a weird API but String.dictionaryIndex() makes sense. So what's the "dictionary" of the set of all possible byte strings, such that we can name these methods something like thingIndex() and thingFraction()?

JakeWharton avatar Jun 19 '20 02:06 JakeWharton

I've been thinking about the names here. My biggest concern with the current names is that they could be interpreted as decoding ASCII, like readDecimalLong.

My new recommendation is to drop the index function, and rename toFraction() to unitIntervalDouble(). You can get the result of toIndex by multiplying that by the target size and truncating to an integer.
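A minimal sketch of recovering a toIndex-style result from a unit-interval double: scale the fraction by the target size and truncate. The class UnitIntervalIndex and the method indexOf are hypothetical names used only for illustration; the fraction is assumed to come from something like the proposed unitIntervalDouble().

```java
public final class UnitIntervalIndex {
  /** Maps a fraction in [0.0, 1.0) to an index in [0, size). */
  public static int indexOf(double fraction, int size) {
    if (!(fraction >= 0.0 && fraction < 1.0)) {
      throw new IllegalArgumentException("fraction out of [0, 1): " + fraction);
    }
    if (size <= 0) {
      throw new IllegalArgumentException("size <= 0: " + size);
    }
    // Scaling stretches [0, 1) to [0, size); the cast truncates toward zero.
    return (int) (fraction * size);
  }

  public static void main(String[] args) {
    System.out.println(indexOf(0.5, 32)); // Midpoint of the unit interval.
  }
}
```

With a single unitIntervalDouble()-style function, callers who need buckets pay for one multiplication, and the API surface stays at one method.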

swankjesse avatar Jun 20 '20 19:06 swankjesse