1brc icon indicating copy to clipboard operation
1brc copied to clipboard

First attempt from hundredwatt

Open hundredwatt opened this issue 1 year ago • 2 comments

Check List:

  • [x] Tests pass (./test.sh <username> shows no differences between expected and actual outputs)
  • [x] All formatting changes by the build are committed
  • [x] Your launch script is named calculate_average_<username>.sh (make sure to match casing of your GH user name) and is executable
  • [x] Output matches that of calculate_average_baseline.sh
  • Execution time: 00:05.318 (8 Core Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz)
  • Execution time of reference implementation: 3:18.572

Was distracted by making the eval script perfect 😂, so didn't spend as much time on this as I wanted, but excited to join the mix!

Main thing:

  • Gunnar said we couldn't use perfect hash tables for city names early on, but no one said anything about using them for temperature values 😉 . With only 1,999 possible values, we can construct a naive perfect hash table (which has terrible construction time, but the best possible lookup time) for temperature values that's only 5KB.

Also:

  • Bit twiddling to find separators and mask words
  • Read from file only using longs (MBB.getLong())
  • Reading the file in stripes instead of partitions - implemented this, but didn't analyze yet... planning to play with this more
  • Using 21.0.1-graal because it's all over the top of the leaderboard 😄

hundredwatt avatar Jan 10 '24 05:01 hundredwatt

Re-ran evaluation on an 8-core cloud VM:

  • Execution time: 00:05.318 (8 Core Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz)
  • Execution time of reference implementation: 3:18.572

hundredwatt avatar Jan 10 '24 17:01 hundredwatt

It's getting late here and I'm tired, so sorry for the questions: how/where does it deal with collisions for the name hash, and what is the purpose of that value hash table? Thx for clarifying.

gunnarmorling avatar Jan 10 '24 21:01 gunnarmorling

@gunnarmorling

For city names, I use linear probing similar to other entires:

https://github.com/gunnarmorling/1brc/blob/8bb24ec4f631a635f11f91f26c63246e14bb79f9/src/main/java/dev/morling/onebrc/CalculateAverage_hundredwatt.java#L163-L168

For temperature values, I realized there's only 1,999 possible values based on the rules, so I created a lookup table that uses a perfect hash function to map the raw bytes from the file to the corresponding temperature value integer.

I used a naive perfect hash function of the form temperature_long * SEED % slots with SEED chosen offline via brute force such that there are no collisions amongst the possible temperature values.

Here are some examples of parsing temperature values using the lookup table:

File Contents mbb.getLong() Bit Twiddling to remove '\n' and next line characters perfect hash function to calculate lookup table location Math.abs(temperature_long * seed) % slots Read value at hash index from TEMPERATURES array
"-67.8\nHa" 0x2d36372e380a4861 0x2d36372e38000000 2415 -678
"31.4\nCha" 0x33312e340a436861 0x33312e3400000000 3823 314

This method parses the temperature value with fewer instructions than other methods and no branching. It does require a memory lookup, but since the lookup table is 10KB, it should always reside in L1 or L2 cache.

Here are the important code snippets:

Compute all possible temperature strings and encode them as long words

https://github.com/gunnarmorling/1brc/blob/8bb24ec4f631a635f11f91f26c63246e14bb79f9/src/main/java/dev/morling/onebrc/CalculateAverage_hundredwatt.java#L44-L75

Construct the lookup table mapping all temperature strings as input to the parsed integer value

https://github.com/gunnarmorling/1brc/blob/8bb24ec4f631a635f11f91f26c63246e14bb79f9/src/main/java/dev/morling/onebrc/CalculateAverage_hundredwatt.java#L102-L109

Read temperature value long word from file, bit twiddle to remove \n and trailing characters, then use lookup table to find integer value

https://github.com/gunnarmorling/1brc/blob/8bb24ec4f631a635f11f91f26c63246e14bb79f9/src/main/java/dev/morling/onebrc/CalculateAverage_hundredwatt.java#L264-L277

hundredwatt avatar Jan 11 '24 04:01 hundredwatt

@hundredwatt, thanks a lot for this extensive explanation 🙏! Really, really cool stuff. Comes in at 00:04.418 on the new machine, i.e. very close to the top. Nice one!

gunnarmorling avatar Jan 11 '24 10:01 gunnarmorling