externalsortinginjava
StringSizeEstimator
Do you account for padding when you calculate the size of a string? The size of an object is a multiple of 8 bytes.
@j-joker
This seems to be a valid issue. I think we should round up to the word size (64 bits on a 64-bit machine and 32 bits on a 32-bit machine).
Care to issue a PR? This way you'd get credit for the (small) change.
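For reference, the rounding suggested above could look roughly like the sketch below. It assumes an 8-byte alignment (the usual default on 64-bit HotSpot). The class and method names are hypothetical and are not taken from the project's StringSizeEstimator.

```java
final class PaddedSizeSketch {
    // Assumption: objects are aligned to 8 bytes; adjust if the JVM uses another alignment.
    static final int ALIGNMENT = 8;

    // Rounds a raw size in bytes up to the next multiple of ALIGNMENT.
    // The bit mask only works because ALIGNMENT is a power of two.
    static long roundUp(long rawSize) {
        return (rawSize + ALIGNMENT - 1) & ~(long) (ALIGNMENT - 1);
    }

    public static void main(String[] args) {
        // A raw estimate of 41 bytes is padded to 48; 48 stays 48.
        System.out.println(roundUp(41));
        System.out.println(roundUp(48));
    }
}
```

The mask-based form avoids a division, which matters here since the estimator sits on a hot path.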
(I am repeating this comment from the PR to make sure it does not get lost:)
Please merge the following commit https://github.com/lemire/externalsortinginjava/commit/a5886f7e94b930b0cea260d26b41d412a28cc81c
and run mvn test
before and after your change. It will measure in a rough but sufficiently accurate manner the running time of the string estimation.
We want to make sure that we do not degrade performance, since this function is called repeatedly, possibly millions of times. What it does is relatively unimportant (it produces a memory usage estimate), so we never want it to have an impact on performance.
Here is what it might look like...
$ mvn test
(...)
Running com.google.code.externalsorting.ExternalSortTest
#ignore = 67412000
[performance] String size estimator uses 1.116796875 ns per string
#ignore = 67412000
[performance] String size estimator uses 1.120703125 ns per string
#ignore = 67412000
[performance] String size estimator uses 1.1216796875 ns per string
#ignore = 67412000
[performance] String size estimator uses 1.116796875 ns per string
#ignore = 67412000
[performance] String size estimator uses 1.1138671875 ns per string
#ignore = 67412000
[performance] String size estimator uses 1.116796875 ns per string
#ignore = 67412000
[performance] String size estimator uses 1.1197265625 ns per string
#ignore = 67412000
[performance] String size estimator uses 1.116796875 ns per string
#ignore = 67412000
[performance] String size estimator uses 1.112890625 ns per string
#ignore = 67412000
[performance] String size estimator uses 1.116796875 ns per string
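A measurement like the one printed above can be approximated with a simple timing loop. The following is only a sketch and not the project's actual test: `estimate` is a hypothetical stand-in for the estimator (with an assumed 40-byte overhead and 2 bytes per character), and the `#ignore` line is printed on the assumption that it serves as a sink so the JIT cannot discard the calls.

```java
import java.util.function.ToLongFunction;

final class EstimatorTimingSketch {
    // Hypothetical estimator: assumed fixed overhead plus 2 bytes per char,
    // rounded up to a multiple of 8. Not the project's implementation.
    static long estimate(String s) {
        long raw = 40 + 2L * s.length();
        return (raw + 7) & ~7L;
    }

    // Returns the average time per call in nanoseconds.
    static double nanosPerCall(ToLongFunction<String> estimator, String[] inputs, int repeats) {
        long sink = 0; // accumulated so the calls have an observable effect
        long start = System.nanoTime();
        for (int r = 0; r < repeats; r++) {
            for (String s : inputs) {
                sink += estimator.applyAsLong(s);
            }
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("#ignore = " + sink);
        return (double) elapsed / ((double) repeats * inputs.length);
    }

    public static void main(String[] args) {
        String[] inputs = new String[1024];
        for (int i = 0; i < inputs.length; i++) {
            inputs[i] = Integer.toString(i);
        }
        double ns = nanosPerCall(EstimatorTimingSketch::estimate, inputs, 10_000);
        System.out.println("[performance] String size estimator uses " + ns + " ns per string");
    }
}
```

Running it before and after a change gives a rough comparison; for anything more rigorous a harness such as JMH would be a better fit.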
This remains unresolved: we may underestimate the memory usage. Some analysis is needed.