xjb714
I compile my code with icpx 2025.0.4 at -O3. clang++ seems to be a little less efficient than icpx, and g++ less efficient still. Generally speaking,...
@ibireme Your f64_bin_to_dec function can be optimized in this way. In tests on my CPU it saves roughly **3 to 4** cycles.
@ibireme Thank you for your information. Your algorithm does perform better on most hardware and does not depend on a specific instruction set. Your algorithm is really ingenious, and I...
@ibireme Here is a performance optimization for yy_double_to_string. The source code is attached: [yy_double copy.txt](https://github.com/user-attachments/files/19147418/yy_double.copy.txt). It improves performance in the random-length test.
@ibireme There is a mistake in the above code. Use this file. [yy_double copy.txt](https://github.com/user-attachments/files/19147701/yy_double.copy.txt)
@ibireme It seems fine to use the following code instead, which calculates the number of trailing zeros in the lower 16 digits. num8_1 ``` u64 tz1...
@ibireme This code does the same thing, and the compiled result contains no branch instructions, so the branch misprediction penalty is avoided. See: https://godbolt.org/z/fEa5oGcjo
It seems that the performance can be improved by using a larger lookup table, which requires **4KB** to store ASCII codes from ‘000’ to ‘999’ and **4.94KB** to store exp_dec...
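A minimal sketch of what such a three-digit table could look like, to show where the 4KB figure comes from (1000 entries of 4 bytes each). The names `digit3_table`, `init_digit3_table` and `write_3_digits` are hypothetical and not taken from the attached code, and the one-byte padding per entry is an assumption:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table: 1000 entries x 4 bytes = 4000 bytes (about 4KB).
   Each entry holds the three ASCII digits of its index plus one padding byte,
   so an entry can be copied with a single fixed-size 4-byte memcpy. */
static char digit3_table[1000][4];

/* Fill the table once at startup ("000", "001", ..., "999"). */
static void init_digit3_table(void) {
    for (int i = 0; i < 1000; i++) {
        digit3_table[i][0] = (char)('0' + i / 100);
        digit3_table[i][1] = (char)('0' + i / 10 % 10);
        digit3_table[i][2] = (char)('0' + i % 10);
        digit3_table[i][3] = '\0'; /* padding byte */
    }
}

/* Write exactly three digits for v in [0, 999] and return the new end pointer.
   Copies 4 bytes, so buf must have room for at least 4 bytes. */
static char *write_3_digits(char *buf, uint32_t v) {
    assert(v < 1000);
    memcpy(buf, digit3_table[v], 4);
    return buf + 3;
}
```

Compared with the usual 200-byte table for '00' to '99', this trades cache footprint for fewer divisions and stores per digit group; whether it is a net win probably depends on the workload.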
@ibireme Bug fix in the **write_u64_len_1_to_16** function.

Before:
```
u64 e10_tmp = e10_tmp_table[__builtin_clzll(val)];
```
After:
```
u64 e10_tmp = e10_tmp_table[__builtin_clzll(val|1)];
```
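The `val|1` matters because the result of `__builtin_clzll(0)` is undefined in GCC and Clang. As an illustration of the same guard, here is a sketch of a generic digit-count routine. It is not the `e10_tmp_table` lookup from the attached file; `count_digits_u64` and `pow10_tab` are hypothetical names, and the `1233 >> 12` multiplier is the common approximation of log10(2):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of decimal digits of val (1 for val == 0).
   Requires GCC or Clang for __builtin_clzll. */
static int count_digits_u64(uint64_t val) {
    static const uint64_t pow10_tab[20] = {
        1ULL, 10ULL, 100ULL, 1000ULL, 10000ULL, 100000ULL, 1000000ULL,
        10000000ULL, 100000000ULL, 1000000000ULL, 10000000000ULL,
        100000000000ULL, 1000000000000ULL, 10000000000000ULL,
        100000000000000ULL, 1000000000000000ULL, 10000000000000000ULL,
        100000000000000000ULL, 1000000000000000000ULL,
        10000000000000000000ULL
    };
    /* val | 1 keeps the argument nonzero: __builtin_clzll(0) is undefined. */
    int bits = 64 - __builtin_clzll(val | 1);  /* bit length, 1..64 */
    int guess = (bits * 1233) >> 12;           /* about bits * log10(2) */
    int digits = guess + (val >= pow10_tab[guess]);
    return digits ? digits : 1;                /* treat 0 as one digit */
}
```

Setting the lowest bit never changes the position of the highest set bit for a nonzero value, so the guard costs one OR and leaves every other input unaffected.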
@ibireme The lookup table of pow10 is about 10KB, which is necessary, but in practical application, only 1KB or even less of 10KB may be used, and the decimal exponents...