sys/fmt: optimize scn_u32_hex()
Contribution description
We can save a branch by always converting to lowercase first.
Testing procedure
tests-fmt in tests/unittests covers this.
Issues/PRs references
Murdock results
:heavy_check_mark: PASSED
53f6b9a56b98863984484b4b7845107337c36813 sys/fmt: optimize scn_u32_hex()
| Success | Failures | Total | Runtime |
|---|---|---|---|
| 10009 | 0 | 10009 | 06m:41s |
Artifacts
godlblot cortex m0+ including a version 3
version 3 might be an improved version of this but I am not sure it covers all issues of this PR
scn_u32_hex("9:3k",4) == 9 scn_u32_hex2("9:3k",4) == 39476 scn_u32_hex3("9:3k",4) == 2467
No it does not
cortex m0+ this pr pushes 4 regs to stack ( not counting strnlen) versions 3 pushes 3 current RIOT pushes 2
cortex m4 this pr 2regs to stack ( not counting strnlen) version 3 none current none
one might want to look at #19027
As I stated in PR https://github.com/RIOT-OS/RIOT/pull/19027#issuecomment-1343106941, the unittest for the fmt module are lacking. Your changes break the API promised in fmt.h. Sadly, our unittest don't catch that.
@Teufelchen1 : You probably already got something in mind would you want to make that a PR? In case, please link this and the other (my) PR for reference
Hey hey, the new test just got merged. It will fail with the current implementation in this PR. Do you want to adapt or close this?
uses more stack
How?
uses more stack
How?
seem like it needs an intermediate for the case-conversion
https://godbolt.org/z/dz3c6d1oh
for your experiments
it has the old experiments (frome the previos comment) on top
ah and the test ist still incomplete
https://godbolt.org/z/KTYWf4v66
execute on x86 godlbolt reveals this PR will return 2467 if asked for scn_u32_hex2("9:3k",5)
i am not sure how it does that
now i know : to ? are right behind 0-9 so they are alternative symbols for counting to f and the| 32 doesn't take a hit on them
0123456789:;<=>?
Yea I guess it's mostly obfuscation at this point…
It's now even worse (more stack and more assembly, and the code is barely readable)
https://godbolt.org/z/zMWGKWefT
Do you really want to try to optimize this function? (I might be biased because i wrote the current implementation https://github.com/RIOT-OS/RIOT/pull/19027 the pr also has a link to godbolt where you can find versions of this functions that are smaller on stack and in code but are also less safe ( one of them is probably also what this pr once was) )
If you do please provide some kind of proof for: "this is better"
I like godbolt gcc-arm a current version -O(2,z or 3) -mcpu=cortex-m0plus -mfloat-abi=soft or -O(2,z,3) -mcpu=cortex-m4 for reading
with a main function and an x86-64 you can also have it run something print or compare (know good to new) and print test cases in arrays iterate across ranges ... it is realy nice
I am with @kfessel here. Closing this PR seems to be the best way to handle this? Even if we find an implementation that produces perfect out put for a given compiler: There are so many variables. So far we only looked at arm but RIOT supports more ISAs. How do their compilations compare? What is about run-time performance? etc. etc.
In consent with benpicco: Closed, not worth the extra time.