RIOT icon indicating copy to clipboard operation
RIOT copied to clipboard

sys/fmt: optimize scn_u32_hex()

Open benpicco opened this issue 2 years ago • 5 comments

Contribution description

We can save a branch by always converting to lowercase first.

Testing procedure

tests-fmt in tests/unittests covers this.

Issues/PRs references

benpicco avatar Aug 22 '23 09:08 benpicco

Murdock results

:heavy_check_mark: PASSED

53f6b9a56b98863984484b4b7845107337c36813 sys/fmt: optimize scn_u32_hex()

Success Failures Total Runtime
10009 0 10009 06m:41s

Artifacts

riot-ci avatar Aug 22 '23 09:08 riot-ci

godlblot cortex m0+ including a version 3

version 3 might be an improved version of this but I am not sure it covers all issues of this PR

scn_u32_hex("9:3k",4) == 9 scn_u32_hex2("9:3k",4) == 39476 scn_u32_hex3("9:3k",4) == 2467

No it does not

cortex m0+ this pr pushes 4 regs to stack ( not counting strnlen) versions 3 pushes 3 current RIOT pushes 2

cortex m4 this pr 2regs to stack ( not counting strnlen) version 3 none current none

godbolt excutable

kfessel avatar Aug 22 '23 12:08 kfessel

one might want to look at #19027

kfessel avatar Aug 22 '23 12:08 kfessel

As I stated in PR https://github.com/RIOT-OS/RIOT/pull/19027#issuecomment-1343106941, the unittest for the fmt module are lacking. Your changes break the API promised in fmt.h. Sadly, our unittest don't catch that.

Teufelchen1 avatar Oct 16 '23 09:10 Teufelchen1

@Teufelchen1 : You probably already got something in mind would you want to make that a PR? In case, please link this and the other (my) PR for reference

kfessel avatar Oct 16 '23 10:10 kfessel

Hey hey, the new test just got merged. It will fail with the current implementation in this PR. Do you want to adapt or close this?

Teufelchen1 avatar Mar 12 '24 12:03 Teufelchen1

uses more stack

How?

benpicco avatar Mar 12 '24 16:03 benpicco

uses more stack

How?

seem like it needs an intermediate for the case-conversion

kfessel avatar Mar 12 '24 16:03 kfessel

https://godbolt.org/z/dz3c6d1oh

for your experiments

it has the old experiments (frome the previos comment) on top

kfessel avatar Mar 12 '24 16:03 kfessel

ah and the test ist still incomplete

kfessel avatar Mar 12 '24 16:03 kfessel

https://godbolt.org/z/KTYWf4v66

execute on x86 godlbolt reveals this PR will return 2467 if asked for scn_u32_hex2("9:3k",5)

i am not sure how it does that

now i know : to ? are right behind 0-9 so they are alternative symbols for counting to f and the| 32 doesn't take a hit on them

0123456789:;<=>?

kfessel avatar Mar 12 '24 16:03 kfessel

Yea I guess it's mostly obfuscation at this point…

benpicco avatar Mar 12 '24 22:03 benpicco

It's now even worse (more stack and more assembly, and the code is barely readable)

https://godbolt.org/z/zMWGKWefT

Do you really want to try to optimize this function? (I might be biased because i wrote the current implementation https://github.com/RIOT-OS/RIOT/pull/19027 the pr also has a link to godbolt where you can find versions of this functions that are smaller on stack and in code but are also less safe ( one of them is probably also what this pr once was) )

If you do please provide some kind of proof for: "this is better"

I like godbolt gcc-arm a current version -O(2,z or 3) -mcpu=cortex-m0plus -mfloat-abi=soft or -O(2,z,3) -mcpu=cortex-m4 for reading with a main function and an x86-64 you can also have it run something print or compare (know good to new) and print test cases in arrays iterate across ranges ... it is realy nice

kfessel avatar Mar 13 '24 11:03 kfessel

I am with @kfessel here. Closing this PR seems to be the best way to handle this? Even if we find an implementation that produces perfect out put for a given compiler: There are so many variables. So far we only looked at arm but RIOT supports more ISAs. How do their compilations compare? What is about run-time performance? etc. etc.

Teufelchen1 avatar Mar 18 '24 11:03 Teufelchen1

In consent with benpicco: Closed, not worth the extra time.

Teufelchen1 avatar Mar 26 '24 12:03 Teufelchen1