High cost encoding and decoding in the League library

Open mgrojo opened this issue 2 years ago • 5 comments

https://github.com/reznikmm/protobuf/blob/a1422f35ca12ad07c7611b4aa2cd86fd01fa8f91/source/runtime/pb_support-io.adb#L746

Similar to #20.

I was going to apply the same solution as in #20. I see two possibilities:

  • Use another holder instance in PB_Support.IO.
  • Share the holder instance by moving it from PB_Support.Internal to PB_Support (private part) and using it from both packages.

Do you see any problem in sharing this in PB_Support?

mgrojo avatar Jan 24 '23 10:01 mgrojo

Well, I see PB_Support is pure, so maybe it's not a good idea to change that to include the variable. I'll go for the other option, unless you have another proposal, @reznikmm.

mgrojo avatar Jan 24 '23 10:01 mgrojo

I propose to move

Codec : Text_Codec_Holders.Holder;

into the public part of PB_Support.Internal and reuse it in the body of PB_Support.IO, if this works.

reznikmm avatar Jan 24 '23 10:01 reznikmm

This time, the change doesn't make much difference. Looking more closely at the report, most of the time is spent encoding and decoding the UTF-8 strings.

mgrojo avatar Jan 24 '23 15:01 mgrojo

What I've gathered is that League.Universal_String uses UTF-16, so we need to convert to and from UTF-8, as required by Protobuf.

I was looking at https://web.archive.org/web/20220817170400/https://forge.ada-ru.org/matreshka/wiki/League/Performance which says

  • use of special algorithms and utilization of SIMD operations (when available) significantly improve performance.

and I wonder how I could check that my build is actually using the SIMD operations. I don't see any clue about it in the Callgrind report.

I was also wondering whether there is a way to speed up the conversion when the input is known to always be in the US-ASCII subset, which is my particular case.
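To make the US-ASCII idea concrete, here is a minimal sketch of such a fast path. It does not use League at all: it relies only on the standard Ada.Strings.UTF_Encoding packages, and the helper name Decode_Fast is hypothetical. It assumes the target is a sequence of UTF-16 code units (Wide_String stands in for that here). Whether an equivalent shortcut would help inside League depends on its internals and has not been measured.

```ada
with Ada.Strings.UTF_Encoding;              use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Strings;
with Ada.Text_IO;

procedure ASCII_Fast_Path_Demo is

   --  Hypothetical helper (not part of League or PB_Support):
   --  decode UTF-8 into UTF-16 code units, bypassing the general
   --  decoder when every byte is below 16#80#.
   function Decode_Fast (Item : UTF_8_String) return Wide_String is
   begin
      for C of Item loop
         if Character'Pos (C) >= 16#80# then
            --  Non-ASCII byte found: fall back to the standard decoder.
            return Ada.Strings.UTF_Encoding.Wide_Strings.Decode (Item);
         end if;
      end loop;

      --  Pure ASCII: each byte maps to the code unit with the same value,
      --  so a plain widening copy is enough.
      return Result : Wide_String (Item'Range) do
         for J in Item'Range loop
            Result (J) := Wide_Character'Val (Character'Pos (Item (J)));
         end loop;
      end return;
   end Decode_Fast;

   Plain  : constant UTF_8_String := "field_name";

   --  "café" encoded as UTF-8: the last character takes two bytes.
   Accent : constant UTF_8_String :=
     "caf" & Character'Val (16#C3#) & Character'Val (16#A9#);

begin
   --  Assertions are only checked when enabled (e.g. -gnata with GNAT).
   pragma Assert (Decode_Fast (Plain) = "field_name");
   pragma Assert (Decode_Fast (Accent)'Length = 4);
   Ada.Text_IO.Put_Line ("done");
end ASCII_Fast_Path_Demo;
```

The scan costs one extra pass over the input, so this only pays off if the general decoder is noticeably slower than a widening copy; a profiler run on real messages would be needed to confirm that.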

mgrojo avatar Jan 26 '23 09:01 mgrojo

We could try to replace Matreshka with VSS. VSS uses UTF-8 internally and keeps short strings inline...

reznikmm avatar Jan 27 '23 14:01 reznikmm