p4c
p4c copied to clipboard
Reduce proliferation of cstrings
cstrings are commonly used as "the string type" in p4c. However, they should be used with more care. The reason is that they are internalized, so each new string is stored forever in some global map.
This makes them great for various containers, both keys and values, for IR strings, etc. However, in many cases cstring
s are used just for random transient strings. As a result:
- Copies are created every time
cstring
is constructed - These copies are stuck forever
As a result, we might observe some elevated memory usage especially for long-lasting processes. I definitely saw that ~1% of runtime of benchmark from https://github.com/p4lang/p4c/pull/4475 is spent on string hash used for cstring
internalization. I have not measured how many strings are in cstring
hash, but I'd assume a lot.
Instead, one should resort to std::string
and std::string_view
for transient values that are not required to be saved forever. Also, we might consider adding some kind of StringMap
with string keys that would own keys and not pollute the common cstring cache.
Tagging @fruffy
Instead, one should resort to std::string and std::string_view for transient values that are not required to be saved forever.
I agree there 100 %. We might need to provide some utility functions to make the migration easier. Some of the uses might be there just because some things are easier with cstring then with std::string (or because we relatively recently switched do C++17).
Also, we might consider adding some kind of StringMap with string keys that would own keys and not pollute the common cstring cache.
Could you please give a use-case for that?
Could you please give a use-case for that?
Just loud thinking – if one would need to have internalized string keys (e.g. for faster lookup), but definitely would worth starting from the proper usecase in the code
I agree, I sometimes wonder whether the interning does more harm than good. It might make sense to rename cstring
to something like CachedString
. Simply to make it explicit to users that this string will stay around. Or use an explicit StringMap, as you suggest.
In the case of P4Testgen, cstrings can work well, because the interpreter ends up allocating and reusing the same kinds of strings a lot. We fixed the majority of cstring leaks there and performed benchmarks running the interpreter for days without running out of memory, but that does not mean there aren't any cstring leaks left.
I collected some statistics on downstream project. And the results are overwhelming. Essentially cstring cache works as sink, collecting everyone and their brother: all format strings, pieces of input, json values, ...
Essentially after frontend the cstring cache contained 300k strings (!), midend raised this up to 311k (though we do not run much midend passes). Number of map lookups was great as well totalling 86M after frontend (each cstring ctor involves at least 1 map lookup, or 2 if the value should be stored).
So, lots of hash table lookups, lots of string hashes, etc.
I will try to see if there is some low-hanging fruit here... Likely cstring should be "storage-only" type for node members. But not a replacement for generic string type...
I will try to see if there is some low-hanging fruit here..
I would guess the newName()
method of the ReferenceMap has a lot of weight there. There are a lot of temporary names generated with it.
I will try to see if there is some low-hanging fruit here..
I would guess the
newName()
method of the ReferenceMap has a lot of weight there. There are a lot of temporary names generated with it.
I would expect these to mostly end in the IR nodes, which is OK-ish. The problem is that cstring
is convenient, in some ways more convenient than std::string
so programmers just tend to use it as there is no big hurtle in using it that could prompt them to ask if it is a good idea :-). Error-handling-related code definitely should not use cstring
.
One interesting thing could be to (temporarily?) disable[^1] the operator+
/operator+=
for cstring
... this is almost never a good idea to use as there is a high chance that an intermediate value will get cached.
Similarly with to_cstring
, join
... maybe we could replace them with freestanding functions on string for the stuff that is not already covered over std::string
. Stuff like substr
, replace
, etc. should return std::string
(or std::string_view
in cases like substr
). I'm not sure how much would this break for backends and that will be always hard to know when we don't even know all the 3rd party backends that are based on P4C...
[^1]: Or we could start by making a lot of these functions [[deprecated]]
and, clean up our code (possibly with some targetted -Werror
in CI), and let the backends decide for themselves. But this cannot be done in cases of changing types or conversions of course.
One interesting thing could be to (temporarily?) disable1 the
operator+
/operator+=
forcstring
... this is almost never a good idea to use as there is a high chance that an intermediate value will get cached.
Yes, exactly. I already started to look into various cstring-related things. Good news is that currently it is almost always done via std::string
. At least in toString
methods of various nodes. There are few notable cases, where I fixed this.
There are extreme cases in backends. bpf / ebpf here are the most notable examples where cstring
's are created for almost everything for no particular reasons :) Overall I'd expect that this would be the typical case for downstream backends as far as I can see from few (limited) cases ;)
#4694 is a painful commit, it breaks a lot of back ends. But it looks like it is necessary. It shows that there are a lot of unneeded cstring instantiations. For example, why do we need to cache all the options here: https://github.com/p4lang/p4c/blob/main/frontends/common/parser_options.h#L60
What is the cost of all these "unnecessary" cstrings? Yes an extra copy occurs when creating one, but that would happen with a std::string
too. Yes, they are (currently) not garbage collected, but the total size of all cstrings is trivial compared to the total heap size -- generally less than 0.01% I'd be concerned with doing a lot of work that would have no (or even negative) overall effect.
Yes, they are (currently) not garbage collected, but the total size of all cstrings is trivial compared to the total heap size -- generally less than 0.01% I'd be concerned with doing a lot of work that would have no (or even negative) overall effect.
It is not the total size that matters. Every cstring construction involves map lookup. Sometimes two lookups if the string is new and should be copied. Each lookup is hash calculation plus string equality check. The "cstring proliferation problem" is not that visible inside frontend / midend. Though we store bunch of stuff there: we do store the whole input several times (whole input lines, all parsed tokens including all comments and finally parsed identifiers). So, essentially we do use cstrings for something transient, we do use cstrings for string literals a lot in many places, where just string_view
will be enough, etc., etc.
I do see cstring::save_to_cache
in the profiles and sometimes it is more than just 1%. Also, the less we store, less the GC overhead. As https://github.com/p4lang/p4c/pull/4694#issuecomment-2141066014 shows, just allocating cstring cache in non GC-ed arena yields 3% improvement due to less GC work here. Overall, all these strings is something for GC to scan for pointers and liveness.
The real problem is backends. Backends tend to use cstring
as "the P4C string type". And this is just plain wrong. Besides the statistics presented in https://github.com/p4lang/p4c/issues/4481#issuecomment-1985003987 (300k cstrings created in frontend with 86M cstring cache map lookups there), the downstream backend increased the number of strings saved to 2M.
So, the intention is to ensure that it would be hard to create cstring
implicitly. Also, these implicit conversions make string_view
argument overloads impossible due to ambiguity. And we already obtained the speedup :)