rabbitmq-dotnet-client
rabbitmq-dotnet-client copied to clipboard
Do not create string objects from consumerTag, exchange and routingKey, or get them from a string cache.
In our real application the most allocated objects are Strings. Most of them are allocated in RabbitMQ Client:
They usually contains the same value.
Please consider using ReadOnlySpan<char>
or ReadOnlySpan<byte>
with a pooled array in the background similar to the data.
Alternatively get the strings from a cache table (maybe separate table for each "string type" (consumerTag, exchange...)), since most of them are the same.
By the way in the screenshot the 3rd, 4th, 5th and 6th allocations also belongs to RabbitMQ Client, but maybe some of them are already solved in the main branch. (I'm using the latest 6.x package) The 2nd one is WFP, I have a PR for that.
How much difference would this make to even a very basic (tutorial 1-2 style) consumer?
I don't undertand your question. We have 1 consumer in this process.
Feel free to submit a pull request implementing your suggestion.
Which one do you prefer?
- Receive
ReadOnlySpan<char>
in theIBasicConsumer
's Handle* methods (e.gHandleBasicDeliver
), so it is already UTF-8 decoded characters - Receive
ReadOnlySpan<byte>
in theIBasicConsumer
's Handle* methods (e.gHandleBasicDeliver
), the raw bytes - Do not change the interface, but cache the strings in WireFormatting.ReadShortstr.. maybe this one is slower, because the cached strings should be compared with the received data. But do not require to change the interface.
In the framework? https://github.com/dotnet/runtime/issues/28368
No for some reason I thought string interning worked differently.
I'd like to get feedback from the "usual gang", aka @bollhals, @bording, @danielmarbach and @stebet if they have time. I'm a bit surprised nobody has reported this issue in their use of this library.
@zgabi my point is that caching is always harder than it sounds. What would be the gains (in terms of memory footprint and allocations) to a realistic consumer (application) like yours, or even a basic tutorial-style one.
Previously all efficiency improvements in this client were driven by profiler results. Sure, you already have some but I am curious about the difference it makes.
I'd like to get feedback from the "usual gang", aka @bollhals, @bording, @danielmarbach and @stebet if they have time. I'm a bit surprised nobody has reported this issue in their use of this library.
Because they are just some small allocations. 2-3 small strings / message ("exchange" is empty string for me, which is not allocated) And everybody is happy since a lot of other allocations were removed in the past 3-4 years. (Me, too) Version 6.x is much better and faster than 4.x was 4 years ago. And I saw that the main branch has more new related improvelemts (BasicProperties is a struct, etc...)
We are sending a lot of messages (~200/ sec), because it is a requirement for us to send everything between the server and the clients through RabbitMQ... even video streams. This string allocation problem is also not a big issue for us... 1.8 M messages allocations / 38 minutes. GC can handle it :)
I reporteed this issue, since it is the top 1 in the memory profiler.
Yes, caching is harder than simply passing spans to the user.
For exmaple:
- caculating some hash from the data
- put it to a thread static dictionary (with this the string will exist only 1 time / thread, which is quite good, and does not need locking).
So probably this is slower than just simply creating a new string.
So I'd prefer passing span (or memory) of byte to the user.... similar to the data.
It needs the modification of the IBasicConsumer
, but does not have any other disadvantages. It will be even faster, since no UTF-8 decoding is needed. And no string allocation, although it is fast...
@zgabi we can do this for 7.0. Option 2 above is more in the spirit of RabbitMQ and clients but I can see how many might vote for option 1.
I agree with @michaelklishin that this should be 7.0 only if there is an API change.
In our real application the most allocated objects are Strings. Most of them are allocated in RabbitMQ Client:
Yeah the strings always were a bit of a problem, but we've focused on everything else so far.
Which one do you prefer?
- Receive
ReadOnlySpan<char>
in theIBasicConsumer
's Handle* methods (e.gHandleBasicDeliver
), so it is already UTF-8 decoded characters- Receive
ReadOnlySpan<byte>
in theIBasicConsumer
's Handle* methods (e.gHandleBasicDeliver
), the raw bytes- Do not change the interface, but cache the strings in WireFormatting.ReadShortstr.. maybe this one is slower, because the cached strings should be compared with the received data. But do not require to change the interface.
Regarding the options ROS<> in general is problematic for as long as we want to support async methods. We can think about ROM<>.
Caching, I've actually 1.5 years ago started experimenting with option 3 back then (see commit here) and even saw gains in overall performance if the string I think was > 16 chars. But it's been a while, so one would had to check again.
In principal the API is a complex topic and depends which usecase we want to support.
- Caching is the default easy to use for the customer API that the user already knows. (Though fine tuning might require some new API, e.g. for cleaning up the cache)
- ROM
is still kind of convenient to use, but often you have to have a own copy / ToArray() it, where you'd loose all gains again. (E.g. all ConsumeTag are stored in the consumer, so we'd have to adapt the API there as well and allocate here to store the information.) - ROM
on to of the above, we'd save on the encoding. But pay the price of usability. In some cases (especially where you don't use it) you have a lot of gain, but on others where you do want to access it en encode, it looks like it's more a step in the wrong direction
=> So for me the caching option looks like the best option if the API can somewhat be simple enough for the common use case.
What about distinguish the consumerTag from the another 2 strings. As you wrote consumerTags are problematic, they are stored in dictionary and hashset. Multiple handlers receve it.
It is quite easy to get rid of the another two to change it to ROM<byte>
or similar custom struct which has an overridden ToString
... so the user's can easily get the string value.
I've investigated a little bit.
This is measure from the current main branch:
(Btw: you can see that everything other disappeared from the list which was in v 6.4 in my first comment)
This measure is when I change the exchange and routingKey to ROM<byte>
:
Changes: https://github.com/zgabi/rabbitmq-dotnet-client/commit/a6ac5c47e3407b53dcee1a4645aa83df07cc07be
I tried to make similar time periods for the measure, but check the ratio between the number of the objects. In the latest main code there are almost the same number ofd strings as WPF EventKeys. In the modified code about half of them.
I could change the ROM to another struct with an overridden ToString. It won't affect the memory allocations, and also not really changes the performance. So the user can get the string... but he has the opportunity to convert to Span<char>
to without allocation if he also cares it.
I haven't looked in depth into the code base for a while... I was wondering if we have already looked at places where we can benefit from the low allocation APIs like string.Create
to give use some gains there before we go into the complexity of string caching?
@danielmarbach string.Create
is also creating a new string object :) This issue is not about string concetanation where string.Create
would be useful. There are no concatenation in this case.
yeah I'm aware of that. My point was not about the specific problem but more about making sure where we use string and concatenation we do it efficiently before we go into complex string caching.
As far as I saw there are no string concatenation issues in the code. After the connection is established 100% of the new strings in RabbitMQ client are cunsumerTags, excanges and routingKeys.
The only string concatientaion in this library is in the ToString method of the ShutdownEventArgs.. which is (according to the method comment's) only for debugging: https://github.com/rabbitmq/rabbitmq-dotnet-client/blob/6b1a06bd429f395891a3230cad92e674dcbbb0d2/projects/RabbitMQ.Client/client/api/ShutdownEventArgs.cs#L97-L108
Created a test commit for removing the string allocation. Since the consumers are stored in a dictionary, and the dictionary lookup is needed anyway, RabbitMQ client can store the the known clientTag strings in that dictionary. I haven't measured, but probably this one is slightly faster, because the "utf8 data" is about half than the "string data", so the hash calcualtion and comparsion should be faster. The string allocation is needed only when there is no consumer for the consumerTag. For me it removes 100% of the string allocations. I don't know whether it is a problem for anybody that the allocations are not removed when consumer does not exists. (How can that happen?)
Commit: https://github.com/zgabi/rabbitmq-dotnet-client/commit/4b67b238b627abef901e448099e4672bd8c498fd
Memory profiler result after both commits:
So basically the consulerTag remains string everywhere, but (only) in the HandleBasicDeliver it is not allocated when consumer exists)
FYI Ben Adams wrote once an intern pool https://github.com/benaadams/Ben.StringIntern.
Yet it seems by the first look your approach is elegant and low effort. Need to look closer though a bit next week
One thing you need to take care of if you store it in a dictionary or cache it, is that the backing storage is a rented array, hence it will be returned and potentially overwritten, which changes the content of your ROM
I'll try to look into the options and provide a bit more context.
Btw, we already have this CachedString class here, the intend when introducing it was cases like this.
Yes, I create a new byte[] in the AddConsumer method. No rented array is used after it is returned to the pool.
If I'm not mistaken then you return the array right after you call dispatch. Whereas for de body array we pass it on to the dispatcher.
Oh, so you mean the excahnge and routingKey.
Yes.
I've created new commits, fixed some other things which was commented by @danielmarbach (See comments here: https://github.com/zgabi/rabbitmq-dotnet-client/commit/4b67b238b627abef901e448099e4672bd8c498fd)
New commits: https://github.com/zgabi/rabbitmq-dotnet-client/commit/926b2f848576e9afc7033ca3d5e3728538458ce6 and https://github.com/zgabi/rabbitmq-dotnet-client/commit/fc636a2c8a6dfc7419dc3b7a06cd29f851a8f755
I've created new commits, fixed some other things which was commented by @danielmarbach (See comments here:
You should make your commits on a separate branch and open a draft PR. It makes it much easier to see what you're doing that way.
Ok, created a draft PR (#1233)
I've had a look at it and I think it's very important to be consistent in the API.
Meaning if we expose e.g. ExchangeName as ReadOnlyMemory
On the topic of ReadOnlyMemory
- ROM is easier to implement / maintain for the library.
- With ROM it's not obvious that it is a UTF8 string. So more work for the devs to figure this out to be usable. => For me more a question what direction the library wants to go for, that's though is not my decision. (My Opinion: I'd go for simplicity in the library, UTF8 byte array should be common enough for the devs that use it to not be a huge burden)
Yes, all the methods which has routingKey / exchange should accept ROM<byte>
,too.. even more, that should be the prefered use.
For compatibility they can have an overload which accepts string, but internally they should only convert the received parameter to byte[]. So the users should know that when they use the string parameter version, it is slower.
This makes the CachedString and some relates classes unnecessary. (For example the class called BasicPublis... always the BasicPublishMemory will be used - which should be renamed to BasicPublish)
Instead of accepting (and providing in the Handle* methods) ROM
, the parameter type could be a "CustomString" (todo: give a good name) struct, which internally similar to the ROM<byte>
, but it will be obvoius to the users that they are strings. Can for example explicit/implicit cast from/to string or call ToString().
In this case it will be enough to have only 1 parameter combinataion for for example the BasicPublish method
void BasicPublish<TProperties>(CustomString exchange, CustomString routingKey, ref TProperties basicProperties, ReadOnlyMemory<byte> body = default, bool mandatory = false)
And the users can call it with strings (implicit cast):
BasicPublish("myexchange", "myRoutingKey", ref basicProperties, body)
And the users who are interested in the performance will "cache" the CustomString stuct.
@zgabi in the protocol, the type used by those fields is called "short strings". It can be a decent type name to use in the context of this client.
The default up until now was and still is the string case, so we should "try" not to make this case worse, but rather provide a even better alternative. (The CachedString overload was just recently added and can probably easily be modified if needed)
Nesting structs in structs will have it's own challenges and inefficiencies, but if wanted it can be managed. But I'd hold off implicit conversions to this type in general, and from this type if they allocate. (E.g. No implicit converstion to string, if it means allocating a string. then I rather have an explicit ConvertToString or similar)
I created a new commit which is using CachedStrings for every exhange and routingKey (sometimes the parameter name is "name" or "source", but it contains exchange or routingKey)
This branch and commit does not contain the counsumerTag change as requested: https://github.com/zgabi/rabbitmq-dotnet-client/commit/2e71e8b5c7f23514b62aef17099593232379c84f
All public method has both string and CachedString overloads. (Except the HandleBasicDeliver, which is a callback, it makes no sense to add and call both) Internal is only CachedString (when it is enough)
Should I create a new PR?