rabbitmq-dotnet-client icon indicating copy to clipboard operation
rabbitmq-dotnet-client copied to clipboard

Do not create string objects from consumerTag, exchange and routingKey, or get them from a string cache

Open zgabi opened this issue 2 years ago • 34 comments

Proposed Changes

See issue #1231

Types of Changes

What types of changes does your code introduce to this project?

  • [ ] Bug fix (non-breaking change which fixes issue #NNNN)
  • [ ] New feature (non-breaking change which adds functionality)
  • [X] Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • [ ] Documentation improvements (corrections, new content, etc)
  • [ ] Cosmetic change (whitespace, formatting, etc)

Checklist

  • [X] I have read the CONTRIBUTING.md document
  • [X] I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • [ ] All tests pass locally with my changes
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added necessary documentation (if appropriate)
  • [ ] Any dependent changes have been merged and published in related repositories

zgabi avatar Jun 25 '22 16:06 zgabi

I'd prefer to split off at least the confirmTag into it's own PR. (as it contains a lot of special code to handle things, simpler to review track errors)

The other two are so similar that they can remain in this one.

As mentioned in the issue, I think we should look into also providing new API that take ROM for the exchangeName/RoutingKey

bollhals avatar Jun 27 '22 09:06 bollhals

@bollhals This is just a draft PR, which was created for the request of @bollhals Of cource once your team decides how to implement it, (if it helps to you) I could make another (separated) PRs. The comsumerTag and other chages are already separated to 2 commits.

zgabi avatar Jun 27 '22 11:06 zgabi

I do not have a strong opinion on whether this should be two separate PRs. However,

As mentioned in the issue, I think we should look into also providing new API that take ROM for the exchangeName/RoutingKey

would be nice and can be included into this PR or a separate one.

michaelklishin avatar Aug 01 '22 09:08 michaelklishin

I improved similarly to the .NET stream client here, avoiding tons of allocations.

We need to test the PR deeply, but it makes sense to me.

The problem here is the breaking changes.

Gsantomaggio avatar May 18 '23 13:05 Gsantomaggio

@Gsantomaggio We are using this PR in our application since almost a year ago without any problem... Today I only updated it to the latest code. (nothing else changed in this PR)

zgabi avatar May 18 '23 14:05 zgabi

@zgabi thank you! We appreciate you keeping this PR up-to-date. As you can see, we're making progress on version 7.0.

lukebakken avatar May 18 '23 15:05 lukebakken

@lukebakken Could you please run the build again, there was a problem in the APIApproval file

zgabi avatar May 18 '23 16:05 zgabi

Well, it's going to take some time to resolve conflicts with the current state of main.

@zgabi - if you have time to start resolving conflicts, I'd appreciate it.

lukebakken avatar May 09 '24 22:05 lukebakken

@zgabi @stebet I'm assuming that this code defeats the entire purpose of this PR:

https://github.com/rabbitmq/rabbitmq-dotnet-client/pull/1233/files#diff-1d0ab4919eef61c6b1cc4efcc8849c4fc04b84cf71adcf4f757855de8ec2bfa3R120-R123

But, I'm wondering what we can do to still support this information in tracing?

@zgabi PS please see this comment

lukebakken avatar May 10 '24 16:05 lukebakken

You can write the following:

        string routingKey = Encoding.UTF8.GetString(deliverEventArgs.RoutingKey.Span);
        string exchange = Encoding.UTF8.GetString(deliverEventArgs.Exchange.Span);

So only string allocation is needed, no array.

And put the exchange line to line 136, so create only when needed in the if statement.

I'm not very familiar with the latest code.. is this Deliver executed every time when you deliver a message or only in some special case, e.g. when some tracing is enabled?

edit: added a new commit with this change

thanks for updating the code... you were faster :) .. sorry, I had to work today

zgabi avatar May 10 '24 17:05 zgabi

Thank you @zgabi!!!!

lukebakken avatar May 11 '24 15:05 lukebakken

@zgabi thanks for all of your work on this. I'm planning to merge it by the end of the day tomorrow (16 May) unless there are further changes needed.

lukebakken avatar May 15 '24 17:05 lukebakken

Please see this comment: https://github.com/rabbitmq/rabbitmq-dotnet-client/issues/1231#issuecomment-2115566382

lukebakken avatar May 16 '24 15:05 lukebakken

@zgabi @bollhals @michaelklishin take a look at this commit:

https://github.com/rabbitmq/rabbitmq-dotnet-client/pull/1233/commits/d4e88ca738219fbcdac09324e10b8ead38966c73

I would like to replace the usage of ReadOnlyMemory<byte> and string, when used for exchange and queue names, and routing keys, with the dedicated types in that commit. Thoughts?

lukebakken avatar May 16 '24 22:05 lukebakken

Hi,

So the parameters of the of the HandleBasicDeliver will be alsoe ExchangeKey and RoutingKey? For me it is ok.. I had a different branch/commit for this change (However they are using CachedStrings, but not so hard to separate them to the 2 new types): https://github.com/zgabi/rabbitmq-dotnet-client/commits/routingkeyandexchange

This branch was an alternative to the "stringallocations" branch.

zgabi avatar May 17 '24 07:05 zgabi

@zgabi I realize this PR now contains an enormous number of changes, but the vast majority of them are the following:

  • Replace string with RoutingKey, QueueName, ExchangeName or ConsumerTag, as appropriate.
  • Update APIs with the above

Here are the "big" changes -

  • Consumer dispatchers now pass RentedMemory for the method and body part of a delivery, which prevents string allocations unless the handler explicitly triggers them in the AmqpString instance - https://github.com/rabbitmq/rabbitmq-dotnet-client/pull/1233/files#diff-74d6657893b833f7aefa9d2a615ae0d7dc1c4cfefa6615461c1b1638ff4b17abR53-R59
  • This is one such place where casting an instance of an AmqpString subclass will result in a string allocation.

@zgabi if you wouldn't mind re-running your allocation tests that would be great. Or, let me know how you did them and I will run them. Thanks!

lukebakken avatar May 20 '24 17:05 lukebakken

While this change and the effort from you @lukebakken are honerable, I think it is currently way too big to properly analyze and judge the impact of the change. And also it contains many unrelated changes, making it even harder to judge.

My proposal would be to split off any unrelated changes, to pass them individually.

Regarding the main topic of the PR:

  • With the introduction of these types, it's much clearer what you're suppose to pass as a parameter. But this comes at a tradeoff as well:
    • You always have to go through an indirection (the type) if you want / need it or not
    • You need to allocate this intermediate type, which slows you down if you do not plan on reusing it.

As it is right now, I think it is further away from what it tried to acheive than where the main is right now ... :/

IMHO, having classes wrapping string / bytes is only useful if

  • You should be reusing them / they're not supposed to be allocated just for this one call
  • The library has a cache to avoid the allocation of it (e.g. Avoid allocating the exchange string + ExchangeName object for the delivery of messages, as they're often the same over and over again.)

bollhals avatar May 21 '24 20:05 bollhals

Hi @bollhals thanks for your input.

The library has a cache to avoid the allocation of it (e.g. Avoid allocating the exchange string + ExchangeName object for the delivery of messages, as they're often the same over and over again.)

Well, that's exactly what this PR accomplishes with the AmqpString/ExchangeName types, so there's that at least 😄.

There is no cache, but upon delivery, ExchangeName only points to the backing ReadOnlyMemory<byte> of the incoming data. A string is only initialized from it if the user takes an action to do so. Once the delivery handler exits, the memory is released.

~~I will see about moving other changes to a different PR.~~

Actually, nearly all of the changes in this PR are due to the migration from string to dedicated types for exchange name etc, so I don't see the value in a separate PR.

lukebakken avatar May 21 '24 20:05 lukebakken

IMHO, having classes wrapping string / bytes is only useful if

@bollhals there is another benefit - I have had RabbitMQ support cases that boil down to the wrong string passed as the wrong parameter to a library call. Having dedicated types makes that kind of mistake much less likely.

lukebakken avatar May 21 '24 20:05 lukebakken

IMHO, having classes wrapping string / bytes is only useful if

@bollhals there is another benefit - I have had RabbitMQ support cases that boil down to the wrong string passed as the wrong parameter to a library call. Having dedicated types makes that kind of mistake much less likely.

Sure, that's what I ment by stating to make it much clearer what to pass. But the tradeoff is the allocation of the objects and the indirection through another object.

It depends in the end what the goal is:

  • Increasing performance? => I have a feeling that these additional objects & indirection made it slower for most real world usecases. (Perf tests needed to confirm)
  • Increasing usability? => Fine, this surely helps, but then the PR Title should be changed, as the goal of not allocating string objects becomes a side thing.

bollhals avatar May 21 '24 21:05 bollhals

I'll run some performance tests. I don't think I have access to memory profiling tools to see how this PR affects allocations ... @zgabi ?

lukebakken avatar May 21 '24 21:05 lukebakken

@zgabi @bollhals I have force-reset the stringallocations branch to 2320480c286d460d71d997661413594117c01c29, which is just before I went crazy with this PR. I saved my work on another branch.

lukebakken avatar May 22 '24 00:05 lukebakken

  • Increasing performance? => I have a feeling that these additional objects & indirection made it slower for most real world usecases. (Perf tests needed to confirm)

Sometimes, minimal local extra work/complexity avoids extra processing, allocations, GC work. It really needs to be measured.

paulomorgado avatar May 22 '24 13:05 paulomorgado

Here is what I see when comparing allocations between the stringallocations branch and the current main (as of 1600PDT 2024-05-22), using the MassPublish program to send 64 batches of 1024 messages per batch.

Running from the command line, each branch appears to take about the same amount of time. Memory usage appears to be about the same.

@zgabi @bollhals @paulomorgado thoughts? Right now I don't really see a reason to include this PR in version 7 and would happily move it to version 8 (where taking allocations into account will be a first-class priority).

stringallocations branch

image

main branch

image

lukebakken avatar May 22 '24 23:05 lukebakken

@lukebakken, I haven't looked at the stringallocations branch in detail, but I have some generic considerations.

If the stringallocations branch has 98,479 strings and the main branch has 265,205 strings, where did those strings go? What are those strings? Can those strings be pooled/cached?

One of the benefits of System.Text.Json is working with UTF-8 instead of UTF-16 (the encoding of System.String. System.Char is 2 bytes long). And to truly benefit from that, there are APIs working on byte alongside char.

I find very worrying the existence of System.ArrayEnumerator instances in the heap. This is an indication that they are being boxed. Coupling this with System.Linq.Enumerable (a known performance killer), I'd say LONQ to Objects is being used and shouldn't.

What is System.Collections.Generic.LinkedList<System.UInt64> being used for?

paulomorgado avatar May 23 '24 08:05 paulomorgado

Hi @paulomorgado

If the stringallocations branch has 98,479 strings and the main branch has 265,205 strings, where did those strings go? What are those strings? Can those strings be pooled/cached?

That's the point of this PR, to reduce string allocations when this library handles basic.deliver from the AMQP server. Exchange name and routing key used to be converted to strings, and this PR keeps them as ReadOnlyMemory<byte>

I find very worrying the existence of System.ArrayEnumerator instances in the heap. This is an indication that they are being boxed. Coupling this with System.Linq.Enumerable (a known performance killer), I'd say LONQ to Objects is being used and shouldn't.

That's easy to fix. Visual Studio "helpfully" suggested adding a .Cast<> in a couple foreach loops. I will open a new PR.

What is System.Collections.Generic.LinkedList<System.UInt64> being used for?

C:\Users\lbakken\development\rabbitmq\rabbitmq-dotnet-client_main [main ≡]
> git grep LinkedList
projects/RabbitMQ.Client/client/impl/ChannelBase.cs:63:        private readonly LinkedList<ulong> _pendingDeliveryTags = new LinkedList<ulong>();

lukebakken avatar May 23 '24 18:05 lukebakken

Hi @lukebakken

If the stringallocations branch has 98,479 strings and the main branch has 265,205 strings, where did those strings go? What are those strings? Can those strings be pooled/cached?

That's the point of this PR, to reduce string allocations when this library handles basic.deliver from the AMQP server. Exchange name and routing key used to be converted to strings, and this PR keeps them as ReadOnlyMemory<byte>

As you pointed out, the only change is the reduced number of string instances and not overall memory or CPU.

So, what's the benefit here?

Is it the tests that are not taking advantage of this?

C:\Users\lbakken\development\rabbitmq\rabbitmq-dotnet-client_main [main ≡]
> git grep LinkedList
projects/RabbitMQ.Client/client/impl/ChannelBase.cs:63:        private readonly LinkedList<ulong> _pendingDeliveryTags = new LinkedList<ulong>();

So, it's kind of a queue, but you need to remove items from both ends and, sometimes, anywhere. Is that it?

paulomorgado avatar May 24 '24 07:05 paulomorgado

@zgabi - if you have time, please read @paulomorgado's observations:

As you pointed out, the only change is the reduced number of string instances and not overall memory or CPU. So, what's the benefit here? Is it the tests that are not taking advantage of this?

I am running the MassPublish program using the ".NET Object Allocation Tracking" performance monitor, and, as @paulomorgado notes, there does not seem to be a reduction in memory use.

At this point I'm getting the feeling like I'm testing this incorrectly, perhaps?

lukebakken avatar May 25 '24 17:05 lukebakken

Reducing the allocations reduces the GC overhead only. You wrote that the running time and memory is almost the same... but is the CPU usage also the same? I think it should be bigger (because of multiple threads) when there are a lot of allocations.

zgabi avatar May 28 '24 21:05 zgabi

I'm assuming the same test conditions, otherwise the tests are not comparable.

paulomorgado avatar May 29 '24 08:05 paulomorgado