protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

Ruby: Discoverable object references

Open casperisfine opened this issue 3 years ago • 4 comments

I'm trying to figure out why Protobuf is causing such a memory overhead in our application. I'm using ObjectSpace.dump_all to inspect the heap, but unfortunately since protobuf objects reference each other inside C structs, the heap dump don't include the reference.

So from the point of view of heap analiser tools, these Message, Arena, etc objects are simply dangling, and I have no way to figure out what keeps a reference on them.

By using Ruby instance variables to keep these references it makes it much more discoverable.

casperisfine avatar Jul 20 '22 12:07 casperisfine

I can see the motivation. My only concerns are:

  1. Will using instance variables create more CPU or memory overhead? Will it cause a new table to be created per instance that wouldn't have been created otherwise?
  2. We should either hide instance variables from Ruby with @ or put internal_ in the name, to make it clear that these are not part of the API contract and they can change at will.

haberman avatar Jul 20 '22 15:07 haberman

Will using instance variables create more CPU or memory overhead? Will it cause a new table to be created per instance that wouldn't have been created otherwise?

In term of memory adding a first IVar on a DATA type cost 16B, a second one 8B. Since we save 8B from removing the VALUE field in the struct, for one Ivar the overhead is only 8B.

>> arena = Google::Protobuf::Internal::Arena.new
=> #<Google::Protobuf::Internal::Arena:0x000000010c0c4768>
>> ObjectSpace.memsize_of(arena)
=> 40
>> arena.instance_variable_set(:@foo, 1)
=> 1
>> ObjectSpace.memsize_of(arena)
=> 56
>> arena.instance_variable_set(:@bar, 2)
=> 2
>> ObjectSpace.memsize_of(arena)
=> 64

As for CPU, I haven't measured, I must admit I'm awfully unfamiliar with how Protobuf is being used so I'm not too sure how to craft a benchmark, but I'll try to do that.

Also if we want to arbitrate in favor of CPU usage over memory, we can replace the VALUE in the backing struct by the native upb_ pointer, as to save the unwrapping on every access. But again I'd need a benchmark for that. Do you know if there is an existing one?

We should either hide instance variables from Ruby with @ or put internal_ in the name, to make it clear that these are not part of the API contract and they can change at will.

Good point.

casperisfine avatar Jul 21 '22 06:07 casperisfine

As for CPU, I haven't measured, I must admit I'm awfully unfamiliar with how Protobuf is being used so I'm not too sure how to craft a benchmark, but I'll try to do that.

One of our most performance-sensitive patterns is something like:

msg = FooMessage.decode(large_payload)

# Access all message fields.
msg.str_field
msg.int_field
visit(msg.submsg_field)

We have some internal benchmarks. I'm getting in touch with the owners of those to see if we can test the change with those benchmarks.

haberman avatar Aug 10 '22 00:08 haberman

Thank you!

casperisfine avatar Aug 10 '22 08:08 casperisfine

We triage inactive PRs and issues in order to make it easier to find active work. If this PR should remain active, please add a comment.

This PR is labeled inactive because the last activity was over 90 days ago. This PR will be closed and archived after 14 additional days without activity.

github-actions[bot] avatar Jan 14 '24 10:01 github-actions[bot]

We triage inactive PRs and issues in order to make it easier to find active work. If this PR should remain active or becomes active again, please reopen it.

This PR was closed and archived because there has been no new activity in the 14 days since the inactive label was added.

github-actions[bot] avatar Jan 28 '24 10:01 github-actions[bot]