protobuf
protobuf copied to clipboard
[Ruby] Setting Array/Hash values is inconsistent/buggy
Hi, the behavior in this issue describes has been consistent for all gem releases since 3.0.0.alpha, on all ruby versions, AFAIK. We have worked around it, but I think it can be better, especially as more and more gems are moving to being generated only instead of hand-written, and exposing the protobuf objects directly to users.
The ruby gem provides support for Ruby Array and Hash objects. It will automatically map an Array object to a protobuf RepeatedField object, and a Hash object to a protobuf Map object. Here are some examples using the TestMessage
protobuf class used in the ruby tests:
msg_by_array = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])
msg_by_array.repeated_int64.class #=> Google::Protobuf::RepeatedField
msg_by_array.repeated_int64.to_a #=> [1, 2, 3, 4]
msg_by_hash = A::B::C::TestMessage.new(map_string_string: { 'hello' => 'world' })
msg_by_hash.map_string_string.class #=> Google::Protobuf::Map
msg_by_hash.map_string_string.to_h #=> { 'hello' => 'world' }
This is great! Except you can't set a value on an instantiated protobuf object using Array or Hash, because it raises TypeError
:
msg_by_array.repeated_int64 = [1, 2, 3, 4] # TypeError
# TypeError: Expected repeated field array
msg_by_hash.map_string_string = { 'hello' => 'world' } # TypeError
# TypeError: Expected Map instance
You also can't set the value in the initializer with a protobuf object, or you get an ArgumentError
:
msg_by_rf = A::B::C::TestMessage.new(repeated_int64: msg_by_array.repeated_int64) # ArgumentError
# ArgumentError: Expected array as initializer value for repeated field 'repeated_int64'.
msg_by_map = A::B::C::TestMessage.new(map_string_string: msg_by_hash.map_string_string) # ArgumentError
# ArgumentError: Expected Hash object as initializer value for map field 'map_string_string'.
This causes us to perform many workarounds, which for the most part is manageable. But, in general this doesn't seem to be very usable, as it requires users to run into errors more often than they expect.
I've created the following (failing) test file that demonstrates these concepts. It can be dropped into ruby/test
and should run just fine.
#!/usr/bin/ruby
require 'generated_code_pb'
require 'test/unit'
class MismatchedSettersTest < Test::Unit::TestCase
def test_setting_Array_on_initializer
m = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])
assert_kind_of(Google::Protobuf::RepeatedField, m.repeated_int64)
assert_equal([1, 2, 3, 4], m.repeated_int64.to_a)
end
def test_setting_RepeatedField_on_initializer
m1 = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])
assert_kind_of(Google::Protobuf::RepeatedField, m1.repeated_int64)
assert_equal([1, 2, 3, 4], m1.repeated_int64.to_a)
# This fails with ArgumentError: Expected array as initializer value for repeated field 'repeated_int64'.
# because it can't take a Google::Protobuf::RepeatedField
m2 = A::B::C::TestMessage.new(repeated_int64: m1.repeated_int64)
assert_kind_of(Google::Protobuf::RepeatedField, m2.repeated_int64)
assert_equal([1, 2, 3, 4], m2.repeated_int64.to_a)
end
def test_setting_Array_on_accessor
m = A::B::C::TestMessage.new
assert_kind_of(Google::Protobuf::RepeatedField, m.repeated_int64)
assert_not_equal([1, 2, 3, 4], m.repeated_int64.to_a)
# This fails with TypeError: Expected repeated field array
# because it expects the value to be Google::Protobuf::RepeatedField
m.repeated_int64 = [1, 2, 3, 4]
assert_equal([1, 2, 3, 4], m.repeated_int64.to_a)
end
def test_setting_RepeatedField_on_accessor
m1 = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])
assert_kind_of(Google::Protobuf::RepeatedField, m1.repeated_int64)
assert_equal([1, 2, 3, 4], m1.repeated_int64.to_a)
m2 = A::B::C::TestMessage.new
assert_kind_of(Google::Protobuf::RepeatedField, m2.repeated_int64)
assert_not_equal([1, 2, 3, 4], m2.repeated_int64.to_a)
m2.repeated_int64 = m1.repeated_int64
assert_equal([1, 2, 3, 4], m2.repeated_int64.to_a)
end
def test_setting_Hash_on_initializer
m = A::B::C::TestMessage.new(map_string_string: { 'hello' => 'world' })
assert_kind_of(Google::Protobuf::Map, m.map_string_string)
assert_equal({ 'hello' => 'world' }, m.map_string_string.to_h)
end
def test_setting_Map_on_initializer
m1 = A::B::C::TestMessage.new(map_string_string: { 'hello' => 'world' })
assert_kind_of(Google::Protobuf::Map, m1.map_string_string)
assert_equal({ 'hello' => 'world' }, m1.map_string_string.to_h)
# This fails with ArgumentError: Expected Hash object as initializer value for map field 'map_string_string'.
# because it can't take a Google::Protobuf::Map
m2 = A::B::C::TestMessage.new(map_string_string: m1.map_string_string)
assert_kind_of(Google::Protobuf::Map, m2.map_string_string)
assert_equal({ 'hello' => 'world' }, m2.map_string_string.to_h)
end
def test_setting_Hash_on_accessor
m = A::B::C::TestMessage.new
assert_kind_of(Google::Protobuf::Map, m.map_string_string)
assert_not_equal({ 'hello' => 'world' }, m.map_string_string.to_h)
# This fails with TypeError: Expected Map instance
# because it expects the value to be Google::Protobuf::Map
m.map_string_string = { 'hello' => 'world' }
assert_equal({ 'hello' => 'world' }, m.map_string_string.to_h)
end
def test_setting_Map_on_accessor
m1 = A::B::C::TestMessage.new(map_string_string: { 'hello' => 'world' })
assert_kind_of(Google::Protobuf::Map, m1.map_string_string)
assert_equal({ 'hello' => 'world' }, m1.map_string_string.to_h)
m2 = A::B::C::TestMessage.new
assert_kind_of(Google::Protobuf::Map, m2.map_string_string)
assert_not_equal({ 'hello' => 'world' }, m2.map_string_string.to_h)
m2.map_string_string = m1.map_string_string
assert_equal({ 'hello' => 'world' }, m2.map_string_string.to_h)
end
end
What is this issue looking for? Setting repeated/map fields with ruby array/map?
Or constructor with protobuf object?
Being able to assign an Array or Hash to an attribute, and assign a RepeatedField or Map in an initializer. Basically, make the failing tests pass.
In your test, I also see you check the result of to_a
and to_h
.
Do you also need these features?
@haberman do we want to support assign an Array or Hash to an attribute, and assign a RepeatedField or Map in an initializer? And do we guarantee the result of to_a
and to_h
?
This behavior is the result of some deliberate thinking/design. Basically we try to give predictable behavior, and avoid surprises. And we try to behave like Struct
generally.
The tricky issue is that, to do proper type checking, we can't just take a reference to the existing array. Consider:
arr = [1, 2, 3, 4]
msg_by_array.repeated_int64 = arr
# Whoops, now our repeated int64 field has a string in it.
arr << "Broccoli"
Instead we need a separate array-like object, RepeatedField
, that can enforce that all of the members have the correct type.
It would be possible to make field assignment on a protobuf do the equivalent of:
# msg_by_array.repeated_int64 = [1, 2, 3, 4]
msg_by_array.repeated_int64.replace([1, 2, 3, 4])
But this is un-Struct like, and would cause surprises if someone wrote:
arr = [1, 2, 3, 4]
msg_by_array.repeated_int64 = arr
# Whoops, won't affect msg_by_array.repeated_int64
# even though with Struct it would.
arr << 5
To maintain consistency here, we decided not to allow direct assignment. Both of the following work as alternatives:
# Append
msg_by_array.repeated_int64 += [1, 2, 3, 4]
# Replace/assign.
msg_by_array.repeated_int64.replace([1, 2, 3, 4])
You could argue that we've already broken this model by allowing the following:
msg_by_array = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])
But I see that as more akin to how Array#new
accepts an array parameter which it copies into the new array.
All that said, it's possible our efforts at consistency here are annoying people more than they are saving people from frustrating surprises. We just want to be cautious here, because "copy on assignment" is different than how almost everything else in Ruby works. You would never expect this to be doing any copying:
Struct.new("Foo", :a)
foo = Struct::Foo.new
# This will never copy a.
foo.a = a
I guess what I'm saying here is that if we can accept an Array in an initializer method, with all the appropriate type checking, then should be able to do the same on an attribute setter method. Similarly, if we can accept a ReleatedField on an attribute setter method, then should be able to accept it in an initializer method. Why is the behavior of initializer methods and attribute setter methods different?
We have seen developers both outside and inside Google bump into this.
Why is the behavior of initializer methods and attribute setter methods different?
Because Array#new(arr)
copies arr
-- that's the prior art we're relying on for allowing it in the constructor. But foo.a = a
rarely/never copies a
, which is why we don't (currently) do that either.
On the other hand you're not the first person to ask for this. Here is another:
https://github.com/google/protobuf/pull/4385
Let me poll a few other people and see what they think.
If I were to prioritize, I would place supporting RepeatedField and Map objects in the initializer method above supporting Array and Hash objects on attribute setter methods. I find this behavior the most surprising and don't understand why this is not allowed.
Even though it is the lower priority in my mind, I would still like to see supporting Array and Hash objects on the attribute setter methods. Even with the understanding that the Array/Hash objects are not stored by the method, but mapped to the protobuf types in the class definition. I would offer that this edge case can be adequately explained in the attribute setter documentation so users would not be as confused by the behavior.
To be clear, mapping Array/Hash objects to RepeatedField/Map objects does challenge some ruby semantics. For example, Ruby parses and treats calls methods that end with =
differently than other methods:
class Foo
def bar
@bar
end
def bar=(new_bar)
@bar = new_bar.reverse
end
end
foo = Foo.new
# Chain calls
left_hand_bar = foo.bar = "bar"
# The foo object has the "mapped" value
foo.bar #=> "rab"
# The left-hand object has the original right-hand value
left_hand_bar #=> "bar"
That said, I still think the convenience of mapping the values to the correct types is worth the properly documented tweak to Ruby idioms.
@haberman any update?
I agree that there is no reason not to accept RepeatedField/Map in the constructor. Let me write a change to do that part.
Let me know if I can help, but I think #4385 would address this. I can help rebase/test if needed.
Here is another example of a user struggling with the unfriendliness of the current API: googleapis/google-cloud-ruby#2839
I agree that there is no reason not to accept RepeatedField/Map in the constructor. Let me write a change to do that part.
@haberman Are you still planning on authoring a change to allow this?
Can we help with this? This is a pretty unfriendly API.
I came here from https://github.com/googleapis/google-cloud-ruby/issues/1477, and still did not find an answer. Why is this so difficult?
How do I create a RepeatedField of a custom type? Just show me. Give me an example.
Ugh. Turns out the answer is just to create your message, and then push new items to that message's RepeatedField.
Here's a janky example:
Your proto is:
message InstrumentAttributes {
...
repeated TradingHours trading_schedule = 6;
...
}
You want to set trading_schedule
in your message. Reading the docs might make you think you need to call RepeatedField.new
, and if you were repeating a simple string
or int32
, you could do that. But you can't, and the docs' example code doesn't demonstrate anything else.
The docs do say that "The RepeatedField type supports all of the same methods as a regular Ruby Array", but I missed it (until revising this comment). I'm sure I can't be alone in that.
What you really want to do is this:
atts = MyNamespace::InstrumentAttributes.new
# atts.trading_hours is currently an empty RepeatedField
trading_hours1 = create_trading_hours() # returns a MyNamespace::TradingHours
trading_hours2 = create_trading_hours()
atts.trading_schedule.push(trading_hours1)
atts.trading_schedule.push(trading_hours2)
Now atts.trading_hours
will be an array of your 2 elements.
I still don't actually know how to create a RepeatedField
of MyNamespace::TradingHours
, but I guess I don't need to.
I came here from googleapis/google-cloud-ruby#1477, and still did not find an answer. Why is this so difficult?
How do I create a RepeatedField of a custom type? Just show me. Give me an example.
You mentioned me there. It was a year ago and I have probably already forgot some details there also the gem has probably been updated a lot. That old code with old gem dependencies "just works" since then for my monitoring needs that did not expand.
As far as I remember, the point of my code was that in the time_series.points
you have to define only the time_series
while points
is being defined automatically and you can just use it immediately to push values into it.
Maybe that snippet is not compatible with current gems -- it has their versions mentioned in comments if you scroll the snippet horizontally. I remember I've spent a day or two greping through libs and specs to come up with it so it's ok to find it to be difficult.
Sorry for the long inactivity on this issue. I agree that we should fix it, in both cases (constructor and assignment).
There is one unresolved question in my mind. Suppose you write:
msg1 = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])
msg2 = A::B::C::TestMessage.new(repeated_int64: msg1.repeated_int64) # ArgumentError
msg1.repeated_int64 << 5
print(msg2.repeated_int64.size) # 4 or 5?
Should the last line print 4 or 5?
In other words, if you specify an existing RepeatedField
object in the constructor, should we deep copy it, or point to it by reference?
In other words, if you specify an existing
RepeatedField
object in the constructor, should we deep copy it, or point to it by reference?
I would expect a deep copy.
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.
This issue is labeled inactive
because the last activity was over 90 days ago.
This continues to be an issue, but I do think this implementation: https://github.com/protocolbuffers/protobuf/pull/4385 fills the need here?
I think the main things blocking this now are just:
- Deciding on a consistent set of semantics.
- Implementing those semantics.
- Waiting for a breaking change release (if the change ends up being breaking).
Point (1) mainly concerns the question of what is a deep copy vs a reference, ie. what I alluded to above in https://github.com/protocolbuffers/protobuf/issues/4969#issuecomment-1234677460. I think we want to avoid doing deep copies for every kind of assignment, because people frequently want to do things like build a message tree from the leaves up. If every msg.submsg = submsg
assignment was a deep copy, leaves-up builds would be really unnecessarily expensive.
Probably the simplest resolution would be to allow assignments that are currently rejected, like msg.repeated_field = ruby_array
, and to only deep copy when performing this kind of implicit type conversion. That could also avoid making this a breaking change, since we wouldn't be changing the semantics of any existing (working) code.