protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

[Ruby] Setting Array/Hash values is inconsistent/buggy

Open blowmage opened this issue 6 years ago • 23 comments

Hi, the behavior in this issue describes has been consistent for all gem releases since 3.0.0.alpha, on all ruby versions, AFAIK. We have worked around it, but I think it can be better, especially as more and more gems are moving to being generated only instead of hand-written, and exposing the protobuf objects directly to users.

The ruby gem provides support for Ruby Array and Hash objects. It will automatically map an Array object to a protobuf RepeatedField object, and a Hash object to a protobuf Map object. Here are some examples using the TestMessage protobuf class used in the ruby tests:

msg_by_array = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])
msg_by_array.repeated_int64.class #=> Google::Protobuf::RepeatedField
msg_by_array.repeated_int64.to_a #=> [1, 2, 3, 4]

msg_by_hash = A::B::C::TestMessage.new(map_string_string: { 'hello' => 'world' })
msg_by_hash.map_string_string.class #=> Google::Protobuf::Map
msg_by_hash.map_string_string.to_h #=> { 'hello' => 'world' }

This is great! Except you can't set a value on an instantiated protobuf object using Array or Hash, because it raises TypeError:

msg_by_array.repeated_int64 = [1, 2, 3, 4] # TypeError
# TypeError: Expected repeated field array

msg_by_hash.map_string_string = { 'hello' => 'world' } # TypeError
# TypeError: Expected Map instance

You also can't set the value in the initializer with a protobuf object, or you get an ArgumentError:

msg_by_rf = A::B::C::TestMessage.new(repeated_int64: msg_by_array.repeated_int64) # ArgumentError
# ArgumentError: Expected array as initializer value for repeated field 'repeated_int64'.

msg_by_map = A::B::C::TestMessage.new(map_string_string: msg_by_hash.map_string_string) # ArgumentError
# ArgumentError: Expected Hash object as initializer value for map field 'map_string_string'.

This causes us to perform many workarounds, which for the most part is manageable. But, in general this doesn't seem to be very usable, as it requires users to run into errors more often than they expect.

I've created the following (failing) test file that demonstrates these concepts. It can be dropped into ruby/test and should run just fine.

#!/usr/bin/ruby

require 'generated_code_pb'
require 'test/unit'

class MismatchedSettersTest < Test::Unit::TestCase

  def test_setting_Array_on_initializer
    m = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])
    assert_kind_of(Google::Protobuf::RepeatedField, m.repeated_int64)
    assert_equal([1, 2, 3, 4], m.repeated_int64.to_a)
  end

  def test_setting_RepeatedField_on_initializer
    m1 = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])
    assert_kind_of(Google::Protobuf::RepeatedField, m1.repeated_int64)
    assert_equal([1, 2, 3, 4], m1.repeated_int64.to_a)

    # This fails with ArgumentError: Expected array as initializer value for repeated field 'repeated_int64'.
    # because it can't take a Google::Protobuf::RepeatedField
    m2 = A::B::C::TestMessage.new(repeated_int64: m1.repeated_int64)
    assert_kind_of(Google::Protobuf::RepeatedField, m2.repeated_int64)
    assert_equal([1, 2, 3, 4], m2.repeated_int64.to_a)
  end

  def test_setting_Array_on_accessor
    m = A::B::C::TestMessage.new
    assert_kind_of(Google::Protobuf::RepeatedField, m.repeated_int64)
    assert_not_equal([1, 2, 3, 4], m.repeated_int64.to_a)
    # This fails with TypeError: Expected repeated field array
    # because it expects the value to be Google::Protobuf::RepeatedField
    m.repeated_int64 = [1, 2, 3, 4]
    assert_equal([1, 2, 3, 4], m.repeated_int64.to_a)
  end

  def test_setting_RepeatedField_on_accessor
    m1 = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])
    assert_kind_of(Google::Protobuf::RepeatedField, m1.repeated_int64)
    assert_equal([1, 2, 3, 4], m1.repeated_int64.to_a)

    m2 = A::B::C::TestMessage.new
    assert_kind_of(Google::Protobuf::RepeatedField, m2.repeated_int64)
    assert_not_equal([1, 2, 3, 4], m2.repeated_int64.to_a)
    m2.repeated_int64 = m1.repeated_int64
    assert_equal([1, 2, 3, 4], m2.repeated_int64.to_a)
  end

  def test_setting_Hash_on_initializer
    m = A::B::C::TestMessage.new(map_string_string: { 'hello' => 'world' })
    assert_kind_of(Google::Protobuf::Map, m.map_string_string)
    assert_equal({ 'hello' => 'world' }, m.map_string_string.to_h)
  end

  def test_setting_Map_on_initializer
    m1 = A::B::C::TestMessage.new(map_string_string: { 'hello' => 'world' })
    assert_kind_of(Google::Protobuf::Map, m1.map_string_string)
    assert_equal({ 'hello' => 'world' }, m1.map_string_string.to_h)

    # This fails with ArgumentError: Expected Hash object as initializer value for map field 'map_string_string'.
    # because it can't take a Google::Protobuf::Map
    m2 = A::B::C::TestMessage.new(map_string_string: m1.map_string_string)
    assert_kind_of(Google::Protobuf::Map, m2.map_string_string)
    assert_equal({ 'hello' => 'world' }, m2.map_string_string.to_h)
  end

  def test_setting_Hash_on_accessor
    m = A::B::C::TestMessage.new
    assert_kind_of(Google::Protobuf::Map, m.map_string_string)
    assert_not_equal({ 'hello' => 'world' }, m.map_string_string.to_h)
    # This fails with TypeError: Expected Map instance
    # because it expects the value to be Google::Protobuf::Map
    m.map_string_string = { 'hello' => 'world' }
    assert_equal({ 'hello' => 'world' }, m.map_string_string.to_h)
  end

  def test_setting_Map_on_accessor
    m1 = A::B::C::TestMessage.new(map_string_string: { 'hello' => 'world' })
    assert_kind_of(Google::Protobuf::Map, m1.map_string_string)
    assert_equal({ 'hello' => 'world' }, m1.map_string_string.to_h)

    m2 = A::B::C::TestMessage.new
    assert_kind_of(Google::Protobuf::Map, m2.map_string_string)
    assert_not_equal({ 'hello' => 'world' }, m2.map_string_string.to_h)
    m2.map_string_string = m1.map_string_string
    assert_equal({ 'hello' => 'world' }, m2.map_string_string.to_h)
  end

end

blowmage avatar Jul 26 '18 19:07 blowmage

What is this issue looking for? Setting repeated/map fields with ruby array/map?

TeBoring avatar Jul 26 '18 20:07 TeBoring

Or constructor with protobuf object?

TeBoring avatar Jul 26 '18 21:07 TeBoring

Being able to assign an Array or Hash to an attribute, and assign a RepeatedField or Map in an initializer. Basically, make the failing tests pass.

blowmage avatar Jul 26 '18 21:07 blowmage

In your test, I also see you check the result of to_a and to_h. Do you also need these features? @haberman do we want to support assign an Array or Hash to an attribute, and assign a RepeatedField or Map in an initializer? And do we guarantee the result of to_a and to_h?

TeBoring avatar Jul 26 '18 21:07 TeBoring

This behavior is the result of some deliberate thinking/design. Basically we try to give predictable behavior, and avoid surprises. And we try to behave like Struct generally.

The tricky issue is that, to do proper type checking, we can't just take a reference to the existing array. Consider:

arr = [1, 2, 3, 4]
msg_by_array.repeated_int64 = arr

# Whoops, now our repeated int64 field has a string in it.
arr << "Broccoli"

Instead we need a separate array-like object, RepeatedField, that can enforce that all of the members have the correct type.

It would be possible to make field assignment on a protobuf do the equivalent of:

# msg_by_array.repeated_int64 = [1, 2, 3, 4]
msg_by_array.repeated_int64.replace([1, 2, 3, 4])

But this is un-Struct like, and would cause surprises if someone wrote:

arr = [1, 2, 3, 4]
msg_by_array.repeated_int64 = arr

# Whoops, won't affect msg_by_array.repeated_int64
# even though with Struct it would.
arr << 5 

To maintain consistency here, we decided not to allow direct assignment. Both of the following work as alternatives:

# Append
msg_by_array.repeated_int64 += [1, 2, 3, 4]

# Replace/assign.
msg_by_array.repeated_int64.replace([1, 2, 3, 4])

You could argue that we've already broken this model by allowing the following:

msg_by_array = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])

But I see that as more akin to how Array#new accepts an array parameter which it copies into the new array.

All that said, it's possible our efforts at consistency here are annoying people more than they are saving people from frustrating surprises. We just want to be cautious here, because "copy on assignment" is different than how almost everything else in Ruby works. You would never expect this to be doing any copying:

Struct.new("Foo", :a)
foo = Struct::Foo.new

# This will never copy a.
foo.a = a

haberman avatar Jul 26 '18 22:07 haberman

I guess what I'm saying here is that if we can accept an Array in an initializer method, with all the appropriate type checking, then should be able to do the same on an attribute setter method. Similarly, if we can accept a ReleatedField on an attribute setter method, then should be able to accept it in an initializer method. Why is the behavior of initializer methods and attribute setter methods different?

We have seen developers both outside and inside Google bump into this.

blowmage avatar Jul 26 '18 22:07 blowmage

Why is the behavior of initializer methods and attribute setter methods different?

Because Array#new(arr) copies arr -- that's the prior art we're relying on for allowing it in the constructor. But foo.a = a rarely/never copies a, which is why we don't (currently) do that either.

haberman avatar Jul 26 '18 22:07 haberman

On the other hand you're not the first person to ask for this. Here is another:

https://github.com/google/protobuf/pull/4385

Let me poll a few other people and see what they think.

haberman avatar Jul 30 '18 20:07 haberman

If I were to prioritize, I would place supporting RepeatedField and Map objects in the initializer method above supporting Array and Hash objects on attribute setter methods. I find this behavior the most surprising and don't understand why this is not allowed.

Even though it is the lower priority in my mind, I would still like to see supporting Array and Hash objects on the attribute setter methods. Even with the understanding that the Array/Hash objects are not stored by the method, but mapped to the protobuf types in the class definition. I would offer that this edge case can be adequately explained in the attribute setter documentation so users would not be as confused by the behavior.

To be clear, mapping Array/Hash objects to RepeatedField/Map objects does challenge some ruby semantics. For example, Ruby parses and treats calls methods that end with = differently than other methods:

class Foo
  def bar
    @bar
  end

  def bar=(new_bar)
    @bar = new_bar.reverse
  end
end

foo = Foo.new
# Chain calls
left_hand_bar = foo.bar = "bar"
# The foo object has the "mapped" value
foo.bar #=> "rab"
# The left-hand object has the original right-hand value
left_hand_bar #=> "bar"

That said, I still think the convenience of mapping the values to the correct types is worth the properly documented tweak to Ruby idioms.

blowmage avatar Jul 30 '18 21:07 blowmage

@haberman any update?

TeBoring avatar Aug 06 '18 21:08 TeBoring

I agree that there is no reason not to accept RepeatedField/Map in the constructor. Let me write a change to do that part.

haberman avatar Aug 09 '18 22:08 haberman

Let me know if I can help, but I think #4385 would address this. I can help rebase/test if needed.

ebenoist avatar Dec 11 '18 01:12 ebenoist

Here is another example of a user struggling with the unfriendliness of the current API: googleapis/google-cloud-ruby#2839

blowmage avatar Jan 23 '19 23:01 blowmage

I agree that there is no reason not to accept RepeatedField/Map in the constructor. Let me write a change to do that part.

@haberman Are you still planning on authoring a change to allow this?

blowmage avatar Feb 14 '19 19:02 blowmage

Can we help with this? This is a pretty unfriendly API.

bobbytables avatar Sep 09 '19 22:09 bobbytables

I came here from https://github.com/googleapis/google-cloud-ruby/issues/1477, and still did not find an answer. Why is this so difficult?

How do I create a RepeatedField of a custom type? Just show me. Give me an example.

gbirchmeier avatar Sep 20 '19 18:09 gbirchmeier

Ugh. Turns out the answer is just to create your message, and then push new items to that message's RepeatedField.

Here's a janky example:

Your proto is:

message InstrumentAttributes {
    ...
    repeated TradingHours trading_schedule = 6;
    ...
}

You want to set trading_schedule in your message. Reading the docs might make you think you need to call RepeatedField.new, and if you were repeating a simple string or int32, you could do that. But you can't, and the docs' example code doesn't demonstrate anything else.

The docs do say that "The RepeatedField type supports all of the same methods as a regular Ruby Array", but I missed it (until revising this comment). I'm sure I can't be alone in that.

What you really want to do is this:

atts = MyNamespace::InstrumentAttributes.new
# atts.trading_hours is currently an empty RepeatedField

trading_hours1 = create_trading_hours() # returns a MyNamespace::TradingHours
trading_hours2 = create_trading_hours()
atts.trading_schedule.push(trading_hours1)
atts.trading_schedule.push(trading_hours2)

Now atts.trading_hours will be an array of your 2 elements.

I still don't actually know how to create a RepeatedField of MyNamespace::TradingHours, but I guess I don't need to.

gbirchmeier avatar Sep 20 '19 18:09 gbirchmeier

I came here from googleapis/google-cloud-ruby#1477, and still did not find an answer. Why is this so difficult?

How do I create a RepeatedField of a custom type? Just show me. Give me an example.

You mentioned me there. It was a year ago and I have probably already forgot some details there also the gem has probably been updated a lot. That old code with old gem dependencies "just works" since then for my monitoring needs that did not expand.

As far as I remember, the point of my code was that in the time_series.points you have to define only the time_series while points is being defined automatically and you can just use it immediately to push values into it.

Maybe that snippet is not compatible with current gems -- it has their versions mentioned in comments if you scroll the snippet horizontally. I remember I've spent a day or two greping through libs and specs to come up with it so it's ok to find it to be difficult.

Nakilon avatar Sep 20 '19 19:09 Nakilon

Sorry for the long inactivity on this issue. I agree that we should fix it, in both cases (constructor and assignment).

There is one unresolved question in my mind. Suppose you write:

msg1 = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])
msg2 = A::B::C::TestMessage.new(repeated_int64: msg1.repeated_int64) # ArgumentError
msg1.repeated_int64 << 5
print(msg2.repeated_int64.size)  # 4 or 5?

Should the last line print 4 or 5?

In other words, if you specify an existing RepeatedField object in the constructor, should we deep copy it, or point to it by reference?

haberman avatar Sep 01 '22 19:09 haberman

In other words, if you specify an existing RepeatedField object in the constructor, should we deep copy it, or point to it by reference?

I would expect a deep copy.

blowmage avatar Sep 19 '22 11:09 blowmage

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

github-actions[bot] avatar Jun 02 '24 10:06 github-actions[bot]

This continues to be an issue, but I do think this implementation: https://github.com/protocolbuffers/protobuf/pull/4385 fills the need here?

benoist-folio avatar Jun 04 '24 14:06 benoist-folio

I think the main things blocking this now are just:

  1. Deciding on a consistent set of semantics.
  2. Implementing those semantics.
  3. Waiting for a breaking change release (if the change ends up being breaking).

Point (1) mainly concerns the question of what is a deep copy vs a reference, ie. what I alluded to above in https://github.com/protocolbuffers/protobuf/issues/4969#issuecomment-1234677460. I think we want to avoid doing deep copies for every kind of assignment, because people frequently want to do things like build a message tree from the leaves up. If every msg.submsg = submsg assignment was a deep copy, leaves-up builds would be really unnecessarily expensive.

Probably the simplest resolution would be to allow assignments that are currently rejected, like msg.repeated_field = ruby_array, and to only deep copy when performing this kind of implicit type conversion. That could also avoid making this a breaking change, since we wouldn't be changing the semantics of any existing (working) code.

haberman avatar Jun 21 '24 18:06 haberman