[Ruby] `#to_h` and `#to_json` with Hash/Array/Struct etc returns `{"descriptor":[{}...{}]}` when `active_support/core_ext/object/json` loaded
What version of protobuf and what language are you using?
Version: main/v3.27.0+ etc. Language: Ruby
What operating system (Linux, Windows, ...) and version?
MacOS 14.5
What runtime / compiler are you using (e.g., python version or gcc version) Ruby v3.2
What did you do?
message Thing {
string whatsit = 1;
}
Build a library with it (or anything else). Then include it and build something containing a protobuf message. For example:
require 'active_support'
require 'active_support/core_ext/object/json'
...
thing_proto = Thing.new(whatsit: "Something random")
ThingInAThing = Struct.new(thing:)
thing_struct = ThingInAThing.new(thing: thing_proto)
thing_hash = { thing: thing_proto }
thing_array = [ thing_proto ]
A key thing to note is this quote from Matz a decade ago:
Ruby often has two conversion methods for an object, e.g.
#to_sand#to_str,#to_iand#to_int,#to_hand#to_hash. The former is used for explicit conversion, the latter is used for implicit conversion (from an object with identical method signature, for example proxy). Ref: https://bugs.ruby-lang.org/issues/10180
ActiveSupport rightfully makes use of the :to_hash method for implicitly transforming an Object to JSON format.
Ref: https://github.com/rails/rails/blame/v7.1.3.4/activesupport/lib/active_support/core_ext/object/json.rb#L58-L66
Unfortunately attempting to call #to_hash as opposed to #to_h fails on protobuf messages. Aliasing #to_h for #to_hash calls seems a sensible option here in the absence of shifting the #to_json method to also make use of #to_hash as Matz implies is correct.
What did you expect to see
First, when calling #to_h expect to see the marshalled hash even when nested. That is:
thing_struct.to_h => { thing: { whatsit: "Something random" } }
thing_hash.to_h => { thing: { whatsit: "Something random" } }
Second, when calling #to_json expect to see the marshalled json, even when nested. That is:
thing_struct.to_json => "{\"thing\":{\"whatsit\":\"Something random\"}}"
thing_hash.to_json => "{\"thing\":{\"whatsit\":\"Something random\"}}"
thing_array.to_json => "[{\"whatsit\":\"Something random\"}]"
What did you see instead?
Make sure you include information that can help us debug (full error message, exception listing, stack trace, logs).
When calling #to_h:
thing_struct.to_h => { thing: <Thing: whatsit: "Something Random"> }
thing_hash.to_h => { thing: <Thing: whatsit: "Something Random"> }
When calling #to_json:
thing_struct.to_json => "{\"thing\":{\"descriptor\":[{}]}}"
thing_hash.to_json => "{\"thing\":{\"descriptor\":[{}]}}"
thing_array.to_json => "[{\"descriptor\":[{}]}]"
Anything else we should know about your project / environment
Possibly related to https://github.com/protocolbuffers/protobuf/issues/9500
Hi @jufemaiz, can you clarify exactly what the proposed resolution is here?
Would that change break any existing use cases?
There's two options, with one that I would lean towards due to Matz's language design.
- (preferred): protobuf to add
#to_hashas the default internally consistent method, but also expose#to_hfor direct use (whether this differs is a discussion more broadly if there is need for additional fields visible internalls). - (alternative): protobuf adds
#to_hashas an alias of#to_h.
Digging into this further, https://github.com/protocolbuffers/protobuf/blob/0302c4c43821ac893e8f1071576f80edef5c6398/ruby/lib/google/protobuf/ffi/internal/convert.rb#L190-L210 there's already a non-private #to_h implementation (that #to_h sometimes (?) directly calls), however the conventions of ruby are not being adopted, causing issues elsewhere there is an expectation of convention.
Please note that the ffi/ directory is an experimental implementation that is not currently the default. The default implementation is written in C: https://github.com/protocolbuffers/protobuf/blob/8d8db9cea8eb442a57286f33ec1e802c5d6c3149/ruby/ext/google/protobuf_c/message.c#L831-L840
I don't quite understand how options (1) and (2) are different. What's an example of how we might want #to_hash and #to_h to differ?
Ah fair enough on the ffi/ directory.
I am not sure why, but there may be reasons that the core team want a, by convention, internal function to differ from the external function. If there's no reason, aliasing one way or another would resolve the issue – and given the lack of implementation of #to_hash I cannot imagine there would be a major issue in doing so.
Introducing #to_hash as an alias of #to_h seems reasonable to me.
How would you like to proceed on this? Changes to the c code?
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.
This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.
Ideally this could be done at the Ruby level, with a method alias like some of the ones we have here: https://github.com/protocolbuffers/protobuf/blob/main/ruby/lib/google/protobuf/message_exts.rb Does that seem viable?
That would save us from having to duplicate this between the different backends we currently have (C, FFI, JRuby).
Ideally this could be done at the Ruby level, with a method alias like some of the ones we have here: https://github.com/protocolbuffers/protobuf/blob/main/ruby/lib/google/protobuf/message_exts.rb Does that seem viable?
That would save us from having to duplicate this between the different backends we currently have (C, FFI, JRuby).
Are you proposing monkey-patching? Or simply adding an alias method for the notionally internal #to_hash to utilise the notionally external #to_h in lieu of whatever is being used at the moment? (Apologies for the delay)
# Protocol Buffers - Google's data interchange format
# Copyright 2008 Google Inc. All rights reserved.
#
# Use of this source code is governed by a BSD-style
# license that can be found in the LICENSE file or at
# https://developers.google.com/open-source/licenses/bsd
module Google
module Protobuf
module MessageExts
#this is only called in jruby; mri loades the ClassMethods differently
def self.included(klass)
klass.extend(ClassMethods)
end
module ClassMethods
end
def to_hash(options = {})
self.to_h
end
def to_json(options = {})
self.class.encode_json(self, options)
end
def to_proto(options = {})
self.class.encode(self, options)
end
end
class AbstractMessage
include MessageExts
extend MessageExts::ClassMethods
end
private_constant :AbstractMessage
end
end
Following this one up with someone on the core team.
Wow. Weird. No idea how but the "closed" PR above has actually been merged.
- https://github.com/protocolbuffers/protobuf/pull/20866 <-- closed
- https://github.com/protocolbuffers/protobuf/blame/main/ruby/lib/google/protobuf/message_exts.rb <-- main with lines 28-31 referencing the commit message