protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

[Ruby] `#to_h` and `#to_json` with Hash/Array/Struct etc returns `{"descriptor":[{}...{}]}` when `active_support/core_ext/object/json` loaded

Open jufemaiz opened this issue 1 year ago • 6 comments

What version of protobuf and what language are you using?

Version: main/v3.27.0+ etc. Language: Ruby

What operating system (Linux, Windows, ...) and version?

MacOS 14.5

What runtime / compiler are you using (e.g., python version or gcc version) Ruby v3.2

What did you do?

message Thing {
  string whatsit = 1;
}

Build a library with it (or anything else). Then include it and build something containing a protobuf message. For example:

require 'active_support'
require 'active_support/core_ext/object/json'

...

thing_proto = Thing.new(whatsit: "Something random")

ThingInAThing = Struct.new(thing:)

thing_struct = ThingInAThing.new(thing: thing_proto)
thing_hash = { thing: thing_proto }
thing_array = [ thing_proto ]

A key thing to note is this quote from Matz a decade ago:

Ruby often has two conversion methods for an object, e.g. #to_s and #to_str, #to_i and #to_int, #to_h and #to_hash. The former is used for explicit conversion, the latter is used for implicit conversion (from an object with identical method signature, for example proxy). Ref: https://bugs.ruby-lang.org/issues/10180

ActiveSupport rightfully makes use of the :to_hash method for implicitly transforming an Object to JSON format. Ref: https://github.com/rails/rails/blame/v7.1.3.4/activesupport/lib/active_support/core_ext/object/json.rb#L58-L66

Unfortunately attempting to call #to_hash as opposed to #to_h fails on protobuf messages. Aliasing #to_h for #to_hash calls seems a sensible option here in the absence of shifting the #to_json method to also make use of #to_hash as Matz implies is correct.

What did you expect to see

First, when calling #to_h expect to see the marshalled hash even when nested. That is:

thing_struct.to_h => { thing: { whatsit: "Something random" } }
thing_hash.to_h => { thing: { whatsit: "Something random" } }

Second, when calling #to_json expect to see the marshalled json, even when nested. That is:

thing_struct.to_json => "{\"thing\":{\"whatsit\":\"Something random\"}}"
thing_hash.to_json => "{\"thing\":{\"whatsit\":\"Something random\"}}"
thing_array.to_json => "[{\"whatsit\":\"Something random\"}]"

What did you see instead?

Make sure you include information that can help us debug (full error message, exception listing, stack trace, logs).

When calling #to_h:

thing_struct.to_h => { thing: <Thing: whatsit: "Something Random"> }
thing_hash.to_h => { thing: <Thing: whatsit: "Something Random"> }

When calling #to_json:

thing_struct.to_json => "{\"thing\":{\"descriptor\":[{}]}}"
thing_hash.to_json => "{\"thing\":{\"descriptor\":[{}]}}"
thing_array.to_json => "[{\"descriptor\":[{}]}]"

Anything else we should know about your project / environment

Possibly related to https://github.com/protocolbuffers/protobuf/issues/9500

jufemaiz avatar Jun 05 '24 12:06 jufemaiz

Hi @jufemaiz, can you clarify exactly what the proposed resolution is here?

Would that change break any existing use cases?

haberman avatar Jun 21 '24 18:06 haberman

There's two options, with one that I would lean towards due to Matz's language design.

  1. (preferred): protobuf to add #to_hash as the default internally consistent method, but also expose #to_h for direct use (whether this differs is a discussion more broadly if there is need for additional fields visible internalls).
  2. (alternative): protobuf adds #to_hash as an alias of #to_h.

Digging into this further, https://github.com/protocolbuffers/protobuf/blob/0302c4c43821ac893e8f1071576f80edef5c6398/ruby/lib/google/protobuf/ffi/internal/convert.rb#L190-L210 there's already a non-private #to_h implementation (that #to_h sometimes (?) directly calls), however the conventions of ruby are not being adopted, causing issues elsewhere there is an expectation of convention.

jufemaiz avatar Jun 22 '24 00:06 jufemaiz

Please note that the ffi/ directory is an experimental implementation that is not currently the default. The default implementation is written in C: https://github.com/protocolbuffers/protobuf/blob/8d8db9cea8eb442a57286f33ec1e802c5d6c3149/ruby/ext/google/protobuf_c/message.c#L831-L840

I don't quite understand how options (1) and (2) are different. What's an example of how we might want #to_hash and #to_h to differ?

haberman avatar Jun 24 '24 17:06 haberman

Ah fair enough on the ffi/ directory.

I am not sure why, but there may be reasons that the core team want a, by convention, internal function to differ from the external function. If there's no reason, aliasing one way or another would resolve the issue – and given the lack of implementation of #to_hash I cannot imagine there would be a major issue in doing so.

jufemaiz avatar Jun 25 '24 00:06 jufemaiz

Introducing #to_hash as an alias of #to_h seems reasonable to me.

haberman avatar Jun 25 '24 14:06 haberman

How would you like to proceed on this? Changes to the c code?

jufemaiz avatar Jun 26 '24 02:06 jufemaiz

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.

github-actions[bot] avatar Sep 24 '24 10:09 github-actions[bot]

Ideally this could be done at the Ruby level, with a method alias like some of the ones we have here: https://github.com/protocolbuffers/protobuf/blob/main/ruby/lib/google/protobuf/message_exts.rb Does that seem viable?

That would save us from having to duplicate this between the different backends we currently have (C, FFI, JRuby).

haberman avatar Oct 07 '24 04:10 haberman

Ideally this could be done at the Ruby level, with a method alias like some of the ones we have here: https://github.com/protocolbuffers/protobuf/blob/main/ruby/lib/google/protobuf/message_exts.rb Does that seem viable?

That would save us from having to duplicate this between the different backends we currently have (C, FFI, JRuby).

Are you proposing monkey-patching? Or simply adding an alias method for the notionally internal #to_hash to utilise the notionally external #to_h in lieu of whatever is being used at the moment? (Apologies for the delay)

# Protocol Buffers - Google's data interchange format
# Copyright 2008 Google Inc.  All rights reserved.
#
# Use of this source code is governed by a BSD-style
# license that can be found in the LICENSE file or at
# https://developers.google.com/open-source/licenses/bsd

module Google
  module Protobuf
    module MessageExts

      #this is only called in jruby; mri loades the ClassMethods differently
      def self.included(klass)
        klass.extend(ClassMethods)
      end

      module ClassMethods
      end

      def to_hash(options = {})
        self.to_h
      end

      def to_json(options = {})
        self.class.encode_json(self, options)
      end

      def to_proto(options = {})
        self.class.encode(self, options)
      end

    end
    class AbstractMessage
      include MessageExts
      extend MessageExts::ClassMethods
    end
    private_constant :AbstractMessage
  end
end

jufemaiz avatar Jan 03 '25 06:01 jufemaiz

Following this one up with someone on the core team.

jufemaiz avatar Feb 23 '25 22:02 jufemaiz

Wow. Weird. No idea how but the "closed" PR above has actually been merged.

  • https://github.com/protocolbuffers/protobuf/pull/20866 <-- closed
  • https://github.com/protocolbuffers/protobuf/blame/main/ruby/lib/google/protobuf/message_exts.rb <-- main with lines 28-31 referencing the commit message

jufemaiz avatar Mar 31 '25 02:03 jufemaiz