Print nested protobuf message types correctly with `mcap list schemas`

Open ThatGeoGuy opened this issue 2 weeks ago • 1 comments

Description

Version: 0.0.58
Platform: linux

mcap list schemas does not list nested types within a message, which gives the false impression that schemas are missing. This is related to #436, although I don't think it is actually possible to return the original "schema source" in a perfectly copy-pasteable way.

To elaborate: when a file descriptor set is written to an MCAP file for protobuf schemas, it usually contains the full set of messages and their dependencies. In practice it is a collection of something akin to Java's FileDescriptorProto or Rust's FileDescriptorProto.

At Tangram, we use nested messages as a way to namespace types, as well as to make sure we're compliant with the 1-1-1 convention. So although we might define a proto like this, mcap list schemas prints the following:

$ mcap list schemas detections.mcap
id      name                                    encoding        data
1       tangram.detections.ImageMeasurements    protobuf        syntax = "proto3";

                                                                message tangram.Uuid {
                                                                  optional value string = 1;
                                                                }
                                                                message google.protobuf.Timestamp {
                                                                  optional seconds int64 = 1;
                                                                  optional nanos int32 = 2;
                                                                }
                                                                message tangram.SynchronizableMetadata {
                                                                  optional component_uuid .tangram.Uuid = 1;
                                                                  optional timestamp .google.protobuf.Timestamp = 2;
                                                                  optional sync_group_id uint64 = 3;
                                                                }
                                                                message foxglove.Point2 {
                                                                  optional x double = 1;
                                                                  optional y double = 2;
                                                                }
                                                                message tangram.detections.ImageMeasurements {
                                                                  optional metadata .tangram.SynchronizableMetadata = 1;
                                                                  optional observed_object_space .tangram.Uuid = 2;
                                                                  repeated data .tangram.detections.ImageMeasurements.ImageMeasurementData = 3;
                                                                }

Steps To Reproduce

I don't have a minimal example, since reproducing this depends heavily on:

which language's protobuf library you're using to encode an MCAP; AND
the schema itself, which needs to have internal message definitions

A simple enough schema that could replicate this is:

syntax = "proto3";

message Outer
{
    message Inner
    {
         string data = 1;
    }

    repeated Inner inners = 1;
}

I would expect mcap list schemas to print out this nested Inner type as well. From what I can tell, all the data for these nested types is actually encoded (albeit in binary). The only problem is that the CLI tool does not print them.
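As a rough illustration of the traversal this would need, here is a minimal Python sketch that walks a file descriptor set recursively so that nested types are visited too. It assumes objects shaped like the descriptor.proto messages (`FileDescriptorSet.file`, `FileDescriptorProto.package` and `message_type`, `DescriptorProto.name` and `nested_type`). The function names and the `SimpleNamespace` stand-ins are hypothetical, and this is not a description of how the mcap CLI is actually implemented.

```python
from types import SimpleNamespace as NS


def walk_messages(messages, prefix):
    """Yield (full_name, message) depth first, including nested messages."""
    for msg in messages:
        full_name = f"{prefix}.{msg.name}" if prefix else msg.name
        yield full_name, msg
        # nested_type holds the messages declared inside this message;
        # skipping this recursion is what would hide types like Outer.Inner.
        yield from walk_messages(msg.nested_type, full_name)


def list_message_names(descriptor_set):
    """Return the fully qualified name of every message in the set."""
    names = []
    for file_proto in descriptor_set.file:
        for full_name, _ in walk_messages(file_proto.message_type, file_proto.package):
            names.append(full_name)
    return names


# Hypothetical stand-in for the descriptor of the Outer/Inner schema above
# (no package declared, so the prefix is empty).
inner = NS(name="Inner", nested_type=[])
outer = NS(name="Outer", nested_type=[inner])
descriptor_set = NS(file=[NS(package="", message_type=[outer])])

print(list_message_names(descriptor_set))  # ['Outer', 'Outer.Inner']
```

Because the functions only read attributes by name, the same walk should also work on a real parsed file descriptor set, provided it exposes those fields.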

Expected Behavior

I don't think you can necessarily print a perfect 1:1 representation of the original proto file as requested in #436. In general you should not be copying and pasting those; instead, export the file descriptor set (as bytes) and generate any language-specific message types downstream of that. Since that data is structured and machine-consumable, copying and pasting the stringified output here feels like a (bad) workaround.

Despite this, it would be helpful for users to be able to print out all the nested types, along with their fields, in addition to the top-level types and dependencies. This is especially useful for getting a sense of what types to expect when using mcap cat --json <FILE.MCAP>, or when working in a dynamic language like Python, where the underlying mcap library can manage this automatically.

Dec 15 '25 19:12 ThatGeoGuy