[C#] Expose unknown field sets
What language does this apply to? C#
Describe the problem you are trying to solve.
I'd like to log/ToString messages together with their unknown fields, so I can spot when the proto contract mismatches what the clients are sending.
Currently, each generated message has an _unknownFields field, but it's private.
Also, the types that field holds (such as UnknownField) are internal.
Describe the solution you'd like
Expose UnknownFields on IMessage.
Expose the UnknownField type (internal -> public).
Add an option to ToString() to also dump unknown field values.
Describe alternatives you've considered
Accessing via reflection. I can access the field set via reflection fairly easily, however because UnknownField is an internal type, things get pretty ugly pretty quickly.
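As a rough illustration of that reflection route, here is a minimal sketch. It assumes the generated private field is named _unknownFields (as described above) and uses the well-known Empty message, so that any field on the wire is unknown to it. The UnknownFieldPeek class and its GetUnknownFields helper are hypothetical names, not part of Google.Protobuf.

```csharp
using System;
using System.Reflection;
using Google.Protobuf;
using Google.Protobuf.WellKnownTypes;

public static class UnknownFieldPeek
{
    // Hypothetical helper: reads the private "_unknownFields" field that
    // generated message classes declare. Returns null if the field is
    // missing or has never been populated.
    public static UnknownFieldSet GetUnknownFields(IMessage message)
    {
        FieldInfo field = message.GetType().GetField(
            "_unknownFields", BindingFlags.Instance | BindingFlags.NonPublic);
        return field?.GetValue(message) as UnknownFieldSet;
    }

    public static void Main()
    {
        // Field number 1, wire type 0 (varint), value 42. Empty declares no
        // fields, so the parser keeps this as an unknown field.
        byte[] wire = { 0x08, 0x2A };
        Empty parsed = Empty.Parser.ParseFrom(wire);

        UnknownFieldSet unknown = GetUnknownFields(parsed);
        Console.WriteLine(unknown != null
            ? "unknown fields present"
            : "no unknown fields");
    }
}
```

Getting hold of the set is the easy part. Reading individual values means reflecting further into the internal UnknownField type, which is where it gets ugly.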
@googleberg Do other languages allow access to the unknown fields? I certainly don't want to just casually make things public - this would be a big decision, to be considered very carefully (including reviewing the API for flexibility of future changes, which is much easier while it's internal).
Oh, and even just "Expose UnknownFields on IMessage" would be a breaking change - that's definitely not something I'd want to do.
If we do this at all, I think I'd prefer to have some sort of "create a copy of the unknown field set" operation than allowing direct access to the unknown field set.
Other languages have this feature:
https://github.com/golang/protobuf/blob/16163b4f675b9ced03abc9157069d5dc76762213/reflect/protoreflect/value.go#L136-L148
https://protobuf.dev/reference/cpp/api-docs/google.protobuf.unknown_field_set/
https://googleapis.dev/python/protobuf/latest/google/protobuf/unknown_fields.html
I'd be happy even if the internal types were exposed, and there were some static function that takes an IMessage and returns an UnknownFieldSet or something like that.
My use case is mostly just printing unknown fields out (in case I forgot something). However, .ToString() recurses by calling .ToString() on the messages nested inside the top-level message, and there is no easy way to also print unknown fields for those inner messages, so I end up reimplementing pretty much all of the ToString functionality.
An alternative would be to allow round-tripping messages via JSON while preserving unknown fields, but that is a lot more unusual.
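If the goal is only to notice that a message carries unknown fields somewhere in its tree, and not to print them, a reflection-free check may be enough. The sketch below re-parses the serialized message with MessageParser<T>.WithDiscardUnknownFields(true) and compares encoded sizes. UnknownFieldDetector and HasUnknownFields are hypothetical names, and the approach costs an extra serialize and parse per check.

```csharp
using System;
using Google.Protobuf;
using Google.Protobuf.WellKnownTypes;

public static class UnknownFieldDetector
{
    // Hypothetical helper: reports whether the message, or any message
    // nested inside it, holds unknown fields. It cannot say which ones.
    public static bool HasUnknownFields<T>(T message, MessageParser<T> parser)
        where T : IMessage<T>
    {
        // Re-parse the serialized form with unknown fields discarded.
        T stripped = parser
            .WithDiscardUnknownFields(true)
            .ParseFrom(message.ToByteArray());

        // Dropping unknown fields can only shrink the encoding, so a size
        // difference means something was dropped.
        return stripped.CalculateSize() != message.CalculateSize();
    }

    public static void Main()
    {
        // Field number 1, varint 42: unknown to the Empty message.
        byte[] wire = { 0x08, 0x2A };
        Empty withUnknown = Empty.Parser.ParseFrom(wire);

        Console.WriteLine(HasUnknownFields(withUnknown, Empty.Parser));
        Console.WriteLine(HasUnknownFields(new Empty(), Empty.Parser));
    }
}
```

This only answers "did the client send something I don't know about?". Printing the actual field numbers and values still needs access to the unknown field set itself.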
Just checked, and it's public in Java too. (That's my normal point of reference as the closest language to C#.)
Okay, I'll see when I can find some time to prototype this - but I'm not going to commit to any timelines at this point.
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.
This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.
Still relevant
Still relevant
also desired
relevant
Is the issue still untriaged? It seems @jskeet removed that label a year ago, but GHA added it again.
The generated _unknownFields field is still not accessible through protobuf reflection.
I would need access to it for a custom implicit patch operation.
Besides this, why is _unknownFields always initialized?
The UnknownFieldSet class heap-allocates a regular dictionary,
which makes 2 mandatory heap allocations for each protobuf message instance, for a very sparsely used feature.
It would make more sense to expose _unknownFields publicly through a facade interface
and at the same time make it nullable.
Example:
public interface IWithUnknownFields
{
    UnknownFieldSet? UnknownFields { get; }
}
It can then be accessed with:
IMessage msg; // assigned elsewhere
if (msg is IWithUnknownFields m && m.UnknownFields is {} unknownFields)
{
    // use unknownFields
}
Besides this, why is _unknownFields always initialized? The UnknownFieldSet class heap-allocates a regular dictionary,
which makes 2 mandatory heap allocations for each protobuf message instance, for a very sparsely used feature.
Not sure where you think it's always initialized to a non-null value, unless you're talking about some specific PR.
Currently when using the parameterless constructor, the field is left null. When using the copy constructor, it calls UnknownFieldSet.Clone which returns a null reference if the input is null.
Could you clarify the context in which there are heap allocations for unknown fields for each protobuf message instance?
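The two behaviours described above can be checked with a small sketch. It assumes the generated private field is named _unknownFields and uses the well-known Empty message as a stand-in for any generated type. UnknownFieldAllocationCheck and its methods are hypothetical names used only for this illustration.

```csharp
using System;
using System.Reflection;
using Google.Protobuf;
using Google.Protobuf.WellKnownTypes;

public static class UnknownFieldAllocationCheck
{
    // True if a message built with the parameterless constructor leaves
    // its private "_unknownFields" field null (no set allocated).
    public static bool FreshMessageHasNullUnknownFields()
    {
        FieldInfo field = typeof(Empty).GetField(
            "_unknownFields", BindingFlags.Instance | BindingFlags.NonPublic);
        return field != null && field.GetValue(new Empty()) == null;
    }

    // True if cloning a null set yields null rather than a new empty set.
    public static bool CloneOfNullIsNull()
    {
        return UnknownFieldSet.Clone(null) == null;
    }

    public static void Main()
    {
        Console.WriteLine(FreshMessageHasNullUnknownFields());
        Console.WriteLine(CloneOfNullIsNull());
    }
}
```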
@jskeet The generated message type does not initialize _unknownFields. I simply misread the code with the Clone call and with the C# nullable feature enabled.
I will try to work with C# reflection for now and call UnknownFieldSet.MergeFrom manually.
A supported way to access the _unknownFields of a message is still required.
It is required for debugging, logging, and, in my case, for implementing an implicit patch
that merges the unknown fields into the destination instance.
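A minimal sketch of that reflection-plus-MergeFrom workaround might look like the following. It assumes the generated private field is named _unknownFields and relies on the public static UnknownFieldSet.MergeFrom. UnknownFieldPatch and MergeUnknownFields are hypothetical names, and the well-known Empty message stands in for real source and destination types.

```csharp
using System;
using System.Reflection;
using Google.Protobuf;
using Google.Protobuf.WellKnownTypes;

public static class UnknownFieldPatch
{
    // Hypothetical helper: merges only the unknown fields of source into
    // destination, leaving destination's known fields untouched.
    public static void MergeUnknownFields(IMessage source, IMessage destination)
    {
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.NonPublic;
        FieldInfo srcField = source.GetType().GetField("_unknownFields", flags);
        FieldInfo dstField = destination.GetType().GetField("_unknownFields", flags);
        if (srcField == null || dstField == null)
        {
            return; // Not the generated-message layout this sketch assumes.
        }

        var src = srcField.GetValue(source) as UnknownFieldSet;
        var dst = dstField.GetValue(destination) as UnknownFieldSet;

        // The static MergeFrom accepts null on either side: it returns dst
        // unchanged when src is null and creates a set when dst is null.
        dstField.SetValue(destination, UnknownFieldSet.MergeFrom(dst, src));
    }

    public static void Main()
    {
        // Field number 1, varint 42: unknown to the Empty message.
        Empty source = Empty.Parser.ParseFrom(new byte[] { 0x08, 0x2A });
        Empty destination = new Empty();

        MergeUnknownFields(source, destination);
        Console.WriteLine(destination.CalculateSize() > 0);
    }
}
```

Because this depends on a private field name, it can break with any change to the generated code, which is why a supported accessor would still be preferable.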
I'm aware of the feature request as originally stated, and I still don't personally have time to work on it - I was just alarmed by the allocation claims made in your previous comment, and wanted to check that they were in fact mistaken, that I hadn't missed something.
Posting this in case someone else needs this. My goal was to print unknown fields in string/json representations of objects to inspect if I missed something while implementing.
For now, I've worked around this by using reflection to access the private fields to get this going. Note this was mostly written using an LLM, but seems to work correctly for nested fields.
Code
// ProtobufDeepPrinter.cs
// Emits VALID JSON for any Google.Protobuf IMessage, including *unknown fields*,
// recursively. No type names in output; empty messages render as {}.
// Unknowns are grouped under "unknown_fields", keyed by "<field-id> <wire-type>".
// A key with a single value maps to that value; a key with several maps to an array:
// "<field-id> varint": number(s)
// "<field-id> fixed64": number(s)
// "<field-id> fixed32": number(s)
// "<field-id> length_delimited": base64 string(s)
// "<field-id> group": nested unknowns object(s)
//
// Works even though UnknownFieldSet/UnknownField members are private (uses reflection).
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Text;
using Google.Protobuf;
public static class ProtobufDeepPrinter
{
private const bool AddNewLines = true;
public static string Dump(this IMessage message, int maxDepth = 128)
{
var sb = new StringBuilder();
AppendMessageJson(sb, message, 0, maxDepth);
return sb.ToString();
}
// ===== Core =====
private static void AppendMessageJson(StringBuilder sb, IMessage msg, int indent, int maxDepth)
{
if (msg == null) { sb.Append("null"); return; }
if (maxDepth < 0) { sb.Append("\"…\""); return; }
var desc = msg.Descriptor;
var unknowns = GetUnknownFieldSet(msg);
// Are there any known fields present?
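bool hasKnown = desc.Fields.InDeclarationOrder()
.Any(f => f.IsRepeated || f.IsMap
? EnumerableHasAny(f.Accessor.GetValue(msg))
// Fields without presence (e.g. plain proto3 scalars) do not support HasValue, so treat them as present.
: (!f.HasPresence || f.Accessor.HasValue(msg)));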
// Any unknowns?
var fieldsDict = unknowns != null ? GetPrivateField<IDictionary>(unknowns, "fields") : null;
bool hasUnknown = fieldsDict != null && fieldsDict.Count > 0;
if (!hasKnown && !hasUnknown)
{
sb.Append("{}");
return;
}
sb.Append('{'); NewLine(sb, ++indent);
bool firstProp = true;
// Known fields
foreach (var field in desc.Fields.InDeclarationOrder())
{
var acc = field.Accessor;
var val = acc.GetValue(msg);
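// Fields without presence (e.g. plain proto3 scalars) do not support HasValue, so treat them as present.
bool present = field.IsRepeated || field.IsMap ? EnumerableHasAny(val) : (!field.HasPresence || acc.HasValue(msg));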
if (!present) continue;
if (!firstProp) { sb.Append(','); NewLine(sb, indent); } firstProp = false;
Quote(sb, field.JsonName ?? field.Name); sb.Append(": ");
if (field.IsMap)
{
// Map<K,V> → JSON object: key -> value
sb.Append('{'); NewLine(sb, indent + 1);
bool first = true;
foreach (var entry in (IEnumerable)val)
{
var t = entry.GetType();
var key = t.GetProperty("Key")!.GetValue(entry);
var value = t.GetProperty("Value")!.GetValue(entry);
if (!first) { sb.Append(','); NewLine(sb, indent + 1); } first = false;
Quote(sb, key?.ToString() ?? "null"); sb.Append(": ");
AppendValueJson(sb, value, indent + 1, maxDepth - 1);
}
NewLine(sb, indent); sb.Append('}');
}
else if (field.IsRepeated)
{
sb.Append('[');
bool first = true;
foreach (var item in (IEnumerable)val)
{
if (!first) sb.Append(',');
if (AddNewLines) NewLine(sb, indent + 1);
AppendValueJson(sb, item, indent + 1, maxDepth - 1);
first = false;
}
if (!first && AddNewLines) { NewLine(sb, indent); }
sb.Append(']');
}
else
{
AppendValueJson(sb, val, indent, maxDepth - 1);
}
}
// Unknown fields
if (hasUnknown)
{
if (!firstProp) { sb.Append(','); NewLine(sb, indent); } firstProp = false;
Quote(sb, "unknown_fields"); sb.Append(": ");
AppendUnknownFieldSetJson(sb, fieldsDict!, indent, maxDepth - 1);
}
NewLine(sb, --indent); sb.Append('}');
}
private static void AppendValueJson(StringBuilder sb, object value, int indent, int maxDepth)
{
switch (value)
{
case null:
sb.Append("null"); break;
case string s:
Quote(sb, s); break;
case bool b:
sb.Append(b ? "true" : "false"); break;
case ByteString bs:
Quote(sb, bs.ToBase64()); break;
case float f:
sb.Append(f.ToString("R", System.Globalization.CultureInfo.InvariantCulture)); break;
case double d:
sb.Append(d.ToString("R", System.Globalization.CultureInfo.InvariantCulture)); break;
case Enum _:
Quote(sb, value.ToString()); break; // enum names as strings
case IMessage m:
AppendMessageJson(sb, m, indent, maxDepth); break;
default:
if (IsNumeric(value))
{
sb.Append(Convert.ToString(value, System.Globalization.CultureInfo.InvariantCulture));
}
else
{
// Fallback: string-escape whatever it is
Quote(sb, value.ToString());
}
break;
}
}
// ===== Unknowns as JSON =====
private static void AppendUnknownFieldSetJson(StringBuilder sb, IDictionary fieldsDict, int indent, int maxDepth)
{
// Aggregate per "<id> <wire>" with arrays (to avoid duplicate JSON keys).
var bucket = new SortedDictionary<string, List<object>>();
foreach (DictionaryEntry de in fieldsDict)
{
int id = (int)de.Key;
var field = de.Value;
// varint (wire 0) IEnumerable<ulong>
foreach (ulong v in EnumerateList<ulong>(field, "varintList"))
Add(bucket, $"{id} varint", v);
// fixed64 (wire 1) IEnumerable<ulong>
foreach (ulong v in EnumerateList<ulong>(field, "fixed64List"))
Add(bucket, $"{id} fixed64", v);
// length-delimited (wire 2) IEnumerable<ByteString>
foreach (ByteString bs in EnumerateList<ByteString>(field, "lengthDelimitedList"))
Add(bucket, $"{id} length_delimited", bs.ToBase64());
// group (wire 3/4) IEnumerable<UnknownFieldSet>
foreach (var g in EnumerateList<object>(field, "groupList"))
{
// Serialize nested unknowns only (groups themselves are unknowns)
var nested = SerializeUnknownSetToJsonObject(g, indent + 1, maxDepth - 1);
Add(bucket, $"{id} group", nested);
}
// fixed32 (wire 5) IEnumerable<uint>
foreach (uint v in EnumerateList<uint>(field, "fixed32List"))
Add(bucket, $"{id} fixed32", v);
}
// Emit JSON object
sb.Append('{'); NewLine(sb, indent + 1);
bool first = true;
foreach (var kv in bucket)
{
if (!first) { sb.Append(','); NewLine(sb, indent + 1); }
first = false;
Quote(sb, kv.Key); sb.Append(": ");
var list = kv.Value;
if (list.Count == 1)
{
AppendUnknownValueJson(sb, list[0], indent + 1, maxDepth);
}
else
{
sb.Append('[');
for (int i = 0; i < list.Count; i++)
{
if (i > 0) sb.Append(',');
if (AddNewLines) NewLine(sb, indent + 1);
AppendUnknownValueJson(sb, list[i], indent + 1, maxDepth);
}
if (list.Count > 0 && AddNewLines) { NewLine(sb, indent); }
sb.Append(']');
}
}
NewLine(sb, indent); sb.Append('}');
}
private static object SerializeUnknownSetToJsonObject(object unknownFieldSet, int indent, int maxDepth)
{
var dict = GetPrivateField<IDictionary>(unknownFieldSet, "fields");
// Return raw "{}" so the caller emits an empty object instead of quoting a Dictionary's ToString().
if (dict == null || dict.Count == 0) return new RawJson("{}");
// Build a compact string for nested unknowns by reusing the JSON emitter
var sb = new StringBuilder();
AppendUnknownFieldSetJson(sb, dict, indent, maxDepth);
return new RawJson(sb.ToString());
}
private static void AppendUnknownValueJson(StringBuilder sb, object val, int indent, int maxDepth)
{
switch (val)
{
case RawJson rj:
sb.Append(rj.Json); break;
case string s:
Quote(sb, s); break;
case ulong ul:
sb.Append(ul.ToString(System.Globalization.CultureInfo.InvariantCulture)); break;
case uint ui:
sb.Append(ui.ToString(System.Globalization.CultureInfo.InvariantCulture)); break;
default:
// Should not happen, but be safe
AppendValueJson(sb, val, indent, maxDepth); break;
}
}
// ===== Utils =====
private static bool EnumerableHasAny(object obj)
{
if (obj is IEnumerable en)
{
foreach (var _ in en) return true;
}
return false;
}
private static void Quote(StringBuilder sb, string s)
{
sb.Append('\"');
foreach (var ch in s)
{
switch (ch)
{
case '\"': sb.Append("\\\""); break;
case '\\': sb.Append("\\\\"); break;
case '\b': sb.Append("\\b"); break;
case '\f': sb.Append("\\f"); break;
case '\n': sb.Append("\\n"); break;
case '\r': sb.Append("\\r"); break;
case '\t': sb.Append("\\t"); break;
default:
if (char.IsControl(ch))
sb.Append("\\u").Append(((int)ch).ToString("X4"));
else
sb.Append(ch);
break;
}
}
sb.Append('\"');
}
private static void NewLine(StringBuilder sb, int indent)
{
sb.AppendLine();
sb.Append(' ', indent * 2);
}
private static bool IsNumeric(object v)
{
return v is sbyte || v is byte || v is short || v is ushort ||
v is int || v is uint || v is long || v is ulong ||
v is float || v is double || v is decimal;
}
private static T? GetPrivateField<T>(object obj, string name) where T : class
=> obj.GetType().GetField(name, BindingFlags.Instance | BindingFlags.NonPublic)?.GetValue(obj) as T;
private static object? GetPropOrField(object obj, string propName, string backingFieldName)
{
var p = obj.GetType().GetProperty(propName, BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
if (p != null) return p.GetValue(obj);
var f = obj.GetType().GetField(backingFieldName, BindingFlags.Instance | BindingFlags.NonPublic);
return f?.GetValue(obj);
}
private static object? GetUnknownFieldSet(IMessage msg)
=> GetPropOrField(msg, "UnknownFields", "_unknownFields");
private static IEnumerable<T> EnumerateList<T>(object obj, string privateFieldName)
{
var list = GetPrivateField<IEnumerable>(obj, privateFieldName);
if (list == null) yield break;
foreach (var item in list)
if (item is T t) yield return t;
}
private static void Add(SortedDictionary<string, List<object>> dict, string key, object value)
{
if (!dict.TryGetValue(key, out var list))
dict[key] = list = new List<object>();
list.Add(value);
}
// wrapper to avoid double-encoding nested unknown groups
private sealed class RawJson
{
public string Json { get; }
public RawJson(string json) => Json = json;
}
}
So the fields of UnknownFieldSet cannot be accessed, not even in read-only mode.
I have an idea: extend an operation that includes a FieldMask so that it fetches the current protobuf doc from a well-known location,
and then use that to look up the update_mask paths for unknown fields as well.