Deserialization performance is poor for large products
Currently, as noted on the project page, we use an O(N*M) algorithm for deserializing products with N fields repeated a total of M times. While profiling the deserialization of a complex, deeply nested object, I found that we spend a huge amount of time in the resulting calls to CodedInputStream.skipField.
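To make the cost concrete, here is a toy model in plain Java, assuming the quadratic behaviour comes from each field reader scanning the whole input and skipping entries that belong to other fields. It does not use protobuf or the project's code, and `RescanCost`, `WireEntry`, `scanForField` and `totalVisits` are hypothetical names invented for this sketch.

```java
import java.util.Arrays;
import java.util.List;

public class RescanCost {
    // Toy stand-in for one encoded field occurrence: a field index plus its payload.
    static final class WireEntry {
        final int fieldIndex;
        final String payload;

        WireEntry(int fieldIndex, String payload) {
            this.fieldIndex = fieldIndex;
            this.payload = payload;
        }
    }

    // Reads one field by walking the entire input. Every entry that belongs to a
    // different field is passed over, which models a skipField call.
    // Returns the number of entries visited.
    static int scanForField(List<WireEntry> input, int wantedIndex, StringBuilder out) {
        int visited = 0;
        for (WireEntry entry : input) {
            visited++;
            if (entry.fieldIndex == wantedIndex) {
                out.append(entry.payload);
            }
        }
        return visited;
    }

    // Total entries visited when each of the N fields (indices 1..fieldCount)
    // triggers its own full scan of the M entries: N * M visits.
    static int totalVisits(List<WireEntry> input, int fieldCount) {
        int total = 0;
        for (int index = 1; index <= fieldCount; index++) {
            total += scanForField(input, index, new StringBuilder());
        }
        return total;
    }

    public static void main(String[] args) {
        List<WireEntry> input = Arrays.asList(
            new WireEntry(1, "a"), new WireEntry(2, "b"),
            new WireEntry(2, "c"), new WireEntry(3, "d"));
        // N = 3 fields, M = 4 entries, so 3 * 4 = 12 visits.
        System.out.println(totalVisits(input, 3)); // prints 12
    }
}
```

With 3 fields and 4 entries the model visits 12 entries, and 8 of those visits are skips. In a deeply nested object the same multiplication would apply again at each level of nesting.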
One way to eliminate this is to (1) create a map from field index to parser, (2) loop through the fields once, passing each to its appropriate parser, and (3) finally build the result of each parser. I have prepared a PR that does this, which I'll submit soon (once my previous PR, which it depends on, is merged). The PR also reduces memory pressure when deserializing nested objects by eliminating the allocation of extra byte buffers in all cases except coproduct parsing.
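The three steps can be sketched roughly as follows. This is a toy model in plain Java with no protobuf dependency, and it is not the code from the PR. `SinglePassDecode`, `WireEntry`, `FieldParser`, `LastValueParser`, `RepeatedParser` and `decode` are hypothetical names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SinglePassDecode {
    // Toy stand-in for one encoded field occurrence: a field index plus its payload.
    static final class WireEntry {
        final int fieldIndex;
        final String payload;

        WireEntry(int fieldIndex, String payload) {
            this.fieldIndex = fieldIndex;
            this.payload = payload;
        }
    }

    // Accumulates the occurrences of one field and builds its value at the end.
    interface FieldParser {
        void accept(String payload);
        Object build();
    }

    // Singular field: the last occurrence wins.
    static final class LastValueParser implements FieldParser {
        private String value;
        public void accept(String payload) { value = payload; }
        public Object build() { return value; }
    }

    // Repeated field: collects every occurrence in input order.
    static final class RepeatedParser implements FieldParser {
        private final List<String> values = new ArrayList<>();
        public void accept(String payload) { values.add(payload); }
        public Object build() { return values; }
    }

    // `parsers` is step (1): the map from field index to parser.
    static Map<Integer, Object> decode(List<WireEntry> input, Map<Integer, FieldParser> parsers) {
        // Step (2): one pass over the input, dispatching each entry by field index.
        for (WireEntry entry : input) {
            FieldParser parser = parsers.get(entry.fieldIndex);
            if (parser != null) {
                parser.accept(entry.payload);
            }
            // An unknown field is passed over exactly once, not once per known field.
        }
        // Step (3): build the result of each parser.
        Map<Integer, Object> result = new HashMap<>();
        for (Map.Entry<Integer, FieldParser> kv : parsers.entrySet()) {
            result.put(kv.getKey(), kv.getValue().build());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, FieldParser> parsers = new HashMap<>();
        parsers.put(1, new LastValueParser());
        parsers.put(2, new RepeatedParser());
        List<WireEntry> input = Arrays.asList(
            new WireEntry(1, "a"), new WireEntry(2, "b"),
            new WireEntry(9, "x"), new WireEntry(2, "c"));
        Map<Integer, Object> result = decode(input, parsers);
        System.out.println(result.get(1)); // prints a
        System.out.println(result.get(2)); // prints [b, c]
    }
}
```

Each entry is visited once, so the work in this model is proportional to M plus the number of parsers rather than N*M.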