jackson-dataformats-binary icon indicating copy to clipboard operation
jackson-dataformats-binary copied to clipboard

writeRaw binary content

Open rashtao opened this issue 1 year ago • 6 comments

What is the recommended way to write raw binary content? As far as I can see com.fasterxml.jackson.core.JsonGenerator#writeRaw() variants only support String/char[]/SerializableString arguments, which are not suitable for binary formats.

Is there any plan to extend it and accept binary arguments, i.e. byte[]/InputStream?

rashtao avatar Jul 14 '22 10:07 rashtao

JsonGenerator.writeBinary()? It only uses Base64 encoding if the underlying format does not have native binary values and binary formats typically do.

I do not recommend writeRaw() for any use as that requires caller to know a lot about underlying format (ditto for writeBinaryValue() that is slightly less dangerous); especially so with binary formats.

Also note that the answer may depend on specific format backend, at least when using last-effort methods like writeRaw().

cowtowncoder avatar Jul 15 '22 00:07 cowtowncoder

I mean specifically writeRaw(), in order to append an already properly encoded byte sequence. I would like to achieve the same behavior of writeRaw(String|char[]|SerializableString) for binary dataformats. Of course not all format backends can support this, but at least Smile could support this.

Currently, if I have an already encoded valid byte array (e.g. in Smile format), in order to append it, I need first to parse it (i.e. with JsonParser#readValueAsTree() and then write it (i.e. with JsonGenerator#writeTree()), which consumes unneeded cpu and memory resources.

rashtao avatar Jul 15 '22 07:07 rashtao

Ok. That makes sense then. At this points, PRs would be welcome and/or specific per-format tickets to denote where exactly functionality is missing.

As a work-around one can usually hold on to OutputStream used (and use generator.flush()) but support for writeRaw() would definitely be cleaner as it can auto-flush.

cowtowncoder avatar Jul 15 '22 15:07 cowtowncoder

Thanks!

I think it would be even better adding new methods to com.fasterxml.jackson.core.JsonGenerator, e.g. writeRaw((byte[]|InputStream) data).

This would allow further enhancements at databind level by adding support for byte[] value to com.fasterxml.jackson.databind.util.RawValue and will ultimately enable the capability to map binary raw values in databind API, i.e.:

  • in all usages of raw values in com.fasterxml.jackson.databind.node (eg. com.fasterxml.jackson.databind.node.ObjectNode#putRawValue())
  • supporting byte[] property value for @JsonRawValue annotation
  • serializing RawValue fields in binary data formats backends

What is your opinion about it?

rashtao avatar Jul 18 '22 12:07 rashtao

From API perspective it is sort of tricky, given that now we have set of methods that work/don't-work on different backends (text vs binary). I specifically would be against supporting binary-content - to -text-format (which some users would no doubt want :) ) because it could or could not work based on backend (not for Writer, yes for OutputStream).

But then again without this, writeRaw() variants are useless for binary content and one has to "merge" content with lower level constructs.

I think my initial thinking is that I would want to defer variant(s) that take streams, and start with just byte[] (with possible addition of ByteBuffer). API could be added in 2.14, with default implementation in JsonGenerator that throws an exception indicating backend does not support operation; but with implementation for at least one binary backend to verify usage. We would want both writeRaw(byte[]) and writeRawValue(byte[]).

We might also need to either use an existing StreamWriteCapability (CAN_WRITE_BINARY_NATIVELY) or add new one(s) in case code needs to decide on text-or-binary handling for some raw values.

It might make sense to divide into first supporting low-level (JsonGenerator) functionality, test that, and then once merged, figure out how to make things work via POJOs and JsonNode. Looks like RawValue would need improvements; and existing RawSerializer wouldn't quite work as-is either.

I probably won't have tons of time to work on this myself but if you or anyone else has time, I would find time to code review and help with work.

cowtowncoder avatar Jul 18 '22 17:07 cowtowncoder

Sounds good to me, thanks for sharing! I will draft a PR as I find time. Should we move the discussion to jackson-core project in order to offer better visibility and gather further feedback?

rashtao avatar Jul 19 '22 09:07 rashtao