jackson-core

Add 'readText()' method in JsonParser

Open cowtowncoder opened this issue 13 years ago • 2 comments

The current JsonParser.getText() requires reading the whole JSON String value into a String. While convenient, this may not be optimal when processing large payloads.

As an alternative method, there should be something like:

boolean readText(Writer w);

which would read the JSON String value and pass it to the given Writer, possibly in separate chunks, without aggregating it. This allows the caller to do incremental processing and avoid potentially large temporary memory usage.
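To illustrate the caller side, here is a minimal sketch of a Writer that a caller might hand to the proposed readText(Writer). The class name LineCountingWriter is hypothetical. Since no parser is involved, main() stands in for one by calling write() with a few chunks directly. The Writer keeps only running totals, so its memory use does not depend on how long the String value is.

```java
import java.io.IOException;
import java.io.Writer;

/**
 * Hypothetical consumer for the proposed readText(Writer): it inspects each
 * chunk as it arrives and keeps only running totals, never the text itself.
 */
public class LineCountingWriter extends Writer {
    private long chars;    // total chars seen so far
    private long newlines; // total '\n' chars seen so far

    @Override
    public void write(char[] cbuf, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (cbuf[i] == '\n') {
                newlines++;
            }
        }
        chars += len;
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}

    public long chars() { return chars; }

    public long newlines() { return newlines; }

    public static void main(String[] args) throws IOException {
        LineCountingWriter w = new LineCountingWriter();
        // Stand-in for a parser delivering one long value in several chunks.
        w.write("first line\nsec");
        w.write("ond line\n");
        w.write("third");
        System.out.println(w.chars() + " chars, " + w.newlines() + " newlines");
    }
}
```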

In addition, for non-blocking parser implementations, this method could do partial decoding, meaning that it would parse only part of the textual value; the return value would indicate whether the full content (true) or only partial content (false) was processed.
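The partial-decoding idea can be sketched as a small state machine. This is a hypothetical illustration, not Jackson's async parser: PartialTextDecoder, feed() and consumed() are made-up names, input is assumed to arrive as chars rather than raw UTF-8 bytes, and the opening quote is assumed to be consumed already. As proposed, feed() returns true once the whole value has been processed and false when only part of it was available. consumed() tells the caller where the next token starts.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/**
 * Hypothetical push-style decoder for the body of a JSON String value.
 * Fragments may end in the middle of an escape, so decoding state is kept
 * in fields between feed() calls. Control-character validation is omitted.
 */
public class PartialTextDecoder {
    private boolean inEscape; // saw a backslash, waiting for the escape character
    private int hexLeft;      // hex digits still expected for a unicode escape
    private int hexValue;     // code unit accumulated from those digits
    private boolean done;     // closing quote seen
    private int consumed;     // chars consumed by the most recent feed() call

    public boolean feed(char[] buf, int off, int len, Writer out) throws IOException {
        int end = off + len;
        int runStart = -1; // start of plain chars not yet written, or -1
        int i = off;
        for (; i < end && !done; i++) {
            char c = buf[i];
            if (hexLeft > 0) {
                int d = Character.digit(c, 16);
                if (d < 0) throw new IOException("bad hex digit in unicode escape: " + c);
                hexValue = (hexValue << 4) | d;
                if (--hexLeft == 0) out.write(hexValue);
            } else if (inEscape) {
                inEscape = false;
                switch (c) {
                    case '"': case '\\': case '/': out.write(c); break;
                    case 'b': out.write('\b'); break;
                    case 'f': out.write('\f'); break;
                    case 'n': out.write('\n'); break;
                    case 'r': out.write('\r'); break;
                    case 't': out.write('\t'); break;
                    case 'u': hexLeft = 4; hexValue = 0; break;
                    default: throw new IOException("bad escape character: " + c);
                }
            } else if (c == '\\' || c == '"') {
                if (runStart >= 0) { // write the pending run of plain chars as one chunk
                    out.write(buf, runStart, i - runStart);
                    runStart = -1;
                }
                if (c == '\\') inEscape = true; else done = true;
            } else if (runStart < 0) {
                runStart = i;
            }
        }
        if (runStart >= 0) out.write(buf, runStart, i - runStart);
        consumed = i - off;
        return done;
    }

    public int consumed() { return consumed; }

    public static void main(String[] args) throws IOException {
        PartialTextDecoder dec = new PartialTextDecoder();
        StringWriter out = new StringWriter();
        // One String value split into three fragments, two of them mid-escape;
        // the last fragment also carries the tokens that follow the value.
        String[] fragments = { "ab\\u00", "e9\\", "ncd\", 42" };
        for (String f : fragments) {
            char[] cs = f.toCharArray();
            boolean complete = dec.feed(cs, 0, cs.length, out);
            System.out.println("complete=" + complete + ", consumed=" + dec.consumed());
        }
        System.out.println(out);
    }
}
```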

cowtowncoder, May 27 '12 04:05

I need to parse JSON with huge text fields (up to 500MB). Using the readText(Writer) method still needs a lot of memory, because it reads the whole text field into memory.

Is there any plan to make this more efficient?

From reading the code, I would assume that passing the writer down to _finishString() could help here. The string finisher could then use only one (or a few?) segments, writing a segment to the writer once it is full and then reusing it.
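The segment-reuse suggestion can be sketched outside Jackson. This is a hypothetical stand-in, not the actual _finishString(): SegmentedStringCopier and copyText() are made-up names, input comes from a plain java.io.Reader positioned just after the opening quote, and only the standard JSON escapes are handled. The text is decoded into a single fixed-size char[] that is handed to the Writer whenever it fills up and then reused, so peak memory is bounded by the segment size rather than by the length of the value.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

/** Hypothetical copier that streams a JSON String value through one reusable segment. */
public class SegmentedStringCopier {
    private final char[] segment;

    public SegmentedStringCopier(int segmentSize) {
        this.segment = new char[segmentSize];
    }

    /** Copies the rest of the String value to out; returns the number of chars written. */
    public long copyText(Reader in, Writer out) throws IOException {
        long total = 0;
        int len = 0; // chars currently held in the segment
        while (true) {
            int c = in.read();
            if (c < 0) throw new IOException("unexpected end of input in String value");
            if (c == '"') break;                 // closing quote ends the value
            if (c == '\\') c = readEscape(in);   // decoded char replaces the escape
            if (len == segment.length) {         // segment full: flush it, then reuse it
                out.write(segment, 0, len);
                total += len;
                len = 0;
            }
            segment[len++] = (char) c;
        }
        out.write(segment, 0, len);              // final, possibly partial, segment
        return total + len;
    }

    private static int readEscape(Reader in) throws IOException {
        int e = in.read();
        switch (e) {
            case '"': case '\\': case '/': return e;
            case 'b': return '\b';
            case 'f': return '\f';
            case 'n': return '\n';
            case 'r': return '\r';
            case 't': return '\t';
            case 'u': {
                int v = 0;
                for (int k = 0; k < 4; k++) {
                    int d = Character.digit(in.read(), 16); // -1 at EOF or on a non-hex char
                    if (d < 0) throw new IOException("bad hex digit in unicode escape");
                    v = (v << 4) | d;
                }
                return v;
            }
            default: throw new IOException("bad escape character: " + e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Input starts just after the opening quote; text after the closing quote is left unread.
        Reader in = new StringReader("Hello, \\\"world\\\"\\n\" , next");
        StringWriter out = new StringWriter();
        long n = new SegmentedStringCopier(4).copyText(in, out);
        System.out.println(n + " chars: " + out);
    }
}
```

With a 4-char segment, as in main(), the Writer receives the 15 decoded chars in pieces of at most 4. Reading one char at a time from the Reader keeps the sketch short; a real parser would scan its own input buffer instead.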

MikePieperSer, Apr 29 '21 06:04

No one is working on this currently as far as I know; I do not have time to work on this now and probably not for a while (unless I'd need it myself for some reason). But anyone who wants to work on it would be more than welcome to do so!

And yes, the lazy initial handling (decoding only the opening quote) is intended to allow a more efficient read+write operation like the one you suggest. There are multiple backends to consider (byte-based UTF-8, character/Reader-based, async), but the implementation could be relatively simple if it just addresses the 2 common ones (Reader- and byte-based; maybe the DataInput one too). I suspect async could not be supported anyway.

Put another way: the reason this one has not been tackled is not necessarily the inherent complexity of implementing support, given that the API already exists.

cowtowncoder, Apr 29 '21 17:04