jackson-dataformats-text icon indicating copy to clipboard operation
jackson-dataformats-text copied to clipboard

`UTF8Reader` throws"Need to move partially decoded character; buffer not modifiable" when read only one chinese char

Open mrtaolili opened this issue 1 year ago • 7 comments

jackson-dataformat-csv 2.12.4

String msg = 
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaå’–";
        CsvSchema.Builder builder = new CsvSchema.Builder();
        builder.setColumnSeparator('\u0001');
        builder.setAllowComments(true);
        builder.setEscapeChar('\\');
        builder.setQuoteChar('\'');
        for (int i = 0; i < 44;  i++) {
            builder.addColumn(String.format("c%d", i), CsvSchema.ColumnType.STRING);
        }
        CsvSchema build = builder.build();
        ObjectReader with = (new CsvMapper()).readerFor(JsonNode.class).with(build);
        JsonNode o = (JsonNode) with.readValue(msg.getBytes(StandardCharsets.UTF_8));
        System.out.println(o);```


Exception in thread "main" java.lang.IllegalStateException: Need to move partially decoded character; buffer not modifiable
	at com.fasterxml.jackson.dataformat.csv.impl.UTF8Reader.loadMore(UTF8Reader.java:403)
	at com.fasterxml.jackson.dataformat.csv.impl.UTF8Reader.read(UTF8Reader.java:188)
	at com.fasterxml.jackson.dataformat.csv.impl.CsvDecoder.loadMore(CsvDecoder.java:443)
	at com.fasterxml.jackson.dataformat.csv.impl.CsvDecoder._nextUnquotedString(CsvDecoder.java:766)
	at com.fasterxml.jackson.dataformat.csv.impl.CsvDecoder.nextString(CsvDecoder.java:716)
	at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNextEntry(CsvParser.java:915)
	at com.fasterxml.jackson.dataformat.csv.CsvParser.nextFieldName(CsvParser.java:727)
	at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:269)
	at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:69)
	at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:16)
	at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:322)
	at com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:2033)
	at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1528)
	at org.example.RetainBitMapGet.main(RetainBitMapGet.java:26)

mrtaolili avatar Sep 21 '24 03:09 mrtaolili

image remove these line can fix the problem

mrtaolili avatar Sep 21 '24 04:09 mrtaolili

Ok thank you for reporting the problem!

Check is there for a reason so removing check would be dangerous (unless verifying it is not needed).

I think to note is that version 2.12 is rather old: latest released is 2.17.2. I'll need to see if I can reproduce the issue with it -- ideally you would have tried it out before submitting. But I can do it now, given you provided a reproduction (which is good!).

cowtowncoder avatar Sep 21 '24 23:09 cowtowncoder

Ok I can see the problem: the last character of input is a 3-byte UTF-8 character, and code is trying to load at least one more byte. This is not right as 3 bytes should be sufficient for decoding.

But not sure of an easy way to solve this; need to think about it a bit.

cowtowncoder avatar Sep 21 '24 23:09 cowtowncoder

Forgot to mention: 2.17.2 (and now 2.18.0) still fail, but give bit more useful exception message at least.

cowtowncoder avatar Sep 28 '24 02:09 cowtowncoder

snakeyaml 2.4 seems to fix the yaml issue

pjfanning avatar Feb 20 '25 13:02 pjfanning

As per #539, not applicable to TOML it seems?

EDIT: Although, similar to YAML, I suspect it is just a failure to trigger boundary condition as TOML module uses UTF8Reader with similar flaw as csv.

cowtowncoder avatar Feb 25 '25 00:02 cowtowncoder

No longer reproducible for YAML in 2.x after #537; still fails in 3.0. Leaving yaml label for now.

cowtowncoder avatar Feb 25 '25 00:02 cowtowncoder

Looks like YAML issue exists in 3.x. Caused by Jackson's tools.jackson.dataformat.yaml.UTF8Reader, method loadMore(), line 408:

    private boolean loadMore(int available) throws IOException
    {
        _byteCount += (_inputEnd - available);

        // Bytes that need to be moved to the beginning of buffer?
        if (available > 0) {
            if (_inputPtr > 0) {
                if (!canModifyBuffer()) {
                    // 15-Aug-2022, tatu: Occurs (only) if we have half-decoded UTF-8
                    //     characters; uncovered by:
                    //
                    // https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=50036
                    //
                    // and need to be reported as IOException
                    if (_inputSource == null) {
                        throw new IOException(String.format(
"End-of-input after first %d byte(s) of a UTF-8 character: needed at least one more",
available));
                    }

cowtowncoder avatar Aug 07 '25 20:08 cowtowncoder

Actually still reproducible for YAML too in 2.x (for 2.20), with slightly changed offsets in test. :-(

cowtowncoder avatar Aug 07 '25 22:08 cowtowncoder

https://github.com/FasterXML/jackson-dataformats-text/pull/546 helped in CSV module. Can we do something similar in yaml module? Maybe also, toml module.

pjfanning avatar Aug 07 '25 22:08 pjfanning

@pjfanning Thanks! Great minds etc -- just did #571.

Good point wrt TOML, need to check that too.

cowtowncoder avatar Aug 07 '25 23:08 cowtowncoder

Fixing for TOML too; too bad reproduction missing but fix is the same.

cowtowncoder avatar Aug 07 '25 23:08 cowtowncoder