JSON-java icon indicating copy to clipboard operation
JSON-java copied to clipboard

JSON CDL is omitting last row when parsing CSV with an empty last column

Open jjanczur opened this issue 3 years ago • 7 comments

I was playing with parsing CSV files into JSON using CDL library from org.json.CDL and I observed a strange behavior.

When I pass a CSV with an empty last value to the toJSONArray method, it's omitting the last row.

package com.my.test;

import org.json.CDL;
import org.json.JSONException;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.junit.jupiter.MockitoExtension;

import java.util.List;

import static org.junit.jupiter.api.Assertions.*;

@ExtendWith(MockitoExtension.class)
class CDLTest {
    @Test
    void testCDL_empty_column_fail() throws JSONException {
        final String SIMPLE_CSV_EMPTY_COLUMN_STRING = "Number,Name,Empty\n" +
                "1,Allice,\n" +
                "2,Chris,\n" +
                "3,Bob,";

        List<Object> jsonObjects = CDL.toJSONArray(SIMPLE_CSV_EMPTY_COLUMN_STRING).toList();

        assertAll(
                () -> assertNotNull(jsonObjects),
                () -> assertEquals(3, jsonObjects.size())
        );
    }

}

It's failing as well for that String:

        final String SIMPLE_CSV_EMPTY_COLUMN_STRING = "Number,Name,Empty\n" +
                "1,Allice,not\n" +
                "2,Chris,empty\n" +
                "3,Bob,";

Do you have any idea how can I fix it? For learning purposes, I'd prefer to stick to this library and just fix it codewise. I was thinking about extending CDL and reimplementing method with a bug. Do you have any other idea, or maybe someone already fixed it?

Versions: <org.json.version>20190722</org.json.version> and Java 11

jjanczur avatar Apr 28 '22 15:04 jjanczur

is it an empty "last value" or is it the missing new line \n causing the issue?

For example, does the following data work:

final String SIMPLE_CSV_EMPTY_COLUMN_STRING = "Number,Name,Empty\n" +
                "1,Allice,\n" +
                "2,Chris,\n" +
                "3,Bob,\n";

johnjaylward avatar Apr 28 '22 16:04 johnjaylward

Hi @johnjaylward, When I add a new line CDL parses correctly, but CSV not necessarily needs to end with a new line, doesn't it?

The reason for this behavior is line 108 in CDL rowToJSONArray which assumes that the row needs something at the end:

            if (value == null ||
                    (ja.length() == 0 && value.length() == 0 && c != ',')) {

jjanczur avatar Apr 28 '22 16:04 jjanczur

I'm not sure about CSV specifically, but here is a good answer for text files in general: https://unix.stackexchange.com/a/18789

johnjaylward avatar Apr 28 '22 16:04 johnjaylward

looks like CSV itself is lose on the ending new-line: https://www.csvreader.com/csv_format.php

Feel free to submit a PR for a fix

johnjaylward avatar Apr 28 '22 16:04 johnjaylward

Feel free to submit a PR for a fix

The JavaDoc for the CDL class does specify that every row ends with a NEWLINE. So if you do make a change, please be sure to update any relevant JavaDoc as well.

johnjaylward avatar Apr 28 '22 16:04 johnjaylward

I found as well that the test emptyLinesToJSONArray doesn't check anything because last row in str is omitted so it ends up being a case of a CSV with only header row = empty JSON:

    /**
     * Create a JSONArray from string containing only whitespace and commas
     */
    @Test
    public void emptyLinesToJSONArray() {
        String str = " , , , \n , , , ";
        JSONArray jsonArray = CDL.toJSONArray(str);
        assertNull("JSONArray should be null for no content",
                jsonArray);
    }

The correct test should have at least 3 rows and yes it's failing so this functionality was never implemented.

I think I need some access rights to create a branch:

ERROR: Permission to stleary/JSON-java.git denied to jjanczur. Could not read from remote repository. Please make sure you have the correct access rights and the repository exists

Quick fix was to add additional check if the ja is not nul:

            if ((value == null && ja.length() == 0) ||
                    (ja.length() == 0 &&
                            value != null &&
                            value.length() == 0 &&
                            c != ',')) {

instead of:

   if (value == null ||
                    (ja.length() == 0 && value.length() == 0 && c != ',')) {

jjanczur avatar Apr 28 '22 18:04 jjanczur

I found as well that the test emptyLinesToJSONArray doesn't check anything because last row in str is omitted so it ends up being a case of a CSV with only header row = empty JSON

This is actually validating the case that you initially brought up in this ticket. That a "row" with no \n at the end doesn't get processed. The test is currently correct as expected behaviour of the CDL class.

I think I need some access rights to create a branch:

ERROR: Permission to stleary/JSON-java.git denied to jjanczur. Could not read from remote repository. Please make sure you have the correct access rights and the repository exists

The standard model in GitHub is to create a Fork image of a project, make changes in your own branch(es), then submit a PR from your fork.

johnjaylward avatar Apr 28 '22 21:04 johnjaylward

Closing due to lack of activity. If you think it should remain open, please post here.

stleary avatar Sep 30 '23 04:09 stleary