Is there any way to skip rows with the wrong type detected?

Open shabir1 opened this issue 2 years ago • 2 comments

Hi,

I am trying to read a file with 1 million rows and 7 columns. One of those rows is bad, and because of it Tablesaw throws an error. By "bad" I mean that one value in a double column is a string. Error: Error while adding cell from row 4453 and column Column_5(position:5): Error adding value to column Column_5: For input string: "386,1"

Code:


private static CsvReadOptions.Builder getBuilder(InputStream inputStream, ColumnType[] types) {
    return CsvReadOptions
            .builder(inputStream)
            .maxCharsPerColumn(-1)
            .columnTypes(types);
}

ColumnType[] types = {FLOAT, STRING, FLOAT, FLOAT, FLOAT, FLOAT, FLOAT};
Table table = Table.read().usingOptions(getBuilder(inputStreamIterator.next(), types));

Is there any way to skip rows with the wrong type detected?

shabir1 avatar Jul 25 '22 14:07 shabir1

I know that another way to load a Tablesaw table is to autodetect the column types, but I want to enforce the schema (column types): if any row fails to match the given types, that row should be skipped/deleted.

shabir1 avatar Jul 25 '22 14:07 shabir1

There is an option to skip a row on a column-count error, but it probably should have been made more general. Right now there's no way to do what you want to do.

As a workaround, you could load the column as a string, delete the bad row, and convert the StringColumn to a DoubleColumn.

Something like:

tableA.replaceColumn(strCol.name(), strCol.asDoubleColumn().setName(strCol.name()));

lwhite1 avatar Jul 25 '22 22:07 lwhite1
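
The "delete the row" step of the workaround above needs some way to find which rows hold non-numeric text once the column has been loaded as strings. The following is a minimal sketch of that check in plain Java, not part of Tablesaw: the class `BadRowFinder` and both of its methods are hypothetical names, the column is assumed to be available as a list of strings, and blank cells are treated as bad (relax that if missing values are acceptable). Java's `Double.parseDouble` may accept or reject slightly different inputs than Tablesaw's own parser, so treat the result as an approximation.

```java
import java.util.ArrayList;
import java.util.List;

public class BadRowFinder {

    // Hypothetical helper: true if the text parses as a Java double.
    // Blank or null cells count as unparsable in this sketch.
    static boolean isParsableDouble(String text) {
        if (text == null || text.trim().isEmpty()) {
            return false;
        }
        try {
            Double.parseDouble(text.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Returns the zero-based indexes of the values that would fail
    // a string-to-double conversion.
    static List<Integer> badRowIndexes(List<String> columnValues) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < columnValues.size(); i++) {
            if (!isParsableDouble(columnValues.get(i))) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // "386,1" is the value from the error message in the question.
        List<String> column5 = List.of("1.5", "386,1", "42", "7e3");
        System.out.println(badRowIndexes(column5)); // prints [1]
    }
}
```

The returned indexes are the rows to drop from the table before converting the string column to a double column. With several numeric columns, the same check would be run on each one and the index lists merged.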