tablesaw
Is there any way to skip rows with the wrong type detected
Hi,
I am trying to read a file with 1 million rows and 7 columns. One of those rows is bad, and because of that single row Tablesaw throws an error. By "bad" I mean that a column declared as a double contains a value that cannot be parsed as a number.
Error:
Error while adding cell from row 4453 and column Column_5(position:5): Error adding value to column Column_5: For input string: "386,1"
Code:
private static CsvReadOptions.Builder getBuilder(InputStream inputStream, ColumnType[] types) {
    return CsvReadOptions
        .builder(inputStream)
        .maxCharsPerColumn(-1)
        .columnTypes(types);
}

ColumnType[] types = {FLOAT, STRING, FLOAT, FLOAT, FLOAT, FLOAT, FLOAT};
Table table = Table.read().usingOptions(getBuilder(inputStreamIterator.next(), types));
Is there any way to skip rows with the wrong type detected?
I know I could let Tablesaw autodetect the column types instead, but I want to enforce the schema (the column types): if any row fails to parse with the given types, that row should be skipped or deleted.
There is an option to skip a row on a column-count error, but it probably should have been made more general. Right now there's no way to do what you want to do.
As a workaround, you could load that column as a string column, delete the bad row, and then convert the StringColumn to a DoubleColumn.
Something like:
tableA.replaceColumn(strCol.name(), strCol.asDoubleColumn().setName(strCol.name()));
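A different way to get the same effect is to drop the bad rows before Tablesaw ever sees them. The sketch below is not part of Tablesaw and is not the workaround described above. It is a plain-Java pre-filter under some loud assumptions: the file uses a simple single-character separator (`;` here, so that a value like `386,1` stays in one cell), there are no quoted fields, and "numeric" means whatever `Double.parseDouble` accepts, which may not match Tablesaw's own parser exactly. The names `SchemaRowFilter`, `rowMatchesSchema`, `keepValidRows` and `toInputStream` are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not a Tablesaw class: removes rows that violate a
// declared numeric/non-numeric schema before the data is handed to a reader.
class SchemaRowFilter {

    // True if the row has the expected number of cells and every cell in a
    // column flagged as numeric is either empty or parseable as a double.
    static boolean rowMatchesSchema(String[] cells, boolean[] numeric) {
        if (cells.length != numeric.length) {
            return false;
        }
        for (int i = 0; i < cells.length; i++) {
            if (!numeric[i]) {
                continue; // string column: anything is accepted
            }
            String cell = cells[i].trim();
            if (cell.isEmpty()) {
                continue; // assumption: empty cells count as missing values
            }
            try {
                Double.parseDouble(cell);
            } catch (NumberFormatException e) {
                return false; // e.g. "386,1"
            }
        }
        return true;
    }

    // Keeps the header (if any) and every data row that matches the schema.
    // Assumes no quoted fields, so a plain split on the separator is enough.
    static List<String> keepValidRows(List<String> lines, boolean[] numeric,
                                      String separator, boolean hasHeader) {
        List<String> kept = new ArrayList<>();
        String splitRegex = Pattern.quote(separator);
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (i == 0 && hasHeader) {
                kept.add(line);
                continue;
            }
            // limit -1 keeps trailing empty cells so the column count is right
            if (rowMatchesSchema(line.split(splitRegex, -1), numeric)) {
                kept.add(line);
            }
        }
        return kept;
    }

    // Re-joins the surviving lines into a stream a CSV reader can consume.
    static InputStream toInputStream(List<String> lines) {
        String joined = String.join("\n", lines) + "\n";
        return new ByteArrayInputStream(joined.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "a;b;c",
            "1.5;x;2",
            "3;y;386,1",   // bad row: third column is not a valid double
            "4;z;5");
        boolean[] numeric = {true, false, true};
        for (String line : keepValidRows(lines, numeric, ";", true)) {
            System.out.println(line);
        }
    }
}
```

The stream returned by `toInputStream` could then be passed to `CsvReadOptions.builder(inputStream)` together with the enforced `columnTypes(types)`, as in the code from the question. For a file with quoted fields or embedded separators, a real CSV parser would be needed for the pre-filter instead of `split`.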