csv-serde
Don't create as many garbage objects during reads
The old approach created a new reader for every row, which is inefficient because it leaves many short-lived objects lying around for the garbage collector. This change creates a CSVParser once and then reuses it for every row.
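To illustrate the pattern, here is a minimal sketch rather than the actual patch. `SimpleLineParser` and `ReusedParserSketch` are hypothetical stand-ins (the real change uses CSVParser), and the toy parser only handles a separator and double-quoted fields. The point is that the parser and its buffers are built once and shared by every call.

```java
import java.util.ArrayList;
import java.util.List;

public class ReusedParserSketch {

    /** Hypothetical stand-in for a CSV parser: separator and quoted fields only. */
    static final class SimpleLineParser {
        private final char separator;
        private final char quote;
        // Buffers are reused across calls, so each row allocates only its result.
        private final StringBuilder field = new StringBuilder();
        private final List<String> fields = new ArrayList<>();

        SimpleLineParser(char separator, char quote) {
            this.separator = separator;
            this.quote = quote;
        }

        String[] parseLine(String line) {
            fields.clear();
            field.setLength(0);
            boolean inQuotes = false;
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                if (c == quote) {
                    inQuotes = !inQuotes;            // toggle state; drop the quote char
                } else if (c == separator && !inQuotes) {
                    fields.add(field.toString());    // end of an unquoted-separator field
                    field.setLength(0);
                } else {
                    field.append(c);
                }
            }
            fields.add(field.toString());            // last field on the line
            return fields.toArray(new String[0]);
        }
    }

    // Created once, then used for every row instead of a new reader per row.
    private final SimpleLineParser parser = new SimpleLineParser(',', '"');

    public String[] deserialize(String row) {
        return parser.parseLine(row);
    }

    public static void main(String[] args) {
        ReusedParserSketch serde = new ReusedParserSketch();
        for (String row : new String[] {"a,b,c", "1,\"x,y\",3"}) {
            System.out.println(String.join("|", serde.deserialize(row)));
        }
    }
}
```

Because the sketch shares mutable buffers between calls, a single instance should not be used from several threads at once.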
There's a chance this change causes a regression: CSVSerde is no longer serializable, because CSVParser isn't serializable. This hasn't been an issue in my testing.
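If serializability does turn out to matter, one common Java approach (not part of this change, and untested against Hive) is to mark the parser field `transient` and rebuild it lazily. The sketch below assumes hypothetical names throughout: `TransientParserSketch` stands in for the serde and `LineSplitter` for a non-serializable parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.regex.Pattern;

public class TransientParserSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    /** Hypothetical non-serializable helper standing in for the parser. */
    static final class LineSplitter {
        private final Pattern separator;
        LineSplitter(String separator) { this.separator = Pattern.compile(Pattern.quote(separator)); }
        String[] split(String line) { return separator.split(line, -1); }
    }

    private final String separator;
    // transient: skipped by Java serialization, so its type need not be Serializable.
    private transient LineSplitter splitter;

    public TransientParserSketch(String separator) {
        this.separator = separator;
    }

    public String[] deserialize(String row) {
        if (splitter == null) {              // null on first use and after deserialization
            splitter = new LineSplitter(separator);
        }
        return splitter.split(row);
    }

    /** Serializes and deserializes the object in memory, for demonstration only. */
    public static TransientParserSketch roundTrip(TransientParserSketch in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream oin =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TransientParserSketch) oin.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TransientParserSketch copy = roundTrip(new TransientParserSketch(","));
        System.out.println(String.join("|", copy.deserialize("a,b,c")));
    }
}
```

The lazy initialization here is not synchronized, so it assumes each instance is used by one thread at a time.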
In my preliminary testing, this cut the runtime of a simple count(*) job from 120-130 sec to 100-110 sec.