Don't create as many garbage objects during reads

Open ash211 opened this issue 12 years ago • 1 comment

The old method of reading a row, which created a new reader for every row, is quite inefficient because it leaves a lot of short-lived objects lying around for the garbage collector. This change creates a CSVParser once and then reuses it for every row.
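To illustrate the pattern, here is a minimal sketch of holding one parser as a field and reusing it for every row. `SimpleCsvParser` and `RowDeserializer` are hypothetical stand-ins written for this example; they are not the CSVParser or CSVSerde classes touched by this change, and the parsing logic is deliberately simplified.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a CSV parser (NOT the real CSVParser API).
final class SimpleCsvParser {
    private final char separator;
    private final char quote;
    // Reused across calls so parsing a row does not allocate a new buffer.
    private final StringBuilder field = new StringBuilder();

    SimpleCsvParser(char separator, char quote) {
        this.separator = separator;
        this.quote = quote;
    }

    String[] parseLine(String line) {
        List<String> fields = new ArrayList<>();
        field.setLength(0);
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    field.append(quote); // doubled quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == separator && !inQuotes) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields.toArray(new String[0]);
    }
}

// Hypothetical deserializer: the parser is created once and reused per row,
// instead of constructing a new reader/parser inside deserialize().
final class RowDeserializer {
    private final SimpleCsvParser parser = new SimpleCsvParser(',', '"');

    String[] deserialize(String row) {
        return parser.parseLine(row);
    }
}

final class ReuseParserSketch {
    public static void main(String[] args) {
        RowDeserializer deserializer = new RowDeserializer();
        for (String row : new String[] {"a,b,c", "1,\"x,y\",3"}) {
            System.out.println(String.join("|", deserializer.deserialize(row)));
        }
    }
}
```

The per-row cost then comes down to the field list and the resulting strings; the parser and its internal buffer survive across rows. Because the buffer is shared, an instance like this should not be used from multiple threads at once.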

There's a chance this change causes a regression: CSVSerde is no longer serializable, because it now holds a CSVParser, which isn't serializable. This hasn't been an issue in my testing, though.
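If serializability did turn out to matter, one common Java approach (not something this change does) is to mark the non-serializable field `transient` and recreate it lazily. The sketch below uses hypothetical classes, `LineSplitter` and `SerializableRowReader`, purely to show the idea under that assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.regex.Pattern;

// Hypothetical helper that, like the parser discussed above, is NOT Serializable.
final class LineSplitter {
    private final String separatorRegex;

    LineSplitter(char separator) {
        this.separatorRegex = Pattern.quote(String.valueOf(separator));
    }

    String[] split(String line) {
        return line.split(separatorRegex, -1); // -1 keeps trailing empty fields
    }
}

// Hypothetical holder: stays Serializable by keeping only the configuration
// in serialized state and rebuilding the helper on first use.
final class SerializableRowReader implements Serializable {
    private static final long serialVersionUID = 1L;

    private final char separator;
    // transient: skipped during serialization, so it is null after deserialization.
    private transient LineSplitter splitter;

    SerializableRowReader(char separator) {
        this.separator = separator;
    }

    String[] read(String row) {
        if (splitter == null) {
            splitter = new LineSplitter(separator); // lazily (re)created once
        }
        return splitter.split(row);
    }

    // Serializes and deserializes the reader in memory, for demonstration.
    static SerializableRowReader roundTrip(SerializableRowReader reader) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(reader);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerializableRowReader) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

The helper is still created only once per instance, so the allocation savings are preserved while the holder remains serializable.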

Dec 06 '13 08:12 ash211

In my preliminary testing, this reduced the runtime of a simple count(*) job from 120-130 sec to 100-110 sec.

Dec 06 '13 08:12 ash211