commons-csv icon indicating copy to clipboard operation
commons-csv copied to clipboard

CSV-285: Replace BufferedReader with PushbackReader

Open belugabehr opened this issue 4 years ago • 6 comments

https://issues.apache.org/jira/browse/CSV-285

belugabehr avatar Jul 16 '21 03:07 belugabehr

Hold off on this one for right now, the JMH tests are not as favorable as I was expecting. I need to review the tests.

belugabehr avatar Jul 16 '21 03:07 belugabehr

Coverage Status

Coverage decreased (-0.3%) to 97.99% when pulling 532a3f10eea650e252d32737ec560e0287114606 on belugabehr:CSV-285 into 27843d8dc0ec6e5af910674bfb4cba4e48b2475b on apache:master.

coveralls avatar Jul 16 '21 04:07 coveralls

No rush. I might cut a release candidate next week or so but this is a major change we might only want to do after that.

garydgregory avatar Jul 16 '21 11:07 garydgregory

Ya, all the unit tests pass, but I definitely see some flaws in my attempt here. Do not commit.

belugabehr avatar Jul 16 '21 14:07 belugabehr

@garydgregory

Weird timing that I just happened to pick up CSV commons work the other day. I thought I was making big progress on performance, but my research just revealed that there was a big performance regression as part of adding support for String delimiters #76 which was just added to master recently. My contributions this week made some incremental improvements, but really the big performance improvement I addressed in #167 which bypasses the work in #76 by short-circuiting for the common use case of a single delimiter character. With #167 in place, the performance is much more similar to the current 1.8 version and not nearly as impressive as I had thought.

belugabehr avatar Jul 16 '21 14:07 belugabehr

Hi @belugabehr, Please note that I merged a bug fix today related to string delimiters.

garydgregory avatar Feb 19 '22 17:02 garydgregory

Hi @belugabehr Are you still working on this PR or should it be closed?

garydgregory avatar Oct 15 '22 20:10 garydgregory

I'm going to close this.

I think a push-back buffer is cleaner as it's really well suited for this use case, but the current implementation is faster (and not all that hard to follow).

belugabehr avatar Oct 15 '22 22:10 belugabehr