stan icon indicating copy to clipboard operation
stan copied to clipboard

Fix stan_csv_reader and test

Open betanalpha opened this issue 11 years ago • 9 comments

The stan_csv_reader is broken and has been for some time. The internal state is extremely fragile and right now it incorrectly parses the input data file. Any fix would require some significant hacking -- we really should fix this as a consequence of the refactor in #405 and #488, after which we'll have an actual spec for the reader to match.

Unit tests never revealed this to be an issue because the associated test, stan_csv_reader_test, uses static output files for comparison that were deprecated. We now have the makefile-fu to automatically generate these files and avoid this issue in the future.

betanalpha avatar Jan 20 '14 22:01 betanalpha

Michael --- could you start a branch and add a test that fails? Or just include a file that fails here?

I'm pushing this off to 2.4.0+ for now, since it doesn't seem to be critical. If it is, can someone push it back down to 2.4.0 and commit to fixing it?

bob-carpenter avatar Jul 07 '14 21:07 bob-carpenter

Michael --- if you want this fixed, could you be more specific about what's going wrong?

I'm pushing this issue off indefinitely pending some indication of what's broken.

bob-carpenter avatar Sep 22 '14 22:09 bob-carpenter

The problem is that the csv_reader is hardcoded to an older set of configuration options. It needs to automatically sync up with the current config (say, by actually using the config code), but that requires a decision on how the config should be used in the API.

On Sep 22, 2014, at 11:29 PM, Bob Carpenter [email protected] wrote:

Michael --- if you want this fixed, could you be more specific about what's going wrong?

I'm pushing this issue off indefinitely pending some indication of what's broken.

— Reply to this email directly or view it on GitHub.

betanalpha avatar Sep 22 '14 22:09 betanalpha

I'm bumping this issue forward. This code is messy and really should just be rewritten.

@bob-carpenter, do you have any guiding principles for writing a parser for the csvs? This is also causing stan-dev/cmdstan#510.

syclik avatar Nov 30 '16 14:11 syclik

My only strong recommendation is to write a generic CSV parser, then plug it into Stan. Don't mix in all sorts of stuff about reading from comments or particular orders of things in the Stan files.

bob-carpenter avatar Nov 30 '16 20:11 bob-carpenter

That's reasonable, but it also means that this would be hard to resolve without changing out CmdStan output. Maybe we just punt on elegance and have CmdStan start outputting multiple files to isolate a clean CSV for samples?

On Wed, Nov 30, 2016 at 12:13 PM, Bob Carpenter [email protected] wrote:

My only strong recommendation is to write a generic CSV parser, then plug it into Stan. Don't mix in all sorts of stuff about reading from comments or particular orders of things in the Stan files.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/stan/issues/507#issuecomment-263982114, or mute the thread https://github.com/notifications/unsubscribe-auth/ABdNlvKWynzxvnGPcc4kugW90dak4fzOks5rDdjqgaJpZM4BbKEP .

betanalpha avatar Nov 30 '16 21:11 betanalpha

+1 to the multiple files approach

On Wed, Nov 30, 2016, 4:05 PM Michael Betancourt [email protected] wrote:

That's reasonable, but it also means that this would be hard to resolve without changing out CmdStan output. Maybe we just punt on elegance and have CmdStan start outputting multiple files to isolate a clean CSV for samples?

On Wed, Nov 30, 2016 at 12:13 PM, Bob Carpenter [email protected] wrote:

My only strong recommendation is to write a generic CSV parser, then plug it into Stan. Don't mix in all sorts of stuff about reading from comments or particular orders of things in the Stan files.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/stan/issues/507#issuecomment-263982114, or mute the thread < https://github.com/notifications/unsubscribe-auth/ABdNlvKWynzxvnGPcc4kugW90dak4fzOks5rDdjqgaJpZM4BbKEP

.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/stan/issues/507#issuecomment-263994946, or mute the thread https://github.com/notifications/unsubscribe-auth/AAfA6Xm7v8l5qD-6Y37uuL6i6halwe60ks5rDeUFgaJpZM4BbKEP .

sakrejda avatar Nov 30 '16 21:11 sakrejda

Or, split the processing into two steps. One that just pulls the CSV header and draws following ordinary CSV interpretation rules (not exactly a spec, but a strong convention). And then handle other things in a separate pass.

bob-carpenter avatar Nov 30 '16 23:11 bob-carpenter

+1. I think we can implement it that way for now (2 passes).

On Wed, Nov 30, 2016 at 6:27 PM, Bob Carpenter [email protected] wrote:

Or, split the processing into two steps. One that just pulls the CSV header and draws following ordinary CSV interpretation rules (not exactly a spec, but a strong convention). And then handle other things in a separate pass.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stan-dev/stan/issues/507#issuecomment-264029811, or mute the thread https://github.com/notifications/unsubscribe-auth/AAZ_FxSIV8qzqBktWp9ODh4uaAZ7LodYks5rDgZUgaJpZM4BbKEP .

syclik avatar Nov 30 '16 23:11 syclik