Track IFS with file-format flags at CLI parse (was: file format flags are not overriding mlrrc)
I am using miller 6.8.0 on OSX. Here is my mlrrc:
pprint
When I run the following command, I get the following error:
❯ mlr --csv cat << EOF
a,b c
1,2
3,4
EOF
mlr: mlr: CSV header/data length mismatch 2 != 1 at filename (stdin) row 2.
As you can see, miller is defaulting to pprint even though I passed "--csv". It is interpreting spaces as the delimiter, not the commas.
Next, I change my mlrrc to:
#pprint
Now I can run the previous command with no issue:
❯ mlr --csv cat << EOF
a,b c
1,2
3,4
EOF
a,b c
1,2
3,4
According to the documentation, miller is supposed to override any file formats in the mlrrc when options are passed in the terminal, but this is not happening.
If I run
mlr --csv cat << EOF
a,b c
1,2
3,4
EOF
I have
a,b c
1,2
3,4
But I'm using mlr 6.7.0
What is your mlrrc?
And I have the same using 6.8.
What is your mlrrc?
I do not use it
I do not use it
Ok, then that explains why you don't have the same error. Now try setting your mlrrc to match mine.
Ok, then that explains why you don't have the same error. Now try setting your mlrrc to match mine.
the same error
Hi @holmescharles and thanks for the issue!
I looked into this and it's not quite what any of us thought.
And unfortunately I think you may not like the answer! :) And neither do I ... 😉
When pprint is in the .mlrrc file and --csv is on the command line, the input-file format is correctly found to be PPRINT from the .mlrrc, and then is correctly changed to be CSV. That isn't broken.
But what happens is --pprint, when first encountered, sets IFS from not-specified-yet, to one or more spaces. Then when --csv is encountered, the IFS is already set, and so doesn't get changed to be comma.
This isn't a .mlrrc issue as it first appeared -- rather, you get the same result with
$ mlr --pprint --csv cat << EOF
a,b c
1,2
3,4
EOF
mlr: mlr: CSV header/data length mismatch 2 != 1 at filename (stdin) row 2.
So: a fix could be that --csv should always re-write IFS after --pprint has set it.
This is a bigger issue and it goes back all the way to Miller 1 and involves the fact that these flags are overlapping. So --pprint and --csv set the input file format and IFS; and so on. And from the beginning I wanted people to do either
mlr --ifs ';' --csv ...
or
mlr --csv --ifs ';' ...
-- and Miller has always had that flexibility, and I don't want to take it away. I think it would break a lot of things for a lot of people, to change it now.
So ...... I think the best that can be done at present is this suggestion:
mlr --pprint --csv --ifs comma cat << EOF
a,b c
1,2
3,4
EOF
with the explicit --ifs comma.
I wish I had a better answer, but, at this point in time, and given the long-standing usage pattern above, I don't see a better option.
Later thought: I thought there may be a way to accommodate this but I'm still unsure. Namely:
- Currently, if
IFSis not set,--pprintsets input file format to PPRINT and IFS to one or more spaces. - Currently, if
IFSis not set,--csvsets input file format to CSV and IFS to comma. - The flag for if
IFSwas set or not is here: https://github.com/johnkerl/miller/blob/6.8.0/internal/pkg/cli/option_types.go#L48-L54 - What I could do is change it from "if
IFSset" to "isIFSset for the last-specified input format" - This would still need handling for
mlr --ifs ';' --csv ...since there is no last-specified input format when--ifscomes first on the command line ... 🤔
Maybe I'm thinking about this incorrectly, but could you keep track of the give format flag and wait until the end to use that to set things like IFS etc? That way, if the user specifies something else, it waits until the last minute to assign unassigned variables? I'm not sure how things are implemented, but this could just be tracked in some new variable? Then, if you specify --pprint before --csv, it will completely ignore pprint.
Alternatively, pprint is just csv with "contiguous whitespace" as delimiters. For my own curiosity, how would you specify contiguous whitespace as IFS?
Maybe I'm thinking about this incorrectly, but could you keep track of the give format flag and wait until the end to use that to set things like IFS etc?
Yes, you & I are talking about the same thing -- what you just wrote, and my bullet list above.
So --csv --ifs semicolon and --ifs semicolon --csv would do the same, but, if --pprint (or --xtab or whatever) appears after that, it would "reset".
pprint is just csv with "contiguous whitespace" as delimiters
Not quite. --ipprint is the same as --icsvlite --ifs space --repifs. And --opprint is not the same as --o anything else. This is because there's significant code reuse between the csv-lite and pprint readers, whereas the pprint writer has very much its own logic.