Support multiline fields with arbitrary separators
Currently, multiline fields are only supported with commas and semicolons; see #20 for context.
Hello @mechatroner, I've started looking into this issue and have two questions.
1 - As you've mentioned, we can enable arbitrary separators by removing this check: https://github.com/mechatroner/sublime_rainbow_csv/blob/e2fa752e4bae3a16e226f038744f937e57b6834f/main.py#L464
However, from what I could tell, the policy will usually be defaulted to simple by the following lines:
https://github.com/mechatroner/sublime_rainbow_csv/blob/e2fa752e4bae3a16e226f038744f937e57b6834f/main.py#L459-L463
So unless the user takes explicit action to enable the quoted policy together with their non-standard separator (either by using the enable_quoted sublime-command, or through the configuration), the feature will still not appear to work as expected.
Does this analysis seem correct to you? If yes, can we consider removing this check as well (and therefore always defaulting auto to quoted), to change the behaviour to be more intuitive for users?
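To make sure we are talking about the same thing, here is a minimal sketch of the behaviour as I understand it. This is not the actual code from main.py: the names `resolve_policy`, `multiline_supported`, and `restrict_to_standard` are hypothetical and only model the two checks discussed above.

```python
# Hypothetical model of the behaviour described above; not the plugin's real code.
STANDARD_SEPARATORS = (',', ';')


def resolve_policy(delim, policy='auto'):
    """Resolve the 'auto' policy: quoted for standard separators, simple otherwise."""
    if policy != 'auto':
        return policy  # an explicit user choice always wins
    return 'quoted' if delim in STANDARD_SEPARATORS else 'simple'


def multiline_supported(delim, policy='auto', restrict_to_standard=True):
    """Multiline fields need the quoted policy.

    restrict_to_standard models the separator check that this issue
    proposes to remove.
    """
    if resolve_policy(delim, policy) != 'quoted':
        return False
    return delim in STANDARD_SEPARATORS or not restrict_to_standard
```

In this model, `multiline_supported('|', 'auto', restrict_to_standard=False)` is still `False`, because `auto` resolves to `simple` for `|`. Only an explicit `quoted` policy makes it `True`, which is the point of my question.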
2 - I've seen that this same check on the standard separators [';', ','] also appears here (and as a side note, also in several locations in the RBQL dependency, but I have not delved into that):
https://github.com/mechatroner/sublime_rainbow_csv/blob/e2fa752e4bae3a16e226f038744f937e57b6834f/main.py#L61-L62
I'm not so familiar with Python and couldn't easily determine the impact of leaving this check here while removing it from other locations. Would you be able to indicate whether this part of the code must be aligned as well in order to more cleanly resolve this issue?
Thanks!
Your analysis is correct! I think that is the only place where changes should be made.
> can we consider removing this check as well (and therefore always defaulting auto to quoted), to change the behaviour to be more intuitive for users?
I am not sure this would be more intuitive. For example, tab-separated dialects are doublequote-agnostic in 100% of cases, and I always thought that pipe-separated files also mostly use the "simple" dialect (e.g. the use case here: https://github.com/mechatroner/vscode_rainbow_csv/issues/1#issuecomment-392231646 ). I admit that I don't have any statistics about pipe-separated file usage in different domains, but I would prefer not to change the default behavior without strong supportive evidence. My impression was that choosing | as a separator makes it possible to avoid double quotes because, unlike the , and ; characters, it is much less likely to appear in the actual text, so quoting is not needed; and its advantage over tab is that it won't get lost or replaced with spaces during copy/paste operations.
We can add another setting, though: for example, a list of separators that should use the "quoted" dialect by default, or even a boolean setting like "quote all separators by default" as you suggest, and handle it in main.py.
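A rough sketch of how such a setting could be consulted. Both setting names (`quoted_by_default_separators` and `quote_all_separators_by_default`) are hypothetical and do not exist in the plugin today, and a plain dict stands in for the real settings object.

```python
# Hypothetical setting names; shown only to illustrate the idea.
DEFAULT_QUOTED_SEPARATORS = [',', ';']


def default_policy(delim, settings):
    """Pick the default policy for a separator from user settings (a plain dict here)."""
    if settings.get('quote_all_separators_by_default', False):
        return 'quoted'
    quoted = settings.get('quoted_by_default_separators', DEFAULT_QUOTED_SEPARATORS)
    return 'quoted' if delim in quoted else 'simple'
```

With empty settings this keeps the current defaults, so existing users would see no change unless they opt in.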
Hi @mechatroner, sorry for the late response.
> I think that is the only place where changes should be made.
Thanks for confirming this 👍
> choosing | as a separator makes it possible to avoid double quotes because […] it is much less likely to appear in the actual text, so quoting is not needed
My use case was actually not about escaping | in the text, but rather enabling the handling of line breaks in the text. So I would have needed quoted fields regardless of what separator was in use.
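To illustrate the use case, here is a small example using Python's standard `csv` module as a stand-in for the "quoted" policy (the plugin has its own parser, so this only demonstrates the principle). The sample data is made up.

```python
import csv
import io

# Pipe-separated sample where one quoted field contains a line break.
text = 'id|comment\n1|"first line\nsecond line"\n2|plain\n'

# "Simple" handling: split into lines, then split each line on the separator.
# The quoted field is torn apart into two broken records.
simple_rows = [line.split('|') for line in text.splitlines()]

# "Quoted" handling: csv.reader keeps the line break inside the quoted field.
quoted_rows = list(csv.reader(io.StringIO(text, newline=''), delimiter='|'))

print(len(simple_rows))  # 4 physical lines
print(len(quoted_rows))  # 3 logical records
```

The separator itself never appears inside the text here; quoting is needed only because of the embedded line break.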
> I would prefer not to change the default behavior without strong supportive evidence

But if nobody else has complained, then you're right, there's no need to modify the behaviour for an edge case such as mine. I don't think there's a real need to add any new configuration options, since the existing commands (after this fix) can be used to make this work. Maybe all that's needed is to make sure that the documentation helps users achieve this if they need it?
With this I'm all set to work on getting you a pull request, then :) Thanks for your feedback so far!