ud-annotatrix
Bring back space-to-tab conversion (for pasted input)
When copy/pasting from vim in a terminal, spaces are copied instead of tabs. Previously we had a heuristic for dealing with this: it replaced 2+ spaces with a tab. This worked 95% of the time, and it would be great to get it back.
When copy/pasting from vim in a terminal
This is about 40% of the reason I use gvim.
It also applies to copying from a bad pastebin that replaces tabs with spaces.
It also applies to copying from a bad pastebin that replaces tabs with spaces.
Yeah, we can do some sort of space-to-tab conversion through heuristics—and I do remember this working—but really everyone should be using tabs. Any software that converts tabs to spaces without asking you should be avoided.
Hm, there is a multiple-space-to-tab conversion function built into notatrix, so I would expect this to work. I'm not immediately sure why it wouldn't.
Also, when I copy/paste from Vim, it preserves my <tab> characters, so I'm not sure I'll be able to reproduce this. I definitely agree that it should be a supported feature.
At the very least, we could do our own multiple-space-to-tab conversions before passing the input along to notatrix, just to make sure that it's behaving how we expect.
Hm, there is a multiple-space-to-tab conversion function built into notatrix, so I would expect this to work. I'm not immediately sure why it wouldn't.
Could you clarify how notatrix is used? E.g., if one clones ud-annotatrix and just serves the code (or if one hosts on github), how is notatrix leveraged? Is it a dependency that lives somewhere in the ud-annotatrix repo too? If so, might it need to be updated?
Hm, there is a multiple-space-to-tab conversion function built into notatrix, so I would expect this to work. I'm not immediately sure why it wouldn't.
Also, when I copy/paste from Vim, it preserves my <tab> characters, so I'm not sure I'll be able to reproduce this. I definitely agree that it should be a supported feature.
At the very least, we could do our own multiple-space-to-tab conversions before passing the input along to notatrix, just to make sure that it's behaving how we expect.
Try a different terminal. Most terminals suck at this, probably by design. I just tested xfce4-terminal, terminator, mlterm, and konsole (all of which were already on my laptop), and they all copied spaces from a tab in vim, both in select/middle-click copies and regular copy/paste copies (i.e., both standard copy-paste buffers had this issue). Pastes were tested into Firefox, but ime anywhere else is also a problem, especially back into vim :-P
Hm, there is a multiple-space-to-tab conversion function built into notatrix, so I would expect this to work. I'm not immediately sure why it wouldn't.
Could you clarify how notatrix is used? E.g., if one clones ud-annotatrix and just serves the code (or if one hosts on github), how is notatrix leveraged? Is it a dependency that lives somewhere in the ud-annotatrix repo too? If so, might it need to be updated?
Replied to this question in https://github.com/jonorthwash/ud-annotatrix/issues/397#issuecomment-636520902. If you cloned the repo and are hosting locally, you may need to refresh dependencies (via npm install --save-dev).
You can try: https://dpaste.com/H6X6ABMUC In: https://ftyers.github.io/ud-annotatrix/standalone/annotator.html and in: https://jonorthwash.github.io/ud-annotatrix/
I'm having trouble reproducing this issue.
You can try: https://dpaste.com/H6X6ABMUC
This dpaste is no longer available. Could you paste something where you're encountering this into this issue?
https://dpaste.com/6HYV8MFMP
https://dpaste.com/6HYV8MFMP
Is this even valid CoNLL-U? It has only one space in yehuatl PRON. If you add another space there it works fine.
Regardless of validity, the algorithm (and what you stated the issue was) is that there have to be at least two spaces between each token. Otherwise, how would it know it's not meant to be a single column?
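For reference, the two-or-more-spaces rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this thread, not the actual notatrix code; the function name `spaces_to_tabs` and the sample line are made up for the example.

```python
import re


def spaces_to_tabs(line: str) -> str:
    """Replace every run of two or more spaces with a single tab.

    Hypothetical helper: it illustrates the rule discussed in this
    thread and is not taken from notatrix.
    """
    # Leave comment lines and lines that already contain tabs untouched.
    if line.startswith("#") or "\t" in line:
        return line
    return re.sub(r" {2,}", "\t", line)


# A made-up token line with a single space before the UPOS tag:
print(repr(spaces_to_tabs("1  yehuatl  yehuatl PRON")))
```

With this rule the single space in "yehuatl PRON" is kept, so the lemma and the tag end up in the same column.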
It isn't valid conllu because it doesn't have tabs
I think a heuristic can be made in the case of single spaces
Only some of the columns can contain spaces, e.g. form/lemma/misc
So the fact that it's the third column and the following column contains a UPOS tag is pretty good evidence
So a stupidish algorithm that would get this case and some others could be:
if spaces in line:
    line.split(\s{2,})  # current behaviour
    if count(columns) not correct:  # new
        for column in certainColumns:
            if \s+ in column and intersection(set(certainTags), set(column.split())) > 0:
                column.split()
    if count(columns) not correct:
        sentence = invalid  # current behaviour?
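A runnable Python version of the pseudocode above might look roughly like this. It is a sketch under assumptions: the name `split_columns`, the choice to return `None` for an invalid line, and the decision to check every column (rather than only form/lemma/misc) are all simplifications made for the example, and the tag set is the standard list of 17 UPOS tags.

```python
import re

N_COLUMNS = 10  # a CoNLL-U token line has ten fields

# The 17 universal POS tags; used as evidence that a single space
# inside a column is really a column boundary.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def split_columns(line):
    """Return the columns of a space-separated token line, or None if invalid.

    Hypothetical helper, not part of notatrix or ud-annotatrix.
    """
    # Current behaviour: split on runs of two or more whitespace characters.
    columns = re.split(r"\s{2,}", line.strip())
    if len(columns) == N_COLUMNS:
        return columns

    # New: if the count is off, split any column that contains single
    # spaces and has a known UPOS tag among its pieces.
    repaired = []
    for column in columns:
        pieces = column.split()
        if len(pieces) > 1 and UPOS_TAGS.intersection(pieces):
            repaired.extend(pieces)
        else:
            repaired.append(column)

    # Still wrong after the repair attempt: treat the line as invalid.
    return repaired if len(repaired) == N_COLUMNS else None


# Made-up line with one space between the lemma and the UPOS tag:
print(split_columns("1  yehuatl  yehuatl PRON  _  _  0  root  _  _"))
```

One caveat with this sketch: a form or lemma that legitimately contains a space and a word identical to a UPOS tag would also be split, which is why the pseudocode restricts the check to certain columns.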