
Bring back space-to-tab conversion (for pasted input)

Open ftyers opened this issue 4 years ago • 13 comments

When copy/pasting from vim in a terminal, spaces are copied instead of tabs. Previously we had a heuristic for dealing with this: it replaced runs of 2+ spaces with a tab. This worked 95% of the time, and it would be great to get it back.
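
For illustration, the 2+ spaces heuristic could look roughly like the sketch below. This is not the code ud-annotatrix or notatrix used; the function name `spaces_to_tabs` and the choice to skip comment lines and lines that already contain tabs are assumptions made for the example.

```python
import re


def spaces_to_tabs(line: str) -> str:
    """Replace each run of two or more spaces with a single tab.

    Hypothetical helper: comment lines and lines that already contain
    a tab are returned unchanged (an assumption, not confirmed behaviour).
    """
    if line.startswith("#") or "\t" in line:
        return line
    return re.sub(r" {2,}", "\t", line)


def convert_pasted_text(text: str) -> str:
    """Apply the heuristic to every line of a pasted block."""
    return "\n".join(spaces_to_tabs(line) for line in text.split("\n"))
```

A single space between two fields is left untouched, which is why this only works when every column boundary was pasted as at least two spaces.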

ftyers avatar May 06 '20 13:05 ftyers

When copy/pasting from vim in a terminal

This is about 40% of the reason I use gvim.

jonorthwash avatar May 06 '20 17:05 jonorthwash

It also applies to copying from a bad pastebin that replaces tabs with spaces.

ftyers avatar May 07 '20 10:05 ftyers

It also applies to copying from a bad pastebin that replaces tabs with spaces.

Yeah, we can do some sort of space-to-tab conversion through heuristics—and I do remember this working—but really everyone should be using tabs. Any software that converts tabs to spaces without asking you should be avoided.

jonorthwash avatar May 07 '20 21:05 jonorthwash

Hm, there is a multiple-space-to-tab conversion function built into notatrix, so I would expect this to work. I'm not immediately sure why it wouldn't.

Also, when I copy/paste from Vim, it preserves my <tab> characters, so I'm not sure I'll be able to reproduce this. I definitely agree that it should be a supported feature.

At the very least, we could do our own multiple-space-to-tab conversions before passing the input along to notatrix, just to make sure that it's behaving how we expect.

keggsmurph21 avatar May 11 '20 05:05 keggsmurph21

Hm, there is a multiple-space-to-tab conversion function built into notatrix, so I would expect this to work. I'm not immediately sure why it wouldn't.

Could you clarify how notatrix is used? E.g., if one clones ud-annotatrix and just serves the code (or if one hosts on github), how is notatrix leveraged? Is it a dependency that lives somewhere in the ud-annotatrix repo too? If so, might it need to be updated?

jonorthwash avatar May 15 '20 21:05 jonorthwash

Hm, there is a multiple-space-to-tab conversion function built into notatrix, so I would expect this to work. I'm not immediately sure why it wouldn't.

Also, when I copy/paste from Vim, it preserves my <tab> characters, so I'm not sure I'll be able to reproduce this. I definitely agree that it should be a supported feature.

At the very least, we could do our own multiple-space-to-tab conversions before passing the input along to notatrix, just to make sure that it's behaving how we expect.

Try a different terminal. Most terminals suck at this, probably by design. I just tested xfce4-terminal, terminator, mlterm, and konsole (all of which were already on my laptop), and they all copied spaces from a tab in vim, both in select/middle-click copies and regular copy/paste copies (i.e., both standard copy-paste buffers had this issue). Pastes were tested into Firefox, but ime anywhere else is also a problem, especially back into vim :-P

jonorthwash avatar May 15 '20 21:05 jonorthwash

Hm, there is a multiple-space-to-tab conversion function built into notatrix, so I would expect this to work. I'm not immediately sure why it wouldn't.

Could you clarify how notatrix is used? E.g., if one clones ud-annotatrix and just serves the code (or if one hosts on github), how is notatrix leveraged? Is it a dependency that lives somewhere in the ud-annotatrix repo too? If so, might it need to be updated?

Replied to this question in https://github.com/jonorthwash/ud-annotatrix/issues/397#issuecomment-636520902. If you cloned the repo and are hosting locally, you may need to refresh dependencies (via npm install --save-dev).

keggsmurph21 avatar May 31 '20 20:05 keggsmurph21

You can try: https://dpaste.com/H6X6ABMUC In: https://ftyers.github.io/ud-annotatrix/standalone/annotator.html and in: https://jonorthwash.github.io/ud-annotatrix/

ftyers avatar Nov 17 '20 23:11 ftyers

I'm having trouble reproducing this issue.

You can try: https://dpaste.com/H6X6ABMUC

This dpaste is no longer available. Could you paste something where you're encountering this into this issue?

jonorthwash avatar Jun 10 '22 16:06 jonorthwash

https://dpaste.com/6HYV8MFMP

ftyers avatar Jun 10 '22 16:06 ftyers

https://dpaste.com/6HYV8MFMP

Is this even valid CoNLL-U? It has only one space in yehuatl PRON. If you add another space there it works fine.

Regardless of validity, the algorithm (and what you stated the issue was) is that there have to be at least two spaces between each token. Otherwise, how would it know it's not meant to be a single column?

jonorthwash avatar Jun 14 '22 23:06 jonorthwash

It isn't valid CoNLL-U because it doesn't have tabs.
I think a heuristic can be made for the single-space case.
Only some of the columns can contain spaces, e.g. form/lemma/misc.
So the fact that it's the third column and the following column contains a UPOS tag is pretty good evidence.

ftyers avatar Jun 17 '22 16:06 ftyers

So a stupidish algorithm that would get this case and some others could be:

   if spaces in line:
      line.split(\s{2,})              # current behaviour
      if count(columns) not correct:  # new
         for column in certainColumns:
            if \s+ in column and intersection(set(certainTags), set(column.split())) > 0:
               column.split()
   if count(columns) not correct:
      sentence = invalid              # current behaviour?
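
A rough runnable version of the pseudocode above, as a sketch only: the names `split_columns`, `UPOS_TAGS` and `EXPECTED_COLUMNS` are made up for this example, and it checks every field rather than a restricted `certainColumns` list, which is a simplification. It is not how notatrix currently does it.

```python
import re
from typing import List, Optional

# The 17 universal POS tags; a field containing one of these next to
# other text is treated as two columns glued together by a single space.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})
EXPECTED_COLUMNS = 10  # ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC


def split_columns(line: str) -> Optional[List[str]]:
    """Split one token line into columns, or return None if it stays invalid."""
    line = line.rstrip("\n")
    if "\t" in line:
        columns = line.split("\t")
    else:
        # Current behaviour: runs of 2+ spaces separate columns.
        columns = re.split(r" {2,}", line.strip())
        if len(columns) != EXPECTED_COLUMNS:
            # New: re-split fields that contain a space and a UPOS tag.
            repaired: List[str] = []
            for column in columns:
                parts = column.split()
                if len(parts) > 1 and UPOS_TAGS.intersection(parts):
                    repaired.extend(parts)
                else:
                    repaired.append(column)
            columns = repaired
    return columns if len(columns) == EXPECTED_COLUMNS else None
```

With a made-up line shaped like the case discussed here, `split_columns("1  yehuatl  yehuatl PRON  _  _  2  nsubj  _  _")` gets 9 fields from the 2+ spaces split, re-splits `yehuatl PRON`, and returns 10 columns. A form or lemma that itself contains a word identical to a UPOS tag would be split wrongly, so the column count check at the end is still needed.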

jonorthwash avatar Jun 18 '22 15:06 jonorthwash