Add support for digraphs

Open Lindenk opened this issue 2 years ago • 8 comments

Closes #1438

The backend is implemented as a trie and finds suggestions using a simple breadth-first search starting at the node given by the user's input. I chose Ctrl-K as the default keybind: vim uses Ctrl-k, but in helix that's currently bound to kill_to_line_end. This implementation sets up initial, usable support for digraphs, which can later be improved with:

  • Better suggestions (currently Prompt doesn't allow for extra text in suggestions, so that would need to exist first)
  • Fuzzy find on both input sequences and descriptions
  • A sane, agreed upon default set of digraphs (currently only user configured symbols are supported)
  • Configurable auto-input once a digraph is matched, and there are no more options the input could represent
  • replace, append, and insert variants for binding to other keys in other modes as needed
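
The trie-plus-breadth-first-search backend described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the names `Node`, `DigraphTrie`, `insert`, and `suggestions` are assumptions made for the example.

```rust
use std::collections::{BTreeMap, VecDeque};

/// One trie node: children keyed by the next input character, plus the
/// symbol to insert if a digraph sequence ends exactly at this node.
#[derive(Default)]
struct Node {
    children: BTreeMap<char, Node>,
    symbol: Option<String>,
}

/// Hypothetical digraph store; names are illustrative, not the PR's.
#[derive(Default)]
struct DigraphTrie {
    root: Node,
}

impl DigraphTrie {
    /// Register `sequence` (e.g. "ae") as producing `symbol` (e.g. "æ").
    fn insert(&mut self, sequence: &str, symbol: &str) {
        let mut node = &mut self.root;
        for ch in sequence.chars() {
            node = node.children.entry(ch).or_default();
        }
        node.symbol = Some(symbol.to_string());
    }

    /// Walk to the node reached by `prefix`, then search breadth-first
    /// below it, so shorter completions are listed before longer ones.
    fn suggestions(&self, prefix: &str) -> Vec<(String, String)> {
        let mut node = &self.root;
        for ch in prefix.chars() {
            match node.children.get(&ch) {
                Some(child) => node = child,
                None => return Vec::new(),
            }
        }
        let mut found = Vec::new();
        let mut queue = VecDeque::new();
        queue.push_back((prefix.to_string(), node));
        while let Some((sequence, current)) = queue.pop_front() {
            if let Some(symbol) = &current.symbol {
                found.push((sequence.clone(), symbol.clone()));
            }
            for (ch, child) in &current.children {
                let mut next = sequence.clone();
                next.push(*ch);
                queue.push_back((next, child));
            }
        }
        found
    }
}
```

With "ae", "alpha", and "TH" registered, the input "a" would yield "ae" before "alpha", because the breadth-first order visits shallower nodes first.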

https://user-images.githubusercontent.com/2438655/174872719-630eaa61-118e-41e1-a27c-a336c87a8514.mp4

Lindenk avatar Jun 21 '22 19:06 Lindenk

Is digraph the right name for this feature? I guess the name comes from using, for example, ctrl-K and "ae" to get the symbol æ, which is sort of a digraph. But if you use this to write Hiragana, I'd say it's more of a general character input method.

tmke8 avatar Jun 27 '22 11:06 tmke8

I just copied nvim on this one, and they also use it for Hiragana. There are also plugins like better digraphs, which turn it into a general character lookup.

I would assume most vim users would look for "digraph" as the name for this feature even though it's not a very accurate description.

If it should be changed, what name should we go with?

Lindenk avatar Jun 27 '22 12:06 Lindenk

What is your use case? If it's just Japanese characters, a better way is to use Japanese or use an IME to input those kanji.

Also, for something like shrug, abbr in vim is probably a better choice compared to this.

pickfire avatar Jun 27 '22 15:06 pickfire

The goal is to provide a customizable input tool similar to vim's digraphs, not specifically for Japanese characters. The video is an example based on digraphs available in vim, not an exhaustive list (I didn't feel a multi-hour video going through every option was necessary). Common use cases are usually mathematics or computer science related symbols such as TH -> þ or the custom example given by the author of #1438, *| -> λ.

I would argue abbr is not the same feature as this. Automatic implicit text replacement can be unwanted in many situations, while this character input command is explicit. For example, I wouldn't want ¯\_(ツ)_/¯ to appear every time I type shrug in a sentence.

The reason I structured it as a general input tool and not exactly like vim's digraph is for flexibility and customizability. It can be used to implement exactly the same behavior as vim's digraph (with a few of the features I bullet pointed above) while also allowing improvements such as fuzzy find, symbol names, and replacement of 1 or more characters.

Lindenk avatar Jun 27 '22 17:06 Lindenk

We get occasional requests for this for one of the programming languages that uses mathematical symbols instead of ASCII approximations.

I wouldn't be opposed, but perhaps this is a subset of snippet support (#395)? Could we merge this now and grow it to support snippets later when we get marks, or else would it be better to wait for an all at once implementation and ensure this is in there as a special case?

EpocSquadron avatar Jun 28 '22 02:06 EpocSquadron

But it seemed weird given that it can accept more than 2 characters, digraph is supposed to accept two characters.

pickfire avatar Jul 01 '22 00:07 pickfire

I can rename it to something else, maybe just unicode input? I figured most people looking for this feature would search for digraph, though.

Lindenk avatar Jul 01 '22 06:07 Lindenk

> But it seemed weird given that it can accept more than 2 characters, digraph is supposed to accept two characters.

That's why I say it seems like a good base for snippets, especially if it grows the ability to add custom shortcut triggers. Add on top of that marks and we have snippets.

EpocSquadron avatar Jul 01 '22 10:07 EpocSquadron

Any interest in finishing this? It's an essential feature for non-English speakers.

velllu avatar Jun 07 '23 14:06 velllu

I really like this patch. It's easy and fast to use, and also adds support for the handful of Unicode-heavy programming languages - particularly proof assistants.

I might suggest supporting generic Unicode input via hex literals as a fallback, and reading from a different file than config.toml. It might also be good to bundle default digraphs with Helix: I've been going through and making a personal file, and it's turned out to look just like every other list of commonly used Unicode characters out there.
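
As a rough sketch of what a hex-literal fallback could look like: the function name `char_from_hex` and the accepted "U+" / "0x" prefixes are assumptions made for this example, not something the patch provides.

```rust
/// Interpret `input` as a hexadecimal Unicode code point, accepting an
/// optional "U+", "u+", or "0x" prefix (an assumed convention).
/// Returns None for invalid hex, surrogate code points, or values
/// above U+10FFFF.
fn char_from_hex(input: &str) -> Option<char> {
    let digits = input
        .strip_prefix("U+")
        .or_else(|| input.strip_prefix("u+"))
        .or_else(|| input.strip_prefix("0x"))
        .unwrap_or(input);
    let code_point = u32::from_str_radix(digits, 16).ok()?;
    // char::from_u32 rejects surrogates and out-of-range values.
    char::from_u32(code_point)
}
```

A lookup could try the configured digraphs first and only fall back to this when nothing matches, so that a key such as "ae" keeps its configured meaning rather than being read as U+00AE.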

omentic avatar Jul 16 '23 18:07 omentic

I put together a list of some digraphs for personal use: mostly mathematics and linguistics focused. For anyone running this patch, feel free to use this as a starting point and tweak as desired.

[editor.digraphs]
## Lowercase Greek
alpha = "α"
beta = "β"
gamma = "γ"
delta = "δ"
epsilon = "ε"
zeta = "ζ"
eta = "η"
theta = "θ"
iota = "ι"
kappa = "κ"
lambda = "λ"
mu = "μ"
nu = "ν"
xi = "ξ"
omicron = "ο"
pi = "π"
rho = "ρ"
sigma = "σ"
tau = "τ"
upsilon = "υ"
phi = "φ"
chi = "χ"
psi = "ψ"
omega = "ω"

## Alternate Greek
varbeta = "ϐ"
vargamma = "ɣ"
varepsilon = "ϵ"
vartheta = "ϑ"
varkappa = "ϰ"
varpi = "ϖ"
varrho = "ϱ"
varsigma = "ς"
varphi = "ɸ"

## Uppercase Greek
Alpha = "Α"
Beta = "Β"
Gamma = "Γ"
Delta = "Δ"
Epsilon = "Ε"
Zeta = "Ζ"
Eta = "Η"
Theta = "Θ"
Iota = "Ι"
Kappa = "Κ"
Lambda = "Λ"
Mu = "Μ"
Nu = "Ν"
Xi = "Ξ"
Omicron = "Ο"
Pi = "Π"
Rho = "Ρ"
Sigma = "Σ"
Tau = "Τ"
Upsilon = "Υ"
Phi = "Φ"
Chi = "Χ"
Psi = "Ψ"
Omega = "Ω"

## Double-struck / Blackboard bold
AA = "𝔸"
BB = "𝔹"
CC = "ℂ"
DD = "𝔻"
EE = "𝔼"
FF = "𝔽"
GG = "𝔾"
HH = "ℍ"
II = "𝕀"
JJ = "𝕁"
KK = "𝕂"
LL = "𝕃"
MM = "𝕄"
NN = "ℕ"
OO = "𝕆"
PP = "ℙ"
QQ = "ℚ"
RR = "ℝ"
SS = "𝕊"
TT = "𝕋"
UU = "𝕌"
VV = "𝕍"
WW = "𝕎"
XX = "𝕏"
YY = "𝕐"
ZZ = "ℤ"

## Small caps
sa = "ᴀ"
sb = "ʙ"
sc = "ᴄ"
sd = "ᴅ"
se = "ᴇ"
sf = "ꜰ"
sg = "ɢ"
sh = "ʜ"
si = "ɪ"
sj = "ᴊ"
sk = "ᴋ"
sl = "ʟ"
sm = "ᴍ"
sn = "ɴ"
so = "ᴏ"
sp = "ᴘ"
sq = "ꞯ"
sr = "ʀ"
ss = "ꜱ"
st = "ᴛ"
su = "ᴜ"
sv = "ᴠ"
sw = "ᴡ"
sx = "x"
sy = "ʏ"
sz = "ᴢ"

## Hebrew letters
alef = "א"
bet = "ב"
gimel = "ג"
shin = "ש"

## Extra letters
ell = "ℓ"
angstrom = "Å"
degree = "°"
celsius = "℃"
fahrenheit = "℉"
kelvin = "K"
Re = "ℜ"
Im = "ℑ"
section = "§"
refmark = "※"

## Mathematics
forall = "∀"
exists = "∃"
notexists = "∄"
therefore = "∴"
because = "∵"
sum = "∑"
product = "∏"
coproduct = "∐"
qed = "∎"
top = "⊤"
bot = "⊥"
tee = "⊢"
yields = "⊢"
inf = "∞"
wreath = "≀"
compose = "∘"
convolve = "∗"
multimap = "⊸"
pm = "±"
mp = "∓"
plus = "+"
minus = "-"
times = "×"
div = "÷"
divides = "∣"
notdivides = "∤"
parallel = "∥"
perp = "⟂"
notparallel = "∦"
ident = "≡"
notident = "≢"
sident = "≣"
prop = "∝"
join = "⨝"
smash = "⨳"

## Calculus
diff = "∂"
nabla = "∇"
laplace = "∆"
int = "∫"
iint = "∬"
iiint = "∭"
iiiint = "⨌"
sumint = "⨋"
closedint = "∮"
surfint = "∯"
volint = "∰"

## Logic
not = "¬"
and = "∧"
or = "∨"
xor = "⊕"
in = "∈"
notin = "∉"
ni = "∋"
notni = "∌"
sub = "⊂"
sube = "⊆"
notsub = "⊄"
notsube = "⊈"
sup = "⊃"
supe = "⊇"
notsup = "⊅"
notsupe = "⊉"
union = "∪"
sect = "∩"
without = "∖"
emptyset = "∅"
null = "∅"
to = "→"
gets = "←"
implies = "⇒"
implied = "⇐"
iff = "⟺"
models = "⊧"

## Relations
ratio = "∶"
eq = "="
gt = ">"
lt = "<"
geq = "≥"
leq = "≤"
prec = "≺"
succ = "≻"

## Punctuation
amp = "&"
pma = "⅋"
pil = "¶"
lip = "⁋"
# at = "@"
# hash = "#"
# colon = ":"
# comma = ","
# period = "."
# semicolon = ";"
# slash = "/"
# backslash = "\\"
# exclamation = "!"
bullet = "•"
ast = "∗"
kleene = "∗"
dagger = "†"
ddagger = "‡"
interrobang = "‽"

## Ligatures
ae = "æ"
AE = "Æ"
oe = "œ"
varoe = "ɶ"
OE = "Œ"
lezh = "ɮ"
dezh = "d͡ʒ"

## Linguistics
ash = "æ"
Ash = "Æ"
ethel = "œ"
Ethel = "Œ"
emg = "ɱ"
Emg = "Ɱ"
eng = "ŋ"
Eng = "Ŋ"
esh = "ʃ"
Esh = "Ʃ"
eth = "ð"
Eth = "Ð"
ezh = "ʒ"
Ezh = "Ʒ"
schwa = "ə"
tap = "ɾ"
vtap = "ⱱ"
stop = "ʔ"
ramhorns = "ɤ"
bullseye = "ʘ"
tm = "ɯ"
TM = "Ɯ"
ty = "ʎ"
tr = "ɹ"
tsr = "ʁ"
bl = "ɬ"
nlh = "ɲ"
nrh = "ɳ"
vh = "ʋ"
bh = "ɓ"
BH = "Ɓ"
dh = "ɗ"
DH = "Ɗ"
gh = "ɠ"
GH = "Ɠ"
rt = "ʈ"
rd = "ɖ"
# ɽɟʂʐçʝħʕɦɻɰɭ
# ǀǃǂǁʄ

## Old English
thorn = "þ"
Thorn = "Þ"
wynn = "ƿ"
Wynn = "Ƿ"

## Assorted
spade = "♠"
heart = "♥"
club = "♣"
diamond = "♦"
maltese = "✠"
bitcoin = "₿"
dollar = "$"
euro = "€"
franc = "₣"
lira = "₺"
peso = "₱"
pound = "£"
ruble = "₽"
rupee = "₹"
won = "₩"
yen = "¥"

omentic avatar Jul 17 '23 18:07 omentic

After working with this for a while, I've found it exceedingly helpful. I would suggest having space function the same as enter for ease of use. These bindings work well for me:

[keys.insert]
"\\" = "insert_digraph"

[editor.digraphs]
"\\" = "\\"
...

omentic avatar Aug 06 '23 22:08 omentic

Sorry for taking a while to review. I'm confused about why you opted for a trie instead of a vector. I would probably implement this feature as a Vec<DigraphEntry> and implement completion through helix_core::fuzzy::fuzzy_match like how it's done for https://github.com/helix-editor/helix/blob/4dbdcaebba41dbabc3e39ddc411d60095de2585f/helix-term/src/ui/mod.rs#L380
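
A minimal sketch of the vector-based alternative might look like the following. The fields of `DigraphEntry` are assumptions, and the simple in-order subsequence test is only a stand-in for a real fuzzy matcher; it does not use or imitate the actual helix_core::fuzzy API.

```rust
/// Hypothetical entry type; the field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct DigraphEntry {
    sequence: String,
    symbol: String,
}

/// Stand-in for a fuzzy matcher: true when every character of `query`
/// appears in `candidate` in the same order (case-sensitive).
fn is_subsequence(query: &str, candidate: &str) -> bool {
    let mut rest = candidate.chars();
    query.chars().all(|q| rest.any(|c| c == q))
}

/// Keep the entries whose sequence matches `query`, shortest first.
fn filter_entries<'a>(entries: &'a [DigraphEntry], query: &str) -> Vec<&'a DigraphEntry> {
    let mut matches: Vec<&DigraphEntry> = entries
        .iter()
        .filter(|entry| is_subsequence(query, &entry.sequence))
        .collect();
    // Crude ranking by length; sort_by_key is stable, so entries of
    // equal length keep their original order.
    matches.sort_by_key(|entry| entry.sequence.chars().count());
    matches
}
```

Compared with a trie, a flat list is rescanned on every keystroke, but for a few hundred entries that cost is negligible and the matching strategy can be swapped out freely.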

kirawi avatar Sep 05 '23 02:09 kirawi

I don't think I knew there was already a fuzzy match implementation (or if there was one a year ago). It should be pretty easy to swap over to using it instead of a hand-built trie if that's preferable.

Lindenk avatar Sep 12 '23 18:09 Lindenk

So I brought this up on Matrix, and @pascalkuthe said,

> In particular I think this should be handled by the same infrastructure that will handle custom snippets (and abbreviations). I think just having an abbreviation which has a special flag that makes it a digraph, so that it would not show up automatically but only once you press a certain key, would be enough (it would just use fuzzy filtering/the normal completion window but autoconfirm once there is only a single match)
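
The "autoconfirm once there is only a single match" behaviour could be sketched like this. Everything here is hypothetical: the `Step` enum and `step` function are invented for illustration, and plain prefix matching stands in for the fuzzy filtering mentioned in the quote.

```rust
/// What the prompt should do after each keystroke.
#[derive(Debug, PartialEq)]
enum Step<'a> {
    /// Nothing matches the typed text.
    NoMatch,
    /// Exactly one candidate is left: insert its symbol immediately.
    AutoConfirm(&'a str),
    /// Several candidates remain: keep the completion menu open.
    ShowMenu(Vec<&'a str>),
}

/// Decide the next step from `(sequence, symbol)` pairs and the text
/// typed so far. Prefix matching is an assumption made for brevity.
fn step<'a>(digraphs: &[(&'a str, &'a str)], typed: &str) -> Step<'a> {
    let matches: Vec<(&'a str, &'a str)> = digraphs
        .iter()
        .copied()
        .filter(|(sequence, _)| sequence.starts_with(typed))
        .collect();
    match matches.as_slice() {
        [] => Step::NoMatch,
        [(_, symbol)] => Step::AutoConfirm(*symbol),
        many => Step::ShowMenu(many.iter().map(|(sequence, _)| *sequence).collect()),
    }
}
```

One consequence of this rule is that a sequence which is also a prefix of another (for example "in" and "int" from the list earlier in the thread) can never auto-confirm, so an explicit confirm key would still be needed for those.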

kirawi avatar Sep 12 '23 21:09 kirawi