librime [Feature Request] 以词定字

提起鼠须管，很多人都会说’词库小‘，然后导入外部的词库，可是这样也引入了更多的重码。打不出词组的时候苦恼，打出太多词组也是负担。即使这样，也没有改善单字重码的问题，反而更加剧了。倒不如一个基本的词库，加上’以词定字‘功能。

May 29 '13 07:05 twlz0ne

我也希望加入以词定字的功能.

Dec 15 '17 16:12 rsyqvthv

所以这个功能现在有了吗？

Oct 25 '18 02:10 BettyJJ

所以这个功能现在有了吗？

Jul 08 '19 08:07 zhoulianglen

配合 lua 扩展可以实现 https://github.com/BlindingDark/rime-lua-select-character

May 23 '20 14:05 BlindingDark

想到了一種新的實現方案以完成以詞定字的要求。起因是想到當需要依賴於多字詞確定中間某字時，依靠通常的思路似乎是無法實現的。比如，想要依靠「魑魅魍魎」一詞定出「魍」（實際上可以依靠「魍魎」一詞來定，此處僅僅用於舉例）。於是，我想到使用Lua腳本的過濾器（lua_filter）來實現我所想要的功能。下面這張是實現後的效果：雖然所用的輸入方案是在下依靠「地球拼音」及「自然碼雙拼」拼湊出的「自然碼地球雙拼」，不過原理相通，想必也可用於其他輸入法。在配置腳本之前，首先確定一個用於響應「以詞定字」的輸入碼，並將其列入輸入方案的字母表中。

speller:
  alphabet: 'zyxwvutsrqponmlkjihgfedcba-;/<,>.\?'

此處選擇的是問號「?」。之後在字典中加入一箇對應於該輸入碼的條目：

※	?

確保輸入碼與輸出碼唯一。準備完畢後可以配置腳本，核心思路即判斷候選中是否存在該特別輸出（此處爲「※」），存在則將候選分割，並調整註釋：

local function filter(input)
    for cand in input:iter() do
        local str = cand.text
        if utf8sub(str, -1) == "※" then
            for i = 1, utf8.len(str), 1 do
                if utf8sub(str, i, 1) ~= "※" then
                    yield(Candidate("w2c", cand.start, cand._end, utf8sub(str, i, 1), utf8sub(str, 1, -1)))
                end
            end
        else
            yield(cand)
        end
    end
end

return filter

其中用到用於截取UTF8字符串的函數：

-- 判断utf8字符byte长度
-- 0xxxxxxx - 1 byte
-- 110yxxxx - 192, 2 byte
-- 1110yyyy - 225, 3 byte
-- 11110zzz - 240, 4 byte
local function chsize(char)
    if not char then
        print("not char")
        return 0
    elseif char > 240 then
        return 4
    elseif char > 225 then
        return 3
    elseif char > 192 then
        return 2
    else
        return 1
    end
end

-- 截取utf8 字符串
-- str:            要截取的字符串
-- startChar:    开始字符下标,从1开始
-- numChars:    要截取的字符长度
function utf8sub(str, startChar, numChars)

    local l = utf8.len(str)

    if startChar < 0 then
        startChar = l + 1 + startChar
    end

    if numChars == nil then
        numChars = l - startChar + 1
    elseif numChars < 0 then
        numChars = l - startChar + 1 + numChars
    end

    local startIndex = 1
    while startChar > 1 do
        local char = string.byte(str, startIndex)
        startIndex = startIndex + chsize(char)
        startChar = startChar - 1
    end

    local currentIndex = startIndex

    while numChars > 0 and currentIndex <= #str do
        local char = string.byte(str, currentIndex)
        currentIndex = currentIndex + chsize(char)
        numChars = numChars -1
    end
    return str:sub(startIndex, currentIndex - 1)
end

Jan 22 '22 12:01 Reniastyc

按照剛纔的思路，對代碼稍加修改，甚至可以方便應對需要定位的詞不在首位的情形：

local function filter(input)
    local valid = false
    for cand in input:iter() do
        local str = cand.text
        if utf8sub(str, -1) == "※" then
            str = utf8sub(str, 1, -1)
            valid = true
        end
        if valid and utf8.len(str) > 1 then
            for i = 1, utf8.len(str), 1 do
                yield(Candidate("w2c", cand.start, cand._end, utf8sub(str, i, 1), str))
            end
        else
            yield(cand)
        end
    end
end

return filter

Jan 22 '22 14:01 Reniastyc

不錯啊，這lua寫得6

Jan 22 '22 14:01 LEOYoon-Tsaw

上面的輸入方案已經上傳到：https://github.com/Reniastyc/rime-double-terra

附有簡要說明和安裝方案。

Jan 22 '22 15:01 Reniastyc

思路很好。實現方法可以改進。

Jan 26 '22 04:01 lotem

按照剛纔的思路，對代碼稍加修改，甚至可以方便應對需要定位的詞不在首位的情形：

local function filter(input)
    local valid = false
    for cand in input:iter() do
        local str = cand.text
        if utf8sub(str, -1) == "※" then
            str = utf8sub(str, 1, -1)
            valid = true
        end
        if valid and utf8.len(str) > 1 then
            for i = 1, utf8.len(str), 1 do
                yield(Candidate("w2c", cand.start, cand._end, utf8sub(str, i, 1), str))
            end
        else
            yield(cand)
        end
    end
end

return filter

有个问题，选择第二个侯选词的字时会多出一个"?"

Jun 21 '23 07:06 naosense

按照剛纔的思路，對代碼稍加修改，甚至可以方便應對需要定位的詞不在首位的情形：

local function filter(input)
    local valid = false
    for cand in input:iter() do
        local str = cand.text
        if utf8sub(str, -1) == "※" then
            str = utf8sub(str, 1, -1)
            valid = true
        end
        if valid and utf8.len(str) > 1 then
            for i = 1, utf8.len(str), 1 do
                yield(Candidate("w2c", cand.start, cand._end, utf8sub(str, i, 1), str))
            end
        else
            yield(cand)
        end
    end
end

return filter

有个问题，选择第二个侯选词的字时会多出一个"?"

还有一个问题，这样产生的词没有存入userdb

Jun 22 '23 09:06 naosense

librime librime copied to clipboard

[Feature Request] 以词定字

librime
librime copied to clipboard