MathTranslate

do not translate words from a given vocabulary

Open rotcx opened this issue 1 year ago • 5 comments

e.g., do not translate LLM to 法学硕士 ("Master of Laws"). Leave it as LLM.

e.g., do not translate Transformer to 变压器 (an electrical transformer). Leave it as Transformer.

rotcx avatar Nov 16 '23 12:11 rotcx

If we cannot set such a do-not-translate vocabulary for the translators (Google, Tencent, ...), the only remedy is to replace the (wrongly) translated words with the original English words after translation.

rotcx avatar Nov 18 '23 08:11 rotcx

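An implementation could be:

    from functools import reduce
    # Maps each mistranslation to the original English term.
    replace_dict = {"法学硕士": "LLM", "变压器": "Transformer", "代币": "token"}
    # text_final is the translated text; apply every replacement in turn.
    text_final = reduce(lambda text, kv: text.replace(*kv), replace_dict.items(), text_final)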
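
The chained `replace` calls above run one after another, so the output of one replacement can be rewritten by a later one if a restored term happens to contain another key. A minimal sketch of a single-pass variant, assuming the same kind of `replace_dict`, is shown here; the function name `restore_terms` is hypothetical and not part of MathTranslate.

```python
import re

def restore_terms(text, replace_dict):
    """Replace mistranslated terms with their original forms in one pass."""
    if not replace_dict:
        return text
    # Longest keys first, so a short key cannot pre-empt a longer key containing it.
    keys = sorted(replace_dict, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Each match is looked up once; replaced text is never scanned again.
    return pattern.sub(lambda m: replace_dict[m.group(0)], text)

replace_dict = {"法学硕士": "LLM", "变压器": "Transformer", "代币": "token"}
print(restore_terms("变压器是法学硕士的基础。", replace_dict))
```

For the three terms in this thread both approaches give the same result; the single pass only matters once the dictionary grows and keys start to overlap with replacement values.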


rotcx avatar Nov 18 '23 09:11 rotcx

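Another (downstream) way is to post-process the translated main.tex file:

    #!/bin/bash
    # Requires bash 4+ for associative arrays. Prints the fixed text to stdout.

    declare -A replace_dict=(["法学硕士"]="LLM" ["变压器"]="Transformer" ["代币"]="token")

    # IFS= keeps leading whitespace; the || test keeps a last line without a newline.
    while IFS= read -r line || [[ -n "$line" ]]; do
        for key in "${!replace_dict[@]}"; do
            # Quote the key so it is matched literally, not as a glob pattern.
            line=${line//"$key"/${replace_dict[$key]}}
        done
        # printf with a quoted argument avoids word splitting and globbing.
        printf '%s\n' "$line"
    done < main.tex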

rotcx avatar Nov 18 '23 09:11 rotcx

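Iterate over all .tex files in the directory dir and process each one (since in general we cannot know which .tex file is the main one):

    #!/bin/bash
    # Requires bash 4+ for associative arrays. Rewrites each file in place.

    declare -A replace_dict=(["法学硕士"]="LLM" ["变压器"]="Transformer" ["代币"]="token")

    # -print0 with read -d '' keeps file names containing spaces intact.
    find dir -name "*.tex" -print0 | while IFS= read -r -d '' file; do
        while IFS= read -r line || [[ -n "$line" ]]; do
            for key in "${!replace_dict[@]}"; do
                line=${line//"$key"/${replace_dict[$key]}}
            done
            printf '%s\n' "$line"
        # Write to a temporary file first, then replace the original.
        done < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    done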
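
The same directory walk can be sketched in Python, which avoids the line-by-line shell loop and works on systems without bash 4. This is only an illustration under stated assumptions: the function name `restore_terms_in_tree` is hypothetical, the files are assumed to be UTF-8, and each file is rewritten in place.

```python
from pathlib import Path

def restore_terms_in_tree(root, replace_dict, encoding="utf-8"):
    """Apply replace_dict to every .tex file under root; return the files changed."""
    changed = []
    for path in sorted(Path(root).rglob("*.tex")):
        original = path.read_text(encoding=encoding)
        fixed = original
        for wrong, right in replace_dict.items():
            fixed = fixed.replace(wrong, right)
        # Only touch files whose content actually changed.
        if fixed != original:
            path.write_text(fixed, encoding=encoding)
            changed.append(path)
    return changed
```

Calling `restore_terms_in_tree("dir", replace_dict)` with the dictionary from the earlier comments would process the same files as the `find` loop.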

rotcx avatar Nov 18 '23 09:11 rotcx

Thank you for reporting this issue. Since this is a general-purpose translation tool rather than one that only targets CS or DL, we think it is better to leave the behavior as it is for now. We could consider adding a "user dictionary" feature that asks users to manually define their own "popular vocabulary". The only thing a user would need to do is load a vocabulary list. This is similar to your solution here, but more systematic and user-friendly. @SUSYUSTC

sherrylixuecheng avatar Dec 15 '23 16:12 sherrylixuecheng
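
The "user dictionary" proposed above is not specified any further in this thread. As a rough sketch of what loading a vocabulary list might look like, the example below assumes a plain-text format with one `mistranslation<TAB>preferred term` pair per line and `#` comments. The file format and both function names are assumptions for illustration, not an existing MathTranslate feature.

```python
def parse_user_dictionary(lines):
    """Parse 'wrong<TAB>preferred' lines into a dict, skipping blanks and # comments."""
    mapping = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        wrong, sep, preferred = line.partition("\t")
        if not sep or not wrong.strip() or not preferred.strip():
            raise ValueError(f"line {number}: expected 'wrong<TAB>preferred'")
        mapping[wrong.strip()] = preferred.strip()
    return mapping

def apply_user_dictionary(text, mapping):
    """Replace every listed mistranslation in text with its preferred term."""
    for wrong, preferred in mapping.items():
        text = text.replace(wrong, preferred)
    return text

# Lines as they might appear in a user's vocabulary file (hypothetical content).
sample = ["# mistranslation<TAB>preferred term", "法学硕士\tLLM", "", "变压器\tTransformer"]
vocab = parse_user_dictionary(sample)
print(apply_user_dictionary("法学硕士和变压器", vocab))
```

Because `parse_user_dictionary` accepts any iterable of lines, an open file object can be passed to it directly.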