bilingual_book_maker icon indicating copy to clipboard operation
bilingual_book_maker copied to clipboard

可以支持上下文

Open yihong0618 opened this issue 1 year ago • 7 comments

比如人名专有名词都用一致的翻译,但消耗的 token 是指数级的。

Sent from PPHub

yihong0618 avatar Mar 04 '23 06:03 yihong0618

你说的应该是词汇表(Glossary),不过chatgpt接口应该没有提供类似的参数

lindeer avatar Mar 04 '23 13:03 lindeer

Glossary 的部分,就必須在每一次 Call API 要傳送一遍,倒是不會有指數級,而是一個固定量。

DennySORA avatar Mar 04 '23 14:03 DennySORA

不知到是否支持名称暂不翻译,仅返回名称字符记录在案。这样后期可以做全局替换。

yiran1211 avatar Mar 07 '23 02:03 yiran1211

理论可行。

yihong0618 avatar Mar 07 '23 02:03 yihong0618

GPT’s solution: One approach to address this issue is to use named entity recognition (NER) to identify the names of people, places, and other entities in the text, and then map those entities to their corresponding translated names.

You can use a pre-trained NER model such as spaCy or Stanford CoreNLP to identify the entities in each paragraph. Once you have identified the entities, you can use a mapping table or dictionary to store the translated names of each entity. When translating a paragraph, you can then use the mapping table to ensure that the same entity is consistently translated with the same name throughout the novel.

Here's an example workflow you can use:

Split the novel into paragraphs. Use a pre-trained NER model such as spaCy or Stanford CoreNLP to identify the named entities in each paragraph. For each named entity, check if it exists in your mapping table or dictionary. If it does, use the corresponding translated name. If it doesn't, generate a new translated name and add it to the mapping table. Translate the paragraph, replacing the original named entities with their translated names. Repeat the process for each paragraph in the novel. Note that this approach may require some manual effort to ensure that the mapping table is accurate and up-to-date. Additionally, some named entities may have multiple possible translations depending on the context, so you may need to use additional heuristics to disambiguate them.

yiran1211 avatar Mar 17 '23 17:03 yiran1211

@yiran1211 分割成段落还是会造成冗余。不过按你的思路,书籍翻译需要分成三个阶段:1.先把整本书用NER工具扫一遍,需要识别人名地名专有名称,2.翻译好这些词汇做成词汇表,也就是你说的mapping tables。3.再用gpt进行翻译,充分利用gpt理解上下文的能力,提前设置好prompts,让gpt记住词汇表,用词汇表中的内容翻译匹配的名称。

lindeer avatar Mar 19 '23 02:03 lindeer

想了想,NER工具也不是很靠谱,有一些需要特定翻译的语词这些工具应该扫不出来,比如古代一些官职名称

lindeer avatar Mar 19 '23 09:03 lindeer