deepl-translate
Can we preserve \n (newline characters) when translating text?
For example, I want to translate:
Lalalal
lalalal
Using deepl I get -> Lalalal lalalal, with the \n characters stripped out.
Currently, this package simply passes the text as-is to the deepl API. It is the deepl API itself that drops the newline characters from the input:
from deepl.api import split_into_sentences
text = """Lalalal

lalalal"""
sentences = split_into_sentences(text)
print(sentences)
Output:
['Lalalal\n\nlalalal']
from deepl.api import generate_translation_request_data
sentences = split_into_sentences(text)
data = generate_translation_request_data(
source_language="DE", target_language="EN", sentences=sentences
)
data["params"]["jobs"][0]["raw_en_sentence"]
Output:
'Lalalal\n\nlalalal'
import json
import requests
from deepl.api import headers
from deepl.settings import API_URL
response = requests.post(API_URL, data=json.dumps(data), headers=headers)
json_response = response.json()
json_response["result"]["translations"][0]["beams"][0]["postprocessed_sentence"]
Output:
'Lalalal lalalal' # no newline characters
I'm not quite sure yet how the web version of deepl handles newlines in the text (I assume some JavaScript preprocessing). To support this, I would have to preprocess the text before requesting the translation, which could lead to unexpected or corrupted behavior. This kind of code change should be thoroughly tested, which I don't have time to do at the moment.
If anyone can point me to a safe solution, I'd be happy to look into it.
I've seen something like an ignore_tag argument in the deepl documentation. Is it a viable solution? A workaround is to translate the text line by line in a for loop, using threading to speed up the process while keeping the line structure intact.
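The line-by-line workaround could be sketched roughly as below. This is only an illustration, not part of the package: `translate_preserving_newlines` and its `translate_line` parameter are hypothetical names, and `translate_line` stands for whatever single-string translation callable you already use. Blank lines are passed through untouched so that paragraph breaks survive.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def translate_preserving_newlines(
    text: str,
    translate_line: Callable[[str], str],  # hypothetical: any str -> str translator
    max_workers: int = 4,
) -> str:
    """Translate each line separately and rejoin with the original newlines."""
    # split("\n") followed by "\n".join(...) round-trips the line structure exactly.
    lines = text.split("\n")

    def translate_or_keep(line: str) -> str:
        # Skip empty or whitespace-only lines: nothing to translate, and
        # sending them to the API would only waste requests.
        return translate_line(line) if line.strip() else line

    # pool.map returns results in input order, so lines cannot get shuffled
    # even though the requests run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        translated = list(pool.map(translate_or_keep, lines))

    return "\n".join(translated)


if __name__ == "__main__":
    # str.upper is a stand-in for a real translation call.
    print(translate_preserving_newlines("Lalalal\n\nlalalal", str.upper))
```

One trade-off to keep in mind: each line is translated in isolation, so a sentence that wraps across several lines loses its context, and many small requests may run into rate limits sooner than a single large one.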