lingtrain-aligner
Add text splitting into small parts
The current version ignores the H1-H5 headers added by the user. However, when a book is translated, the text of chapter 1 becomes the text of chapter 1 in the other language. You can use this fact to split a large text into smaller parts (for example, one part per chapter) and align each part separately.
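A minimal sketch of the chapter-based split in Python, assuming both texts are lists of lines and that header lines can be recognized by a predicate. The markdown-style `#` check (`is_md_header`) is a hypothetical stand-in, not lingtrain-aligner's own header markup, and `split_by_headers` / `pair_parts` are illustrative names, not library functions.

```python
from typing import Callable, List, Tuple

Part = List[str]


def split_by_headers(lines: List[str], is_header: Callable[[str], bool]) -> List[Part]:
    """Split lines into parts; every header line starts a new part."""
    parts: List[Part] = []
    current: Part = []
    for line in lines:
        if is_header(line) and current:
            parts.append(current)
            current = []
        current.append(line)
    if current:
        parts.append(current)
    return parts


def pair_parts(original: List[str], translated: List[str],
               is_header: Callable[[str], bool]) -> List[Tuple[Part, Part]]:
    """Pair the i-th part of the original with the i-th part of the translation."""
    left = split_by_headers(original, is_header)
    right = split_by_headers(translated, is_header)
    if len(left) != len(right):
        raise ValueError(f"part count mismatch: {len(left)} vs {len(right)}")
    return list(zip(left, right))


def is_md_header(line: str) -> bool:
    # Hypothetical header convention for this sketch only.
    return line.lstrip().startswith("#")


original = ["# Chapter 1", "A.", "B.", "# Chapter 2", "C."]
translated = ["# Kapitel 1", "A.", "B.", "# Kapitel 2", "C."]
pairs = pair_parts(original, translated, is_md_header)
```

Each pair can then be aligned on its own. The sketch pairs parts by position, so it raises an error when the two texts have a different number of headers instead of guessing which chapters correspond.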
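Another idea is to split a large text into small blocks automatically: take a few consecutive sentences from the original text (for example, 10 sentences) and, in a loop, search for the corresponding block in the translated text by comparing sentence embeddings. Both texts can then be split at the positions where the blocks were found.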
You can use the following pseudocode:
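# original_sentences / translated_sentences hold one embedding vector per sentence
left_start = 100
left_array = original_sentences[left_start:left_start + 10]
scores = {}
for i in range(50, 150):
    right_array_candidate = translated_sentences[i:i + 10]
    # compare sentence k of the left block with sentence k of the candidate;
    # use cosine similarity (higher = closer), not distance, since we take the maximum
    scores[i] = sum(cosine_similarity(l, r) for l, r in zip(left_array, right_array_candidate))
right_start = max(scores, key=scores.get)  # index with the highest score
left_text_split_index = left_start
right_text_split_index = right_start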
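A runnable version of the block search in plain Python. It assumes each sentence is already represented by an embedding vector, and, like the pseudocode, it compares the blocks sentence by sentence, which assumes the matching block in the translation has the same number of sentences. `find_block` and `cosine_similarity` are hypothetical helper names for this sketch.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_block(original: List[Vector], translated: List[Vector],
               left_start: int, block_size: int,
               search_from: int, search_to: int) -> int:
    """Return the start index in `translated` of the block that best matches
    original[left_start:left_start + block_size].

    Candidate starts are search_from <= i < search_to, limited to windows
    that fit completely inside `translated`.
    """
    left_block = original[left_start:left_start + block_size]
    last_start = min(search_to, len(translated) - block_size + 1)
    best_start, best_score = -1, float("-inf")
    for i in range(max(search_from, 0), last_start):
        right_block = translated[i:i + block_size]
        # Sum the similarities of sentence k in each block, for k = 0..block_size-1.
        score = sum(cosine_similarity(left_vec, right_vec)
                    for left_vec, right_vec in zip(left_block, right_block))
        if score > best_score:
            best_start, best_score = i, score
    if best_start < 0:
        raise ValueError("no candidate window fits in the given search range")
    return best_start
```

To mirror the numbers in the pseudocode, call `find_block(original_vectors, translated_vectors, left_start=100, block_size=10, search_from=50, search_to=150)`; the original text is then split at index 100 and the translation at the returned index.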