Loosely couple the transcription and translation steps
The steps of transcription and translation currently appear to be relatively tightly coupled. We can see that the subtitles generated by the transcription are processed in the translation step.
# file: openlrc/openlrc.py
def process_translation(base_name, target_lang, transcribed_opt_sub, skip_trans):
    ...
    if skip_trans:
        shutil.copy(transcribed_opt_sub.filename, final_json_path)
        transcribed_opt_sub.filename = final_json_path
        return transcribed_opt_sub
    ...
The final subtitle files are then generated in the translation worker.
def translation_worker(self, transcription_queue, target_lang, skip_trans, bilingual_sub):
    ...
    # Handle translation
    final_subtitle = process_translation(base_name, target_lang, transcribed_opt_sub, skip_trans)
    # Generate and move subtitle files
    generate_subtitle_files(final_subtitle, base_name, subtitle_format)
    ...
This seems to violate the single-responsibility principle (SRP): the translation worker both translates and produces the final output files.
In addition, even when skip_trans=True is specified, the translation thread is still started. Users pay the performance overhead of a step they are not using.
I wish we could decouple the two steps of transcription and translation:
- The translation step no longer processes transcribed files.
- The translation thread is no longer started when skip_trans=True is specified.
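A minimal sketch of the two points above, not openlrc's actual code. The functions transcribe, translate and run_pipeline are hypothetical stand-ins invented for illustration. The idea is that the translation worker only translates, and its thread is created only when translation is requested:

```python
import queue
import threading


def transcribe(path):
    # Hypothetical stand-in for the real transcription step.
    return f"subtitle for {path}"


def translate(subtitle, target_lang):
    # Hypothetical stand-in for the real translation step.
    return f"[{target_lang}] {subtitle}"


def run_pipeline(paths, target_lang, skip_trans=False):
    """Return (results, thread_started); results maps path -> final subtitle."""
    results = {}

    if skip_trans:
        # No queue and no worker thread: transcription output is final.
        for path in paths:
            results[path] = transcribe(path)
        return results, False

    work = queue.Queue()

    def translation_worker():
        # The worker only translates; it never copies or writes files.
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work
                break
            path, subtitle = item
            results[path] = translate(subtitle, target_lang)

    worker = threading.Thread(target=translation_worker)
    worker.start()
    for path in paths:
        work.put((path, transcribe(path)))
    work.put(None)
    worker.join()
    return results, True
```

With this shape, generating the subtitle files could be a separate step that consumes results, regardless of whether translation ran.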
I am not familiar with NLP, but if you agree, I could try to implement this improvement.
Thank you for this detailed analysis.
I do think the structure of openlrc.py is not good - the tight coupling between transcription and translation makes the code less maintainable and less efficient. I actually started addressing some architectural issues in commit c7db967ba3d5d62c9d8e214073879fe465acd70d, but there's still room for improvement.
Feel free to open a PR with your proposed changes!