Problem with alignment of English
For EPUB + M4B, --lang en does not work, but Japanese is fine. May I know why? Should the command be subplz sync -d path --lang en or subplz sync -d path --language en?
This is for English Audible audiobooks + EPUB books. Is the command correct? I know that this project is designed for Japanese, but I also saw the GitHub page mention that --lang may work for languages other than Japanese. I have had no success, though.
I'm unsure right now. You could try a txt file instead and see if that changes anything, since some of the EPUB parsing is Japanese-specific, if I remember right.
Thank you for the quick response. I tested multiple English audiobooks with the --language en option, and it works great! It seems --lang en behaves differently from --language en. With --lang en, it still reports the detected language as Japanese. With --language en, I can see it transcribing English.
Interesting. I'll try to look at why and update the docs accordingly. Thanks for posting the solution. This still worked for EPUBs, I take it?
EPUB is working great. I haven't encountered sync issues since using the --language en option. I sometimes switched to the turbo model, but the default tiny model should be fine.
However, it sometimes crashed during the last sync step (also when using watch + scanner) and closed the terminal automatically, even though plenty of memory seemed to be left (16 GB memory, 4060). Rerunning it sometimes seems to help, though I'm not sure; it definitely does crash.
I see, it's probably a memory leak. I just saw memory usage jump beyond 16 GB, and then the terminal closed.
There shouldn't be a memory leak. The alignment really does use that much memory for multi-hour audiobooks. At some point we'd like to have a more memory-efficient alignment algorithm, but the current one gives very accurate results, so we've kept it and warn users about the memory usage both when they run the program and in the readme.
UPDATE: The readme was updated, but I haven't removed the "lang" flag yet, so I'll keep this open until that happens.
I also hit the memory issue. My PC has 96 GB of physical memory, but it still crashed. In Task Manager we can see that even when physical memory is not full, the "committed" memory is already full, meaning the program probably requested a lot of memory and crashed before it could use it. I will provide more information on where the OOM error occurred.
Here is the memory error I captured:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "E:\miniconda3\envs\SubPlz\Scripts\subplz.exe\__main__.py", line 7, in <module>
  File "E:\miniconda3\envs\subplz\Lib\site-packages\subplz\__main__.py", line 5, in main
    run.execute_on_inputs()
  File "E:\miniconda3\envs\subplz\Lib\site-packages\subplz\run.py", line 44, in execute_on_inputs
    sync(
  File "E:\miniconda3\envs\subplz\Lib\site-packages\subplz\sync.py", line 146, in sync
    do_batch(
  File "E:\miniconda3\envs\subplz\Lib\site-packages\subplz\sync.py", line 95, in do_batch
    alignment, references = align.align(
                            ^^^^^^^^^^^^
  File "E:\miniconda3\envs\subplz\Lib\site-packages\ats\align.py", line 175, in align
    return inner(text), [] #references
           ^^^^^^^^^^^
  File "E:\miniconda3\envs\subplz\Lib\site-packages\ats\align.py", line 167, in inner
    alignment = aligner.align(text_joined, transcript_joined)[0]
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\miniconda3\envs\subplz\Lib\site-packages\Bio\Align\__init__.py", line 3969, in align
    score, paths = super().align(sA, sB, strand)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MemoryError
Press any key to continue . . .
I debugged the code a little. I think the reason is simply that in the call
alignment = aligner.align(text_joined, transcript_joined)[0]
the lengths len(text_joined) and len(transcript_joined) are too large (e.g., 1,000,000+). For English, character-level matching may not be ideal, and word-level matching might work better.
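To make the size argument concrete, here is a rough sketch in Python. It assumes the aligner keeps a full dynamic-programming table of (len_a + 1) x (len_b + 1) cells, which is typical for global alignment with traceback but is not confirmed for this code path. Under that assumption, two 1,000,000-character inputs need about 10^12 cells, roughly a terabyte even at one byte per cell. The helper names (dp_cells, words, align_words) are made up for illustration, and difflib.SequenceMatcher is swapped in as a simple stand-in for word-level matching; it is not the aligner subplz uses.

```python
import difflib
import re


def dp_cells(len_a: int, len_b: int) -> int:
    """Cell count of a full dynamic-programming alignment table (assumed layout)."""
    return (len_a + 1) * (len_b + 1)


def words(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def align_words(book: str, transcript: str) -> list[tuple[int, int, int]]:
    """Return (book_index, transcript_index, length) runs of matching words."""
    a, b = words(book), words(transcript)
    # autojunk=False so frequent words are not discarded on long inputs.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [tuple(m) for m in matcher.get_matching_blocks() if m.size]


book = "It was the best of times, it was the worst of times."
transcript = "it was the best of times it was uh the worst of times"

# Table size when comparing characters versus comparing words.
print("char-level cells:", dp_cells(len(book), len(transcript)))
print("word-level cells:", dp_cells(len(words(book)), len(words(transcript))))

# Word-level runs that match between the book text and the transcript.
print(align_words(book, transcript))
```

Since English words average several characters, tokenizing by word shrinks each sequence by roughly that factor and the table by its square. Whether word-level matching keeps the current accuracy would need testing.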