autosub
pip: multiple versions

If installing via pip, note there is also an autosub1 package, but autosub itself is out of date (last updated May 2017).
The two can't be installed together (they are incompatible); each requires a separate virtualenv or venv.

autosub1 0.4.7 has a broken project link.

Where is the information regarding this? I can't find it - is this undocumented?
- whether this is intentional, or if it can be mitigated/addressed
I glanced at the code of the autosub1 package installed from pip. It is almost identical to the original autosub; at the moment I can't tell where it has been modified. The maintainers are different, so it is probably unrelated to the main autosub repo. If you want to install the newest Windows-compatible version of autosub, you can install it from my repo.
choco install git -y
pip install git+https://github.com/BingLingGroup/autosub.git@alpha
Changes are listed in CHANGELOG.md for version 0.4.0-alpha. That is the original 0.4.0 version of autosub with minor changes to make it run smoothly on Windows.
PS: I will soon release a new version of autosub based on my repo's dev branch. It will have significant improvements in almost every part.
Notice that autosub1 seems to behave differently:
- it uses longer audio segments, and thus
  - has more surrounding word context for more accurate transcription
  - but is too(?) long for typical audio (vs. my preference)
- shorter segments result in better (finer) timing resolution

There is also a problem with how Google seems to have changed its speech-to-text algorithm: it is now much more sensitive to the context of surrounding words. It seems to incorrectly hallucinate words that are topically similar to adjacent words even when they are wrong, unlike in the past when it used a plainer, more naive approach.
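To make the segment-length tradeoff concrete, here is a toy sketch (hypothetical, not autosub's actual segmenter) of how chunk size trades timing resolution for word context:

```python
# Hypothetical illustration, not autosub's real segmenter: cut a timeline
# into fixed-length chunks; each chunk's (start, end) becomes a subtitle
# timestamp, so shorter chunks mean finer timing but less word context.

def segment_bounds(duration_s, chunk_s):
    """Return (start, end) pairs covering [0, duration_s] in chunk_s steps."""
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds

short = segment_bounds(60.0, 3.0)   # fine timing, little surrounding context
long_ = segment_bounds(60.0, 12.0)  # more context, coarser subtitle timing
print(len(short), len(long_))       # 20 5
```

Twenty timestamp boundaries vs. five for the same minute of audio: that is the finer timing resolution short segments buy, at the cost of context per request.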
@daT4v1s
No code, no clues.
The left one is autosub1; the right one is autosub-0.4.0-alpha.
It seems autosub1 just reduces the threshold - nothing more magical. Anyway, the percentile square-root method is not that accurate. I recommend using Auditok instead; it has more arguments to control speech region detection. Remember that this Google Speech API only accepts audio clips 10-15 seconds long, so using long audio to improve context detection is impossible unless Cloud Speech API support is added in the future.
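As a rough illustration of what energy-based region detection does in principle (this is a toy, not Auditok's implementation; the function name and parameters are made up), frames above an energy threshold are grouped into regions, and a `max_dur` cap keeps every clip within the length the free API accepts:

```python
# Toy energy-threshold detector (illustrative only, NOT Auditok's code).
# energies: one energy value per audio frame; frame_s: frame length in s.
def detect_regions(energies, frame_s, threshold, max_dur):
    """Group consecutive frames with energy >= threshold into (start, end)
    regions, then split any region longer than max_dur seconds so every
    clip stays within the API's accepted length."""
    raw, start = [], None
    for i, e in enumerate(energies):
        if e >= threshold and start is None:
            start = i * frame_s                 # region begins
        elif e < threshold and start is not None:
            raw.append((start, i * frame_s))    # region ends
            start = None
    if start is not None:
        raw.append((start, len(energies) * frame_s))
    capped = []
    for s, e in raw:
        while e - s > max_dur:                  # enforce the length cap
            capped.append((s, s + max_dur))
            s += max_dur
        capped.append((s, e))
    return capped

energies = [0, 5, 6, 7, 0, 0, 8, 8, 8, 8, 8, 0]
print(detect_regions(energies, 1.0, 5, 3.0))
# [(1.0, 4.0), (6.0, 9.0), (9.0, 11.0)]
```

The real Auditok library exposes similar knobs (energy threshold, min/max region duration, max silence) through its own API; check its docs for the exact argument names.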
I'm still working on my autosub release. The new Auditok features will be added in a future release in my repo. Stay tuned.
What about:
- a hybrid method using both
  - short segment uploads (better time resolution)
  - and long, overlapping segments (better accuracy)
- then searching for obvious errors/differences between the two and choosing the most likely option
  - prefer/try to keep this as automatic as possible
  - but simple yes/no prompts might help as well
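The reconciliation step of this hybrid idea could be sketched like so (purely hypothetical, not an existing autosub feature): compare the words from the short-segment pass against the long-segment pass, keep agreements, and surface conflicts for automatic choice or a yes/no prompt.

```python
# Purely hypothetical hybrid reconciliation, not an existing autosub feature.
from difflib import SequenceMatcher

def reconcile(short_words, long_words):
    """Merge two word lists: words both passes agree on pass through;
    conflicts come back as (short_option, long_option) tuples so they
    can be resolved automatically or via a yes/no prompt."""
    sm = SequenceMatcher(a=short_words, b=long_words)
    merged = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            merged.extend(short_words[i1:i2])
        else:
            merged.append((tuple(short_words[i1:i2]),
                           tuple(long_words[j1:j2])))
    return merged

print(reconcile(["the", "cat", "sat"], ["the", "bat", "sat"]))
# ['the', (('cat',), ('bat',)), 'sat']
```

A fully automatic resolver might prefer the long-segment option (more context) and fall back to a prompt only when neither option is clearly more plausible.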
Emmm... The program now has a retry mechanism during the speech-to-text process; the retry count is 3. But honestly, most retries happen because your network isn't stable enough to receive the full data from the server, not because the server didn't give you the full result.
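That retry mechanism follows a generic pattern like this (a sketch, not autosub's actual code; `with_retries` and its parameters are illustrative):

```python
import time

def with_retries(fn, attempts=3, delay_s=1.0):
    """Call fn(); on failure, retry up to `attempts` times total.
    Since most failures are flaky network reads rather than bad server
    results, a short pause before retrying is usually enough."""
    last_exc = None
    for i in range(attempts):
        try:
            return fn()
        except OSError as exc:      # real code would catch the specific error
            last_exc = exc
            if i < attempts - 1:
                time.sleep(delay_s)
    raise last_exc
```

With `attempts=3` this matches the retry count described above: one initial call plus up to two retries before the error is re-raised.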
And if you want longer segments for more accurate detection while staying within the acceptable short range, you can reduce the detection threshold, increase the max region size to 10 (the max is 10; the default is 6), or do other specific tweaking. You can go to the Auditok docs website to learn how to use these arguments to control region detection.
Most speech sentences are below 10 seconds, at least at the clause level; I think that's enough for speech detection. If you want longer detection, just use the paid Cloud Speech API. Or, if you want more accurate translation results, you can adjust the timings and merge the segments into one long sentence, since the translation API doesn't refuse long segments. Sure, I will think about a method to automatically merge the sentences, give them to the translation API, and then split them back based on the regions.
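The merge-then-split idea could look roughly like this (hypothetical; the author only says he will think about such a method, and the proportional word split here is a naive placeholder for real alignment):

```python
# Hypothetical merge-then-split for translation; `translate` stands in
# for the real translation API call, and the proportional word split is
# a naive placeholder for proper alignment.
def merge_translate_split(segments, translate):
    """segments: list of (start, end, text). Merge all text into one
    translation request, then redistribute the translated words across
    the segments in proportion to each segment's original word count."""
    texts = [t for _, _, t in segments]
    counts = [len(t.split()) for t in texts]
    total = sum(counts)
    words = translate(" ".join(texts)).split()
    out, pos, acc = [], 0, 0
    for (start, end, _), c in zip(segments, counts):
        acc += c
        # proportional boundary; guard against empty input
        cut = round(len(words) * acc / total) if total else 0
        out.append((start, end, " ".join(words[pos:cut])))
        pos = cut
    return out

# With an identity "translation" the segments round-trip unchanged:
segs = [(0.0, 2.0, "hello world"), (2.0, 4.0, "good morning friends")]
print(merge_translate_split(segs, lambda s: s))
```

For real language pairs the word counts won't match and word order can change, so the proportional split is only an approximation; proper sentence alignment would do better.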