
pip: multiple versions

daT4v1s opened this issue 4 years ago • 5 comments

When installing via pip, there is also an autosub1 package, while autosub itself is out of date (last updated May 2017).

Also, the two cannot be installed together (they are incompatible), so each requires its own virtualenv or venv.

autosub 0.3.12 (PyPI, screenshot taken 2019-07-17)

autosub1 0.4.7 (PyPI, screenshot taken 2019-07-17; the autosub1 project link is broken)

Where is the information about this? I can't find any. Is this undocumented?

  • Please clarify whether this is intentional, or whether it can be mitigated/addressed.

daT4v1s avatar Jul 17 '19 03:07 daT4v1s

I glanced at the code of the autosub1 installed from pip. It is almost the same as the original autosub; so far I can't tell where it has been modified. The maintainers are different, so it is probably unrelated to the main autosub repo. If you want to install the newest Windows-compatible version of autosub, you can install it from my repo:

choco install git -y
pip install git+https://github.com/BingLingGroup/autosub.git@alpha

Changes are listed in CHANGELOG.md, specifically for version 0.4.0-alpha. That is the original 0.4.0 autosub with small changes to make it run smoothly on Windows.

PS: I will soon release a new version of autosub based on my repo's dev branch. It will bring significant improvements in almost every part.

BingLingGroup avatar Jul 17 '19 14:07 BingLingGroup

Notice that autosub1 seems to behave differently:

  • it uses longer audio segments, so
    • it has more surrounding word context, giving more accurate transcription
    • but the segments may be too long for typical audio (a matter of preference)

Shorter segments give better (finer) timing resolution.

There is also a problem with how Google seems to have changed its speech-to-text algorithm: it is now much more sensitive to the context of surrounding words, and it tends to hallucinate words that fit the topic of adjacent words even when they are incorrect, unlike in the past, when it used a plainer, more naive approach.

daT4v1s avatar Jul 18 '19 03:07 daT4v1s

Notice that autosub1 seems to behave differently:

  • it uses longer audio segments, so

    • it has more surrounding word context, giving more accurate transcription
    • but the segments may be too long for typical audio (a matter of preference)

There is also a problem with how Google seems to have changed its speech-to-text algorithm: it is now much more sensitive to the context of surrounding words, and it tends to hallucinate words that fit the topic of adjacent words even when they are incorrect, unlike in the past, when it used a plainer, more naive approach.

@daT4v1s

Without seeing the code, there are no clues.

(screenshot: compare_autosub1, side-by-side code comparison)

The left one is autosub1; the right one is autosub-0.4.0-alpha.

It seems autosub1 just reduces the threshold; nothing more magical. In any case, the percentile square-root method is not very accurate. I recommend using Auditok instead; it has more arguments to control speech region detection. Remember that this Google Speech API only accepts audio clips 10 to 15 seconds long, so using long audio to improve context detection is impossible unless Cloud Speech API support is added in the future.
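For context, the "percentile" approach mentioned above can be illustrated by the following minimal sketch (not autosub's actual code): compute per-chunk energies, take a threshold at a chosen percentile of the sorted energies, and emit runs of above-threshold chunks as speech regions, splitting any run that would exceed the API's clip limit. The function name and parameters are illustrative only.

```python
def find_regions(energies, chunk_dur=0.025, percentile=0.2, max_region=6.0):
    """Return (start, end) times for runs of chunks whose energy exceeds
    a threshold taken at `percentile` of the sorted chunk energies."""
    ranked = sorted(energies)
    threshold = ranked[int(len(ranked) * percentile)]
    regions, start = [], None
    for i, e in enumerate(energies):
        t = i * chunk_dur
        if e > threshold:
            if start is None:
                start = t
            # split regions that would exceed the API's clip-length limit
            if t + chunk_dur - start >= max_region:
                regions.append((start, t + chunk_dur))
                start = None
        elif start is not None:
            regions.append((start, t))
            start = None
    if start is not None:
        regions.append((start, len(energies) * chunk_dur))
    return regions
```

Lowering `percentile` here is the same kind of tweak as "just reducing the threshold": more chunks count as speech, so regions grow longer.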

I'm still working on my autosub release. The new Auditok features will be added in a future release in my repo. Stay tuned.

BingLingGroup avatar Jul 18 '19 04:07 BingLingGroup

What about:

  • a hybrid method using both
    • short segment uploads (better time resolution)
    • and long, overlapping segments (better accuracy)

and then searching for obvious errors/differences between the two and choosing the most likely option?

  • prefer making this as automatic as possible
  • but simple yes/no prompts might help as well
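One hypothetical way to realize the hybrid idea automatically: keep the timestamps from the short segments, but where the two transcripts disagree on a span of equal word count (a likely mishearing), prefer the long-context version. The sketch below uses the standard library's `difflib.SequenceMatcher` for the alignment; all names are illustrative.

```python
from difflib import SequenceMatcher

def merge_transcripts(short_words, long_words):
    """Align the two word sequences; where they differ on an equal-length
    span, trust the long-context transcription, otherwise keep the
    short-segment words (so word count still matches the fine timing)."""
    merged = []
    sm = SequenceMatcher(a=short_words, b=long_words, autojunk=False)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "replace" and (i2 - i1) == (j2 - j1):
            # same word count, different words: likely a mishearing,
            # so take the long-context version
            merged.extend(long_words[j1:j2])
        else:
            # equal spans, insertions, deletions: keep the short-segment
            # words so timestamps stay aligned one-to-one
            merged.extend(short_words[i1:i2])
    return merged
```

Spans where the word counts differ are harder to reconcile automatically; those would be the natural place for the yes/no prompts suggested above.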

daT4v1s avatar Jul 18 '19 21:07 daT4v1s

What about:

  • a hybrid method using both

    • short segment uploads (better time resolution)
    • and long, overlapping segments (better accuracy)

and then searching for obvious errors/differences between the two and choosing the most likely option?

  • prefer making this as automatic as possible
  • but simple yes/no prompts might help as well

Emmm... the program now has a retry mechanism during the speech-to-text process; the retry count is 3. But honestly, most retries happen because your network isn't stable enough to receive the full data from the server, not because the server didn't return the full result.
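The retry behaviour described above amounts to a loop like the following sketch (function name and exception choice are illustrative, not autosub's exact code):

```python
def with_retries(request, retries=3):
    """Call `request` up to `retries` times, retrying on OSError
    (the usual symptom of an unstable network connection)."""
    last_err = None
    for _ in range(retries):
        try:
            return request()
        except OSError as err:
            last_err = err  # network hiccup: try the request again
    raise last_err
```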

And if you want longer segments for more accurate detection while staying within the acceptable short range, you can reduce the detection threshold, increase the max region size to 10 (the max is 10; the default is 6), or do other specific tweaking. See the Auditok docs for how to use these arguments to control region detection.

Most spoken sentences are under 10 seconds, at least at the clause level, which I think is enough for speech detection. If you want longer detection, use the paid Cloud Speech API. Or, if you want more accurate translation results, you can adjust the timings and merge segments into one long sentence, since the translation API doesn't refuse long inputs. I will certainly think about a method that automatically merges the sentences, sends them to the translation API, and then splits the result back based on the regions.
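The merge-then-split idea at the end can be sketched as follows: join the timed lines with a delimiter, translate the whole block in one request, and split the result back onto the original regions. `translate` is a placeholder for whatever translation API is used; the delimiter surviving translation is an assumption, with a per-line fallback when it doesn't.

```python
DELIM = "\n"

def translate_merged(lines, translate):
    """lines: list of (start, end, text) tuples. Returns the same timed
    regions with translated text, merging into one request when the
    delimiter survives translation."""
    merged = DELIM.join(text for _, _, text in lines)
    translated = translate(merged).split(DELIM)
    if len(translated) != len(lines):
        # delimiter did not survive; fall back to one request per line
        translated = [translate(text) for _, _, text in lines]
    return [(s, e, t) for (s, e, _), t in zip(lines, translated)]
```

This preserves the original region boundaries while letting the translation API see the surrounding sentence context.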

BingLingGroup avatar Jul 19 '19 01:07 BingLingGroup