Issue with tutorials/speech2srt/index.md
Hi,
I had to make the changes below to get the tutorial at https://cloud.google.com/community/tutorials/speech2srt to work. This patch is admittedly rough, but I needed to get the job done.
I know that the fix is described at https://googleapis.dev/python/translation/latest/UPGRADING.html and that you are working on updated snippets at https://github.com/googleapis/python-translate/tree/master/samples/snippets
But people who follow the official tutorial expect it to work, so I am sharing this in case someone else tries to run the outdated tutorial.
It would help to add a warning about this problem to the tutorial.
diff --git a/tutorials/speech2srt/example.wav b/tutorials/speech2srt/example.wav
index 376b5c1..4d6191b 100644
Binary files a/tutorials/speech2srt/example.wav and b/tutorials/speech2srt/example.wav differ
diff --git a/tutorials/speech2srt/translate_txt.py b/tutorials/speech2srt/translate_txt.py
index b9b0bfb..42112c1 100644
--- a/tutorials/speech2srt/translate_txt.py
+++ b/tutorials/speech2srt/translate_txt.py
@@ -15,12 +15,15 @@
 # limitations under the License.
 from google.cloud import translate
+def location_path(project_id, location):
+    return f"projects/{project_id}/locations/{location}"
 def get_supported_languages(project_id):
     """Getting a list of supported language codes"""
     client = translate.TranslationServiceClient()
-    parent = client.location_path(project_id, "global")
+    parent = location_path(project_id, "global")
     response = client.get_supported_languages(parent=parent)
     # List language codes of supported languages
@@ -44,14 +47,17 @@ def batch_translate_text(
     input_configs = [input_configs_element]
     gcs_destination = {"output_uri_prefix": output_uri}
     output_config = {"gcs_destination": gcs_destination}
-    parent = client.location_path(project_id, location)
+    parent = location_path(project_id, location)
     operation = client.batch_translate_text(
-        parent=parent,
-        source_language_code=source_lang,
-        target_language_codes=target_language_codes,
-        input_configs=input_configs,
-        output_config=output_config)
+        request={
+            "parent": parent,
+            "source_language_code": source_lang,
+            "target_language_codes": target_language_codes,  # Up to 10 language codes here.
+            "input_configs": input_configs,
+            "output_config": output_config,
+        }
+    )
     print(u"Waiting for operation to complete...")
     response = operation.result(90)
diff --git a/tutorials/speech2srt/txt2srt.py b/tutorials/speech2srt/txt2srt.py
index 2328723..fff77a3 100644
--- a/tutorials/speech2srt/txt2srt.py
+++ b/tutorials/speech2srt/txt2srt.py
@@ -48,8 +48,12 @@ def update_srt(lang, langfile, subs):
         lines = f.readlines()
         i = 0
         for line in lines:
-            subs[i].content = line
-            i += 1
+            try:
+                print("i:%s line:%s" % (i, line))
+                subs[i].content = line
+                i += 1
+            except IndexError:
+                pass
     return subs
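For anyone adapting the translate_txt.py change, here is a minimal sketch of how the resource path and the request dict fit together. It only builds the dict and makes no API call. The helper name `build_batch_translate_request`, the project ID, the bucket URIs, the `us-central1` location, and the `text/plain` MIME type are all assumptions for illustration, not values taken from the tutorial.

```python
def location_path(project_id, location="global"):
    """Build the parent resource name, e.g. projects/my-project/locations/global."""
    return f"projects/{project_id}/locations/{location}"


def build_batch_translate_request(project_id, location, source_lang,
                                  target_langs, input_uri, output_uri):
    """Assemble the dict passed as `request=` to batch_translate_text.

    Hypothetical helper: it performs no API call and needs no credentials.
    """
    return {
        "parent": location_path(project_id, location),
        "source_language_code": source_lang,
        "target_language_codes": list(target_langs),
        # Assumed input shape: one plain-text file in Cloud Storage.
        "input_configs": [
            {"gcs_source": {"input_uri": input_uri}, "mime_type": "text/plain"}
        ],
        "output_config": {"gcs_destination": {"output_uri_prefix": output_uri}},
    }


# Placeholder values only; substitute your own project, location, and buckets.
request = build_batch_translate_request(
    "my-project", "us-central1", "en", ["ja", "ko"],
    "gs://my-bucket/input.txt", "gs://my-bucket/output/",
)
print(request["parent"])
```

The resulting dict would then be passed as `client.batch_translate_text(request=request)`, the same call shape the patch above uses.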
Thanks, Daniel
By the way, installing the Python requirements for this tutorial also failed (Ubuntu 18), and I had to do this:
venv$ pip3 install --upgrade pip
venv$ pip3 install --no-cache-dir wheel
venv$ pip3 install --no-cache-dir -r requirements.txt
Another problem with the tutorial is that the Translation API translates each subtitle entry individually. If a sentence is split across several subtitle lines, each line is translated without the context of the full sentence, which produces poor translations.
@lepistom, could you take a look at this report about a document that you contributed?
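One possible workaround, sketched below, is to merge consecutive subtitle entries into whole sentences before sending them for translation. This is not part of the tutorial. The function name `group_into_sentences` is hypothetical, and the sketch assumes the transcript carries sentence-ending punctuation. It also does not cover mapping the translated sentences back onto the original subtitle timings.

```python
# Characters treated as sentence endings (an assumption; extend as needed).
SENTENCE_ENDINGS = (".", "!", "?", "。", "！", "？")


def group_into_sentences(entries):
    """Group consecutive subtitle texts into whole sentences.

    Returns a list of (indices, sentence) pairs, where `indices` lists the
    positions of the subtitle entries the sentence was assembled from.
    """
    groups = []
    indices, parts = [], []
    for i, text in enumerate(entries):
        indices.append(i)
        parts.append(text.strip())
        if parts[-1].endswith(SENTENCE_ENDINGS):
            groups.append((indices, " ".join(parts)))
            indices, parts = [], []
    if parts:  # trailing fragment without closing punctuation
        groups.append((indices, " ".join(parts)))
    return groups


entries = ["This is a sentence that", "spans two subtitles.", "Short one."]
for indices, sentence in group_into_sentences(entries):
    print(indices, sentence)
```

Keeping the index list with each sentence leaves enough information to decide later how to spread a translated sentence over the subtitle entries it came from.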