`download_lexicon` produces float seconds and empty priority; `CSVPoseLookup` expects integer ms and int priority (ValueError)
Summary
While following the installation and usage instructions to generate a SignSuisse lexicon and run text_to_gloss_to_pose, I encountered two crashes originating from CSVPoseLookup / PoseLookup.make_dictionary_index:
- `ValueError: invalid literal for int() with base 10: '2.16'`, because `index.csv` stores `start`/`end` as float seconds (e.g., `2.16`) but the parser calls `int(d['end'])`, which suggests it expects integers (milliseconds).
- `ValueError: invalid literal for int() with base 10: ''`, because `priority` is an empty string in `index.csv`, yet the parser calls `int(d['priority'])`.
These issues are reproducible with the SignSuisse lexicon and the public instructions.
Environment / Setup (clean, minimal and non-duplicated)
# (Optional) create dedicated environment
# conda create --prefix /home/gsantm/data/conda/envs/rule_based python=3.10 -y
# Load conda
source /home/gsantm/data/environments/miniconda/etc/profile.d/conda.sh
# Activate environment
conda activate /home/gsantm/data/conda/envs/rule_based
# Install the package (choose ONE of the two lines; I used the Git URL)
pip install git+https://github.com/ZurichNLP/spoken-to-signed-translation.git
# OR: git clone https://github.com/ZurichNLP/spoken-to-signed-translation && cd spoken-to-signed-translation && pip install .
# Extra deps required by the pipeline (needed at runtime on my side)
pip install sign-language-datasets opencv-python spacy
Generate SignSuisse lexicon:
export LEXICON_NAME="signsuisse"
export LEXICON_SAVE_DIR="/home/gsantm/store/rule_based_lexicon"
export TFDS_DATA_DIR="/home/gsantm/store/rule_based_lexicon/tensorflow_datasets"
download_lexicon --name "${LEXICON_NAME}" --directory "${LEXICON_SAVE_DIR}"
Run the pipeline (example):
export SPOKEN_LANGUAGE="de"
export SIGNED_LANGUAGE="sgg"
export GLOSSER="rules"
export INPUT_TEXT="Es hat fast eine Featherie wie eine Haltung zu ihm."
export OUTPUT_POSE="/home/gsantm/scripts/back_translation/test.pose"
text_to_gloss_to_pose --text "${INPUT_TEXT}" --glosser "${GLOSSER}" --lexicon "${LEXICON_SAVE_DIR}" --spoken-language "${SPOKEN_LANGUAGE}" --signed-language "${SIGNED_LANGUAGE}" --pose "${OUTPUT_POSE}"
Error 1 (float seconds in end)
Traceback (most recent call last):
File ".../bin/text_to_gloss_to_pose", line 7, in <module>
sys.exit(text_to_gloss_to_pose())
File ".../spoken_to_signed/bin.py", line 119, in text_to_gloss_to_pose
_text_input_arguments(args_parser)
File ".../spoken_to_signed/bin.py", line 79, in _text_input_arguments
lookup = CSVPoseLookup(pre_args.lexicon)
File ".../lookup/csv_lookup.py", line 15, in __init__
super().__init__(rows=rows, directory=directory, backup=backup)
File ".../lookup/lookup.py", line 21, in __init__
self.words_index = self.make_dictionary_index(rows, based_on="words")
File ".../lookup/lookup.py", line 39, in make_dictionary_index
"end": int(d['end']),
ValueError: invalid literal for int() with base 10: '2.16'
A typical index.csv entry produced by download_lexicon looked like:
path,spoken_language,signed_language,start,end,words,glosses,priority
sgg/126464.pose,de,sgg,0,2.16,WETTEN,wetten,
Note: end is a float in seconds, not an int.
Root cause
PoseLookup.make_dictionary_index currently performs int(d['start']) / int(d['end']), implying the code expects integers (and per get_pose() logic, these are interpreted as milliseconds, not seconds-float). Thus, values like "2.16" fail to parse and, even if parsed, would be misinterpreted (as 2 ms).
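To make the two halves of the root cause concrete, here is a minimal sketch using the `end` value from the sample row. The frame arithmetic is only an illustration: `ASSUMED_FPS` and the `ms * fps // 1000` mapping are my assumptions, not the actual `get_pose()` code.

```python
# Reproduce the parse failure and the unit mismatch on the sample `end` value.
raw_end = "2.16"  # seconds, as written to index.csv

try:
    int(raw_end)
except ValueError as error:
    print(f"int() fails: {error}")

# Parsing via float avoids the crash but silently drops the fraction.
truncated = int(float(raw_end))  # read downstream as 2 ms

# Converting seconds to milliseconds keeps the intended duration.
milliseconds = int(round(float(raw_end) * 1000))

ASSUMED_FPS = 25  # hypothetical frame rate, for illustration only
frames_if_truncated = truncated * ASSUMED_FPS // 1000
frames_if_converted = milliseconds * ASSUMED_FPS // 1000
print(truncated, milliseconds, frames_if_truncated, frames_if_converted)
```

Under these assumptions the truncated value selects zero frames, while the converted value selects the full clip.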
Error 2 (empty priority)
After addressing time units, I hit:
ValueError: invalid literal for int() with base 10: ''
because priority is empty in the CSV, but parsed using int(d['priority']).
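A quick way to see both problems at once is to scan the CSV for cells that `int()` would reject. This is a sketch of my own, not part of the package; `find_problem_rows` is a hypothetical helper and the sample text is the row quoted above.

```python
import csv
import io


def find_problem_rows(csv_text):
    """Return (line_number, column, value) for cells that int() would reject."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        for column in ("start", "end", "priority"):
            value = row.get(column) or ""
            try:
                int(value)
            except ValueError:
                problems.append((line_number, column, value))
    return problems


sample = (
    "path,spoken_language,signed_language,start,end,words,glosses,priority\n"
    "sgg/126464.pose,de,sgg,0,2.16,WETTEN,wetten,\n"
)
print(find_problem_rows(sample))
```

For the sample row this reports the float `end` and the empty `priority`, which are the two errors described in this issue.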
Workarounds I used
A) Permanent fix in the index generator (preferred)
I modified the lexicon generator so that it writes milliseconds and a default priority:
duration_ms = int(round(len(pose_body.data) * 1000 / fps))  # milliseconds (int)
yield {
    "path": pose_relative_path,
    "spoken_language": spoken_language,
    "signed_language": signed_language,
    "words": words,
    "start": "0",  # ms as string
    "end": str(duration_ms),  # ms as string
    "glosses": "",
    "priority": "0",  # default priority
}
Then I rebuilt the lexicon with:
export LEXICON_NAME="signsuisse"
export LEXICON_SAVE_DIR="/home/gsantm/store/rule_based_lexicon"
export TFDS_DATA_DIR="/home/gsantm/store/rule_based_lexicon/tensorflow_datasets"
download_lexicon --name "${LEXICON_NAME}" --directory "${LEXICON_SAVE_DIR}"
B) Quick post-hoc CSV patch
Before applying (A), I also validated a quicker path by patching index.csv in place:
import csv
import os

path = "/home/gsantm/store/rule_based_lexicon/index.csv"
tmp = path + ".fixed"

with open(path, newline='', encoding='utf-8') as f, open(tmp, 'w', newline='', encoding='utf-8') as g:
    r = csv.DictReader(f)
    w = csv.DictWriter(g, fieldnames=r.fieldnames)
    w.writeheader()
    for row in r:
        # seconds -> milliseconds
        row['start'] = str(int(round(float(row['start']) * 1000)))
        row['end'] = str(int(round(float(row['end']) * 1000)))
        # default priority
        row['priority'] = (row['priority'] or "").strip() or "0"
        w.writerow(row)

# Swap the patched file into place only after it has been fully written.
os.replace(tmp, path)
Suggested fixes upstream
There are two complementary places to make this robust:
- Generator (`download_lexicon` flow): write `start`/`end` in milliseconds (ints) and set `priority` to a sensible default (e.g., `"0"`). This aligns with how `PoseLookup.get_pose()` computes frame slicing from milliseconds.
- Parser (`PoseLookup.make_dictionary_index`):
  - Tolerate floats by parsing with `int(float(d['start']))` / `int(float(d['end']))`, or clearly document the required unit and type. Note that this alone only avoids the crash: a value in seconds such as `2.16` would still be truncated to `2` and read as 2 ms unless it is also multiplied by 1000.
  - Handle missing priority gracefully: `priority = int(d.get('priority') or 0)`
Either or both will prevent the errors above and improve out-of-the-box experience.
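As a rough sketch of what a tolerant parser could look like (not the actual `make_dictionary_index` code): `parse_time_ms`, `parse_priority`, and the `unit` parameter are hypothetical names I made up for illustration, and the real fix may well choose a different way to declare the unit.

```python
def parse_time_ms(value, unit="ms"):
    """Parse a start/end cell into integer milliseconds.

    `unit` is a hypothetical parameter: "s" treats the cell as seconds.
    """
    number = float(value)
    if unit == "s":
        number *= 1000
    return int(round(number))


def parse_priority(value, default=0):
    """Treat a missing or blank priority cell as `default`."""
    text = (value or "").strip()
    return int(text) if text else default


# The problematic cells from the sample row in this issue.
row = {"start": "0", "end": "2.16", "priority": ""}
parsed = {
    "start": parse_time_ms(row["start"], unit="s"),
    "end": parse_time_ms(row["end"], unit="s"),
    "priority": parse_priority(row.get("priority")),
}
print(parsed)  # {'start': 0, 'end': 2160, 'priority': 0}
```

Making the unit explicit, rather than guessing it from whether the cell contains a decimal point, avoids misreading integer-valued seconds such as `2`.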
Result
After applying fix (A), which writes milliseconds and sets priority to "0", the lookup loads and text_to_gloss_to_pose runs successfully.
So what you are saying is that the values parsed at https://github.com/sign-language-processing/spoken-to-signed-translation/blob/21a34fbb7ae6439eb8ed54b0c4a2a5c4538a7977/spoken_to_signed/gloss_to_pose/lookup/lookup.py#L38-L39 should be floats?
Fine with me. Please make a PR
If the rows in index.csv always had float values for start and end, why did int(d['end']) not fail before? I don't understand that yet. And also, what does this have to do with multithreading?
Not sure about the int/float issue, but the other issue sant came to me with is basically https://github.com/sign-language-processing/pose/issues/177: reading multiple poses at once gives him an error.