`download_lexicon` produces float seconds and empty priority; `CSVPoseLookup` expects integer ms and int priority (ValueError)

Open GerrySant opened this issue 7 months ago • 3 comments

Summary

While following the installation and usage instructions to generate a SignSuisse lexicon and run text_to_gloss_to_pose, I encountered two crashes originating from CSVPoseLookup / PoseLookup.make_dictionary_index:

  1. ValueError: invalid literal for int() with base 10: '2.16'. index.csv stores start/end as float seconds (e.g., 2.16), but the parser calls int(d['end']), which suggests it expects integer milliseconds.
  2. ValueError: invalid literal for int() with base 10: ''. priority is an empty string in index.csv, yet the parser calls int(d['priority']).

These issues are reproducible with the SignSuisse lexicon and the public instructions.


Environment / Setup (clean, minimal and non-duplicated)

# (Optional) create dedicated environment
# conda create --prefix /home/gsantm/data/conda/envs/rule_based python=3.10 -y

# Load conda
source /home/gsantm/data/environments/miniconda/etc/profile.d/conda.sh

# Activate environment
conda activate /home/gsantm/data/conda/envs/rule_based

# Install the package (choose ONE of the two lines; I used the Git URL)
pip install git+https://github.com/ZurichNLP/spoken-to-signed-translation.git
# OR: git clone https://github.com/ZurichNLP/spoken-to-signed-translation && cd spoken-to-signed-translation && pip install .

# Extra deps required by the pipeline (needed at runtime on my side)
pip install sign-language-datasets opencv-python spacy

Generate SignSuisse lexicon:

export LEXICON_NAME="signsuisse"
export LEXICON_SAVE_DIR="/home/gsantm/store/rule_based_lexicon"
export TFDS_DATA_DIR="/home/gsantm/store/rule_based_lexicon/tensorflow_datasets"

download_lexicon   --name "${LEXICON_NAME}"   --directory "${LEXICON_SAVE_DIR}"

Run the pipeline (example):

export SPOKEN_LANGUAGE="de"
export SIGNED_LANGUAGE="sgg"
export GLOSSER="rules"
export INPUT_TEXT="Es hat fast eine Featherie wie eine Haltung zu ihm."
export OUTPUT_POSE="/home/gsantm/scripts/back_translation/test.pose"

text_to_gloss_to_pose   --text "${INPUT_TEXT}"   --glosser "${GLOSSER}"   --lexicon "${LEXICON_SAVE_DIR}"   --spoken-language "${SPOKEN_LANGUAGE}"   --signed-language "${SIGNED_LANGUAGE}"   --pose "${OUTPUT_POSE}"

Error 1 (float seconds in end)

Traceback (most recent call last):
  File ".../bin/text_to_gloss_to_pose", line 7, in <module>
    sys.exit(text_to_gloss_to_pose())
  File ".../spoken_to_signed/bin.py", line 119, in text_to_gloss_to_pose
    _text_input_arguments(args_parser)
  File ".../spoken_to_signed/bin.py", line 79, in _text_input_arguments
    lookup = CSVPoseLookup(pre_args.lexicon)
  File ".../lookup/csv_lookup.py", line 15, in __init__
    super().__init__(rows=rows, directory=directory, backup=backup)
  File ".../lookup/lookup.py", line 21, in __init__
    self.words_index = self.make_dictionary_index(rows, based_on="words")
  File ".../lookup/lookup.py", line 39, in make_dictionary_index
    "end": int(d['end']),
ValueError: invalid literal for int() with base 10: '2.16'

A typical index.csv entry produced by download_lexicon looked like:

path,spoken_language,signed_language,start,end,words,glosses,priority
sgg/126464.pose,de,sgg,0,2.16,WETTEN,wetten,

Note: end is a float in seconds, not an int.

Root cause

PoseLookup.make_dictionary_index currently calls int(d['start']) and int(d['end']), so it expects integer strings; per the get_pose() logic, these values are interpreted as milliseconds, not float seconds. A value like "2.16" therefore fails to parse and, even if it were truncated to an integer, would be misinterpreted as 2 ms.


Error 2 (empty priority)

After addressing time units, I hit:

ValueError: invalid literal for int() with base 10: ''

because priority is empty in the CSV, but parsed using int(d['priority']).
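
Both errors can be reproduced without installing the package. The sketch below is a minimal standard-library illustration that feeds the sample row shown earlier through csv.DictReader and applies int() to the same three fields. It does not import spoken_to_signed; the helper try_int is a hypothetical name used only for this demonstration.

```python
import csv
import io

# Header and sample row copied from the index.csv excerpt in this issue.
SAMPLE = (
    "path,spoken_language,signed_language,start,end,words,glosses,priority\n"
    "sgg/126464.pose,de,sgg,0,2.16,WETTEN,wetten,\n"
)

row = next(csv.DictReader(io.StringIO(SAMPLE)))


def try_int(value):
    """Return the ValueError message raised by int(), or None if it parses."""
    try:
        int(value)
        return None
    except ValueError as err:
        return str(err)


# Apply int() to each field the parser converts.
errors = {field: try_int(row[field]) for field in ("start", "end", "priority")}
for field, message in errors.items():
    print(f"{field}: {message or 'ok'}")

# Truncating the float avoids the crash but changes the meaning:
# 2.16 seconds becomes the integer 2, which would be read as 2 ms.
truncated = int(float(row["end"]))
```

Only start parses cleanly; end and priority produce the two messages quoted in this report.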


Workarounds I used

A) Permanent fix in the index generator (preferred)

I modified the lexicon generator so that it writes milliseconds and a default priority:

duration_ms = int(round(len(pose_body.data) * 1000 / fps))  # milliseconds (int)

yield {
    "path": pose_relative_path,
    "spoken_language": spoken_language,
    "signed_language": signed_language,
    "words": words,
    "start": "0",                 # ms as string
    "end": str(duration_ms),      # ms as string
    "glosses": "",
    "priority": "0",              # default priority
}
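
As a quick check of the duration arithmetic, here is a standalone sketch of the same expression. The frame count and frame rate are made-up illustration values (not taken from the SignSuisse data), and duration_ms is written here as a hypothetical helper function rather than the variable used in the generator.

```python
def duration_ms(num_frames: int, fps: float) -> int:
    """Clip length in whole milliseconds, mirroring the expression above."""
    return int(round(num_frames * 1000 / fps))


# Hypothetical clip: 54 frames at 25 fps lasts 2.16 s.
print(duration_ms(54, 25))  # 2160
```

With these assumed values, a clip that was previously written as 2.16 (seconds) is written as 2160 (milliseconds).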

Then I rebuilt the lexicon with:

export LEXICON_NAME="signsuisse"
export LEXICON_SAVE_DIR="/home/gsantm/store/rule_based_lexicon"
export TFDS_DATA_DIR="/home/gsantm/store/rule_based_lexicon/tensorflow_datasets"

download_lexicon   --name "${LEXICON_NAME}"   --directory "${LEXICON_SAVE_DIR}"

B) Quick post-hoc CSV patch

Before applying (A), I also validated a quicker path by patching index.csv in place:

import csv, os
path = "/home/gsantm/store/rule_based_lexicon/index.csv"
tmp  = path + ".fixed"
with open(path, newline='', encoding='utf-8') as f, open(tmp, 'w', newline='', encoding='utf-8') as g:
    r = csv.DictReader(f)
    w = csv.DictWriter(g, fieldnames=r.fieldnames); w.writeheader()
    for row in r:
        # seconds -> milliseconds
        row['start'] = str(int(round(float(row['start'])*1000)))
        row['end']   = str(int(round(float(row['end'])*1000)))
        # default priority
        row['priority'] = (row['priority'] or "").strip() or "0"
        w.writerow(row)
os.replace(tmp, path)
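
To confirm that a patched index.csv is safe to load, one option is to scan it for cells that int() would still reject. The following is a minimal sketch under stated assumptions: find_unparsable_rows is a hypothetical helper, not part of spoken_to_signed, and the reported line numbers assume one CSV record per physical line.

```python
import csv
import io

INT_FIELDS = ("start", "end", "priority")


def find_unparsable_rows(csv_file):
    """Yield (line_number, field, value) for each cell that int() would reject."""
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(csv.DictReader(csv_file), start=2):
        for field in INT_FIELDS:
            value = row.get(field)
            try:
                int(value)
            except (TypeError, ValueError):
                yield line_number, field, value


# Demo on an in-memory file: the first data row is unpatched, the second is patched.
demo = io.StringIO(
    "path,spoken_language,signed_language,start,end,words,glosses,priority\n"
    "sgg/126464.pose,de,sgg,0,2.16,WETTEN,wetten,\n"
    "sgg/126464.pose,de,sgg,0,2160,WETTEN,wetten,0\n"
)
problems = list(find_unparsable_rows(demo))
for line_number, field, value in problems:
    print(f"line {line_number}: {field}={value!r}")
```

To check a real file, pass an open file handle, for example open(path, newline='', encoding='utf-8'), in place of the in-memory demo.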

Suggested fixes upstream

There are two complementary places to make this robust:

  1. Generator (download_lexicon flow): write start/end in milliseconds (ints) and set priority to a sensible default (e.g., "0"). This aligns with how PoseLookup.get_pose() computes frame slicing from milliseconds.

  2. Parser (PoseLookup.make_dictionary_index):

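    • Tolerate float input by parsing with int(float(d['start'])) / int(float(d['end'])), or clearly document the required unit and type. Note that int(float('2.16')) truncates to 2, so this alone avoids the crash but does not convert seconds to milliseconds.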
    • Handle missing priority gracefully:
      priority = int(d.get('priority') or 0)
      

Either or both will prevent the errors above and improve the out-of-the-box experience.
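
The parser-side suggestion could look roughly like the sketch below. It is not the library's code: parse_time_ms and parse_priority are hypothetical helpers, and the seconds flag encodes the assumption that start/end cells hold seconds, as in the generated index.csv. The default priority of 0 follows the suggestion above and has not been confirmed as the intended upstream default.

```python
def parse_time_ms(value: str, seconds: bool = True) -> int:
    """Parse a start/end cell into integer milliseconds.

    Assumption: when `seconds` is True the cell holds (possibly fractional)
    seconds; otherwise it already holds milliseconds.
    """
    number = float(value)
    return int(round(number * 1000)) if seconds else int(round(number))


def parse_priority(value, default: int = 0) -> int:
    """Parse a priority cell, using `default` when it is missing or blank."""
    text = (value or "").strip()
    return int(text) if text else default


print(parse_time_ms("2.16"))  # 2160
print(parse_priority(""))     # 0
```

Making the unit explicit avoids the silent truncation that a bare int(float(...)) would introduce, at the cost of the caller having to know which unit the file uses.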


Result

After applying fix (A) and setting priority to "0", the lookup loads and text_to_gloss_to_pose runs successfully.

GerrySant, Sep 01 '25 13:09

So what you are saying is that - https://github.com/sign-language-processing/spoken-to-signed-translation/blob/21a34fbb7ae6439eb8ed54b0c4a2a5c4538a7977/spoken_to_signed/gloss_to_pose/lookup/lookup.py#L38-L39 should be a float?

Fine with me. Please make a PR

AmitMY, Sep 02 '25 06:09

If the rows in index.csv always had float values for start and end, why did int(d['end']) not fail before? I don't understand that yet. And also, what does this have to do with multithreading?

bricksdont, Sep 03 '25 11:09

not sure about the int/float issue - but the other issue sant came to me with is basically https://github.com/sign-language-processing/pose/issues/177 - that reading multiple poses at once gives him an error

AmitMY, Sep 03 '25 14:09