elpis
Support non-identical file names between .wav and .eaf, and recognise media offsets
Resolves #191, #193.
This implementation doesn't give the user any choice as to whether to match the audio by the file name of the corresponding .eaf file or to take it from RELATIVE_MEDIA_URL. It defaults to the former behaviour and falls back to the latter.
This implementation also ignores MEDIA_URL as it is difficult to wrestle it (e.g. "file:///Users/bbb/Desktop/abui/abui-audio-1.wav") into a format that the rest of the application will be able to handle easily. In other words, it assumes that RELATIVE_MEDIA_URL is well formed.
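To make the lookup order concrete, here is a rough sketch of the idea rather than the code in this PR. The function name resolve_audio_name and the `uploaded` argument are hypothetical, and it assumes the .eaf header carries a MEDIA_DESCRIPTOR element with a RELATIVE_MEDIA_URL attribute. MEDIA_URL is deliberately never read.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from typing import Iterable, Optional


def resolve_audio_name(eaf_name: str, eaf_xml: str,
                       uploaded: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: pick the audio file that belongs to an .eaf file."""
    uploaded = set(uploaded)

    # 1. Default: an audio file with the same stem as the .eaf file.
    same_stem = PurePosixPath(eaf_name).with_suffix(".wav").name
    if same_stem in uploaded:
        return same_stem

    # 2. Fallback: the base name of RELATIVE_MEDIA_URL (assumed well formed).
    #    MEDIA_URL (e.g. "file:///Users/...") is ignored on purpose.
    root = ET.fromstring(eaf_xml)
    for descriptor in root.iter("MEDIA_DESCRIPTOR"):
        relative = descriptor.get("RELATIVE_MEDIA_URL")
        if relative:
            name = PurePosixPath(relative).name
            if name in uploaded:
                return name

    # No match by either rule.
    return None
```

Whether the fallback should also require the referenced file to be among the uploads is a design choice. The sketch requires it so that a stale RELATIVE_MEDIA_URL never yields a file that isn't there.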
This should also fix the "IndexError: list index out of range" errors raised at line = wer_lines[0] that may have been happening before, although please double-check that they are actually fixed.
Offsets are directly int()-ed from the .eaf file.
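For reference, a minimal sketch of what reading the offset could look like. It assumes the offset lives in the TIME_ORIGIN attribute of MEDIA_DESCRIPTOR (in milliseconds), which is my reading of the EAF format and not something spelled out above; media_offset_ms is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def media_offset_ms(eaf_xml: str) -> int:
    """Hypothetical helper: return the media offset of the first descriptor."""
    root = ET.fromstring(eaf_xml)
    descriptor = root.find("./HEADER/MEDIA_DESCRIPTOR")
    if descriptor is None:
        return 0  # no media linked, so no offset

    # Assumption: the offset is stored in TIME_ORIGIN; a missing or empty
    # attribute is treated as zero.
    raw = descriptor.get("TIME_ORIGIN")
    return int(raw) if raw else 0
```

Because the value goes straight through int(), a non-numeric attribute raises ValueError instead of being silently ignored.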
I think a big part of the original tickets was implementing a UI feature that would highlight (in particular) if audio or eaf were uploaded without the corresponding eaf or audio file (respectively). The easiest way I can envisage to accomplish this is aligning the audio files horizontally in the UI with their transcriptions, which would make it obvious that a pair was missing either component (you could also highlight rows with a missing file, or something to that effect). The unfortunate downside to this is that you'll have to replicate the verification on the front end (This might help: https://www.npmjs.com/package/elan-parser ).
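The check itself is small. Here is a sketch in Python of the logic the front end would need to mirror; find_unpaired and its arguments are hypothetical, and it assumes each .eaf file has already been mapped to the audio name it refers to (or None if it refers to nothing usable).

```python
from typing import Dict, Iterable, List, Optional, Tuple


def find_unpaired(
    audio_names: Iterable[str],
    eaf_to_audio: Dict[str, Optional[str]],
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: list audio files and .eaf files with no partner."""
    audio = set(audio_names)

    # Audio files that at least one .eaf file points at.
    claimed = {name for name in eaf_to_audio.values() if name in audio}

    # Audio with no transcription, and transcriptions with no uploaded audio.
    orphan_audio = sorted(audio - claimed)
    orphan_eaf = sorted(eaf for eaf, name in eaf_to_audio.items()
                        if name not in audio)
    return orphan_audio, orphan_eaf
```

The two returned lists correspond to the rows that would be highlighted in the UI: audio without a transcription, and transcriptions without audio.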
Ah okay, this might be a bit more work to do on the uploading side of things, as it won't just be a file drop anymore. But I can look into it 👌