lidbox
End-to-end spoken language identification out of the box.
[filter](https://github.com/py-lidbox/lidbox/blob/b13aca44c7bbb66d7a968fa040bdf8f1d9b3d553/lidbox/features/audio.py#L64) [resample](https://github.com/py-lidbox/lidbox/blob/b13aca44c7bbb66d7a968fa040bdf8f1d9b3d553/lidbox/features/audio.py#L37)
https://github.com/py-lidbox/lidbox/tree/master/lidbox/features Especially the correctness of DSP-related functions.
Are there any plans to train on more languages, e.g. by adding this dataset (https://www.50languages.com/)? If it helps, I can provide the MP3 files as ZIPs.
utt2path, utt2label etc -> pandas.DataFrame
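The mapping-to-table idea could look roughly like this. It is a minimal sketch, assuming `utt2path` and `utt2label` are plain dicts keyed by utterance id; the ids, paths, labels, and column names below are made up for illustration and are not taken from lidbox.

```python
import pandas as pd

# Hypothetical per-utterance mappings, keyed by utterance id.
utt2path = {"utt1": "/data/utt1.wav", "utt2": "/data/utt2.wav", "utt3": "/data/utt3.wav"}
utt2label = {"utt1": "fi", "utt2": "en", "utt3": "fi"}

# One row per utterance id, one column per mapping.
# pandas aligns the inner dict keys into a shared index.
meta = pd.DataFrame({"path": utt2path, "label": utt2label})
meta.index.name = "id"

# Example of what a single table makes easy: count utterances per label.
label_counts = meta["label"].value_counts().to_dict()
```

With everything in one table, filtering, joining, and grouping by label become ordinary DataFrame operations instead of lookups across several dicts.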
Defining an end-to-end pipeline in YAML adds an unnecessary layer of complexity. Perhaps a single example pipeline could be supported, but any customization is easier to do with a custom...
https://github.com/py-lidbox/lidbox/blob/49c27a00eb4f02d085a4e470e08a00c427a042e0/lidbox/util.py#L101-L106
For example, the x-vector architecture should be trained on arbitrary-length input. Without ragged batches, this limits the batch size to 1. By supporting ragged batches, we could train with...
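To illustrate what a ragged batch is, here is a dependency-free sketch that packs utterances of different lengths into one flat list of frames plus row offsets, then pools over time per utterance. The helper names `to_ragged` and `mean_pool` are hypothetical, the layout only mirrors the values-plus-row-splits idea used by `tf.RaggedTensor`, and plain mean pooling is a simplified stand-in for the mean-and-standard-deviation statistics pooling of x-vector models. It assumes every utterance has at least one frame.

```python
from typing import List, Tuple

Frame = List[float]  # one feature vector for one time step


def to_ragged(utterances: List[List[Frame]]) -> Tuple[List[Frame], List[int]]:
    """Pack variable-length utterances into flat frames plus row offsets."""
    values: List[Frame] = []
    row_splits = [0]
    for utt in utterances:
        values.extend(utt)
        row_splits.append(len(values))  # end offset of this utterance
    return values, row_splits


def mean_pool(values: List[Frame], row_splits: List[int]) -> List[Frame]:
    """Average the frames over time, separately for each utterance."""
    pooled = []
    for start, end in zip(row_splits[:-1], row_splits[1:]):
        frames = values[start:end]
        dim = len(frames[0])
        pooled.append([sum(f[d] for f in frames) / len(frames) for d in range(dim)])
    return pooled


# Three utterances with 2, 3, and 1 frames; no padding is needed.
batch = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]],
    [[5.0, 6.0]],
]
values, row_splits = to_ragged(batch)
pooled = mean_pool(values, row_splits)
```

Each utterance keeps its true length, so the pooled output has one fixed-size vector per utterance regardless of how many frames went in.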
https://github.com/py-lidbox/lidbox/blob/abc2a433e2ddbcbfeda65f42b80fcc873dee2e71/lidbox/util.py#L81 E.g. separate class-metrics from summary metrics.
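One way to picture the separation is a function that returns per-class metrics and summary metrics as two distinct structures. This is only a sketch: the function name `evaluate`, the chosen metrics, and the labels are assumptions for illustration and do not reflect what the linked lidbox code computes. It assumes a non-empty list of labels.

```python
from collections import Counter


def evaluate(true_labels, pred_labels):
    """Return (per_class, summary) so the two kinds of metrics stay apart."""
    labels = sorted(set(true_labels) | set(pred_labels))
    # Correct predictions, counted per class.
    tp = Counter(t for t, p in zip(true_labels, pred_labels) if t == p)
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)

    per_class = {}
    for label in labels:
        precision = tp[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = tp[label] / true_counts[label] if true_counts[label] else 0.0
        per_class[label] = {"precision": precision, "recall": recall}

    summary = {
        "accuracy": sum(tp.values()) / len(true_labels),
        "avg_recall": sum(m["recall"] for m in per_class.values()) / len(labels),
    }
    return per_class, summary


per_class, summary = evaluate(["en", "en", "fi", "fi"], ["en", "fi", "fi", "fi"])
```

Keeping the two results separate means the per-class dict can be turned into a table with one row per language, while the summary stays a flat set of scalars.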