Mark Mazumder

Results 17 issues of Mark Mazumder

Is there an idiomatic way in ScalaBuff for deserializing a union-type container? This would obviate the need for protocol buffer reflection via `getAllFields` with Java-generated classes. For example, using [OneMessage...

```python
import os
from pathlib import Path

import tqdm

# Collect keywords whose clip directory contains no .wav files
uhohs = []
mswc_16khz = Path("/media/mark/hyperion/mswc/16khz_wav/en/clips")
keywords = list(sorted(os.listdir(mswc_16khz)))
print(len(keywords))
for keyword in tqdm.tqdm(keywords):
    keyword_samples = list(sorted((mswc_16khz / keyword).glob("*.wav")))
    if len(keyword_samples) == 0:
        uhohs.append(keyword)
print(len(uhohs))
# >>> 24
```

When running the intro tutorial notebook in the Docker container for `tensorflow/tensorflow:latest-gpu-jupyter`, the `umap` library can't be installed because `numba` only works on `numpy

Impacts certain languages more heavily than others (French, Kinyarwanda, ...) [empty_directories.txt](

In German, 'null' (zero) is being converted to `NaN` by pandas when it is the only word present in the transcript (due to single-word-target-segments data). One option is to use...
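A minimal sketch of the behavior described above, assuming the transcripts are loaded with `pandas.read_csv`. The column names and sample rows are made up for illustration. The `keep_default_na=False` workaround shown here is one possible fix and is not necessarily the option the excerpt goes on to suggest.

```python
import io

import pandas as pd

# Hypothetical two-row transcript table; the first transcript is the single word "null"
csv_text = "clip,transcript\na.wav,null\nb.wav,hallo welt\n"

# Default parsing: "null" is in pandas' default list of NA strings, so it becomes NaN
default = pd.read_csv(io.StringIO(csv_text))
default_is_nan = pd.isna(default.loc[0, "transcript"])

# With keep_default_na=False (and no custom na_values), the string is kept as-is
literal = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
literal_value = literal.loc[0, "transcript"]

print(default_is_nan, repr(literal_value))
```

Note that `keep_default_na=False` also stops empty fields from being read as `NaN`, so genuinely missing transcripts would need to be detected separately.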


Given two transcripts, 1. [hello is a common greeting] and 2. [she said, “hello”], without punctuation filtering we would treat [hello] and [“hello”] as separate words.
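The effect can be sketched with the two example transcripts. This is an illustration only: `strip_punctuation` and `count_words` are hypothetical helpers, and dropping every Unicode punctuation character is an assumed filtering rule, not necessarily the one used in the actual pipeline.

```python
import unicodedata
from collections import Counter


def strip_punctuation(token: str) -> str:
    """Drop every character whose Unicode category starts with 'P' (punctuation)."""
    return "".join(ch for ch in token if not unicodedata.category(ch).startswith("P"))


def count_words(transcripts):
    """Count whitespace-separated words after lowercasing and punctuation filtering."""
    counts = Counter()
    for transcript in transcripts:
        for token in transcript.lower().split():
            cleaned = strip_punctuation(token)
            if cleaned:  # skip tokens that were pure punctuation
                counts[cleaned] += 1
    return counts


transcripts = ["hello is a common greeting", "she said, “hello”"]

# Without filtering, hello and “hello” are counted as two different words
raw = Counter(token for t in transcripts for token in t.split())

# With filtering, both occurrences map to the same word
filtered = count_words(transcripts)

print(raw["hello"], raw["“hello”"], filtered["hello"])
```

One trade-off of this rule is that it also removes word-internal punctuation such as apostrophes, which may or may not be desirable depending on the language.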