Charles Foster
Surprised to see the following: on the example sentence in the README, `neuralcoref` predicts accurately. But on a slight modification, where we switch "sister" to "brother" and swap the pronouns,...
Just a thought I've had recently: you could think of factoring the color component `c(r, d)` into at least two parts: one part, let's call it `diff(r)`, for the diffuse color,...
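To make the factoring concrete, here's a minimal numpy sketch. Everything in it is hypothetical — `diff`, `spec`, and the toy functions standing in for learned networks are placeholders for illustration, not anything from an actual NeRF codebase:

```python
import numpy as np

def diff(r):
    # Hypothetical stand-in for a learned, view-independent (diffuse) color:
    # a function of the 3D position r only.
    return 0.5 * (np.tanh(r) + 1.0)

def spec(r, d):
    # Hypothetical stand-in for a view-dependent residual (e.g. specular
    # highlights): a function of both position r and viewing direction d.
    return 0.05 * np.maximum(r @ d, 0.0) * np.ones(3)

def c(r, d):
    # Factored color: diffuse part plus view-dependent part.
    return diff(r) + spec(r, d)
```

The point of the factoring is that the `diff(r)` term is constant across viewing directions, so only the smaller view-dependent part has to vary with `d`.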
Follow-up paper that pursues ideas in this vein: https://people.eecs.berkeley.edu/~pratul/nerv/
Pull request #11 added a basic CLIP objective. What remains is to implement the microbatching tricks.
After a long period of dormancy, I've spent some time figuring this component out. The notebook below implements microbatching and parallelism for the contrastive loss in a dummy CLIP setup....
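For reference, the full-batch loss that the microbatching has to reproduce can be sketched in plain numpy. Everything below (`clip_loss`, `encode_in_microbatches`, the toy linear encoders in the usage) is illustrative, not the notebook's actual code. The key property is that embeddings can be computed chunk by chunk and concatenated before the N×N similarity matrix is formed, so encoder activations never need to be live for the whole batch at once:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project embeddings onto the unit sphere, as in CLIP.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_loss(img_emb, txt_emb, temperature=0.07):
    # Symmetric InfoNCE over the full batch: image i's positive is caption i.
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature
    labels = np.arange(len(logits))

    def xent(lo):
        # Cross-entropy of each row against the diagonal label.
        lo = lo - lo.max(axis=1, keepdims=True)
        logp = lo - np.log(np.exp(lo).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    return 0.5 * (xent(logits) + xent(logits.T))

def encode_in_microbatches(encode, x, mb_size):
    # Run the encoder over microbatches and concatenate the embeddings.
    # In a real training step, each chunk's activations can be freed (or
    # recomputed for the backward pass) before the similarity matrix is built.
    return np.concatenate(
        [encode(x[i:i + mb_size]) for i in range(0, len(x), mb_size)]
    )
```

Since the loss only touches the concatenated embeddings, the microbatched path is numerically identical to the full-batch path.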
May be worthwhile to play with an additional speech-domain loss in the form of DINO, which should be relatively easy to implement, with a bit of JAX code and some...
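The DINO loss itself really is only a few lines. The sketch below is a numpy stand-in for what the JAX version would look like; the names and the hyperparameter defaults (student/teacher temperatures, centering momentum) are assumptions lifted from the image-domain DINO recipe, not anything speech-specific:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dino_loss(student_logits, teacher_logits, center, t_s=0.1, t_t=0.04):
    # Teacher targets: centered, then sharpened with a lower temperature.
    # They are treated as constants (no gradient flows through the teacher).
    targets = softmax((teacher_logits - center) / t_t)
    log_p = np.log(softmax(student_logits / t_s) + 1e-12)
    # Cross-entropy between teacher targets and student predictions.
    return -(targets * log_p).sum(axis=-1).mean()

def update_center(center, teacher_logits, momentum=0.9):
    # EMA of the teacher's batch-mean logits; centering guards against
    # collapse onto a single output dimension.
    return momentum * center + (1 - momentum) * teacher_logits.mean(axis=0)
```

The EMA teacher update itself is just a weighted average of parameters and fits in one more line.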
Speak of the devil! Here's a new big benchmark that does just what I'm after. https://arxiv.org/abs/2105.01051
We want to go for the largest datasets we can for this. They are listed in a Google doc. Not all of them will be downloadable via public links, so...
Some stats on Common Voice English, version 6.1: 1,224,864 validated clips, of which 1,224,858 have UTF-8 captions. 596,665 unique sentences, averaging 52 characters (Python `len`) and 52.9 bytes each....
@afiaka87 Good question. For now, the plan is to preprocess the data in two steps: 1. Trimmed 15-second .wav files, padded with silence if the original audio clip was...
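Step 1 might look something like the following — a minimal sketch assuming 16 kHz mono float arrays; `pad_or_trim` is a hypothetical helper for illustration, not existing repo code:

```python
import numpy as np

def pad_or_trim(audio, sample_rate=16000, target_seconds=15.0):
    # Force every clip to exactly `target_seconds`: trim longer clips,
    # right-pad shorter clips with silence (zeros).
    target_len = int(sample_rate * target_seconds)
    if len(audio) >= target_len:
        return audio[:target_len]
    pad = np.zeros(target_len - len(audio), dtype=audio.dtype)
    return np.concatenate([audio, pad])
```

Fixing the length up front keeps every downstream batch the same shape, which matters for JIT-compiled training steps.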