Gareth Davidson
@jack-michaud I think this is a general problem with adding tests after writing the code. In general, tests are just other code. Reusable code that's easy to reason about (so...
One option could be to add some unit test coverage for some easier/trivial parts, do a bit of refactoring so they're easier to test, and put a long description of...
Oh I forgot to put something in here... I got some basic unit tests merged in last weekend and plan to do some more this weekend. Nothing end-to-end or exhaustive,...
As an alternative to paying for storj, S3, or letting HuggingFace have final say over what data is or isn't acceptable, archive.org are pretty reliable and supply torrents too. archive.org...
Yeah this looks like a good solution actually as it keeps the datasets and the code separate. Do we have an idea for an example dataset that's smallish or not...
I haven't run it myself yet, but the code looks very nice. Well structured, isolated and documented, very readable. Nice work 🙂👍
Awesome, data schemas are here: https://projects.laion.ai/Open-Assistant/docs/data/schemas
Nice work and analysis, thank you :)
> * https://github.com/earwig/mwparserfromhell/

Very apt name :joy: Since there's so much of it, I guess it doesn't hurt to throw a fair portion...
Dunno if it'll work but I did this:
* Used Mozilla Common Voice to help build a dataset for the whole world
* Downloaded my data containing a couple of...
Even if we aren't interested in auto-translating data, this approach could be pretty useful for translating the UI. Like a commit hook that automatically adds missing values to all the...
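A rough sketch of what the core of such a hook might do, assuming the UI strings live in per-locale key/value mappings with one reference locale. The names `fill_missing` and `fake_translate` are hypothetical, and the translator is a stand-in for whatever auto-translation service would actually be used:

```python
from typing import Callable, Dict


def fill_missing(
    reference: Dict[str, str],
    target: Dict[str, str],
    translate: Callable[[str], str],
) -> Dict[str, str]:
    """Return a copy of `target` with keys missing from it filled in.

    Existing translations are never overwritten; only keys present in
    `reference` but absent from `target` are translated and added.
    """
    filled = dict(target)
    for key, text in reference.items():
        if key not in filled:
            filled[key] = translate(text)
    return filled


def fake_translate(text: str) -> str:
    # Placeholder translator (an assumption): a real hook would call a
    # translation model or service here.
    return f"[auto] {text}"


# Hypothetical locale data for illustration only.
en = {"submit": "Submit", "cancel": "Cancel"}
de = {"submit": "Absenden"}

result = fill_missing(en, de, fake_translate)
```

Marking the generated values (here with an `[auto]` prefix) would make it easy for a human translator to find and review them later.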