Gareth Davidson comments

Results: 127 comments of Gareth Davidson

Write instructions and examples for unit-testing python code (backend and bot)

@jack-michaud I think this is a general problem with adding tests after writing the code. In general, tests are just other code. Reusable code that's easy to reason about (so...

Write instructions and examples for unit-testing python code (backend and bot)

One option could be to add some unit test coverage for some easier/trivial parts, do a bit of refactoring so they're easier to test, and put a long description of...
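A minimal sketch of what covering an "easier/trivial part" might look like, assuming a small pure helper has been pulled out during refactoring. The helper `normalize_username` and its behaviour are hypothetical and not taken from the project; only the standard-library `unittest` module is used.

```python
import unittest


def normalize_username(raw: str) -> str:
    """Hypothetical helper: trim surrounding whitespace and lowercase a name.

    A pure function like this has no database or network dependency,
    which is what makes it a cheap first target for unit tests.
    """
    return raw.strip().lower()


class NormalizeUsernameTest(unittest.TestCase):
    def test_strips_and_lowercases(self):
        self.assertEqual(normalize_username("  Alice "), "alice")

    def test_whitespace_only_becomes_empty(self):
        self.assertEqual(normalize_username("   "), "")
```

Saved in a test module, this could be run with `python -m unittest`. The harder, side-effect-heavy parts can then be documented and tackled later, as the comment suggests.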

Write instructions and examples for unit-testing python code (backend and bot)

Oh I forgot to put something in here... I got some basic unit tests merged in last weekend and plan to do some more this weekend. Nothing end-to-end or exhaustive,...

Set up a process for collecting raw datasets

As an alternative to paying for Storj or S3, or letting HuggingFace have the final say over what data is or isn't acceptable, archive.org are pretty reliable and supply torrents too. archive.org...

Set up an initial framework for data collection, storage, cleaning, and accessing

Yeah this looks like a good solution actually as it keeps the datasets and the code separate. Do we have an idea for an example dataset that's smallish or not...

Add Zhihu data (#1459)

I haven't run it myself yet, but the code looks very nice. Well structured, isolated and documented, very readable. Nice work 🙂👍

Data source: Wikipedia Talk Pages

Awesome, data schemas are here: https://projects.laion.ai/Open-Assistant/docs/data/schemas

Data source: Wikipedia Talk Pages

Nice work and analysis, thank you :)

> * https://github.com/earwig/mwparserfromhell/

Very apt name :joy: Since there's so much of it, I guess it doesn't hurt to throw a fair portion...

How to clone my own voice locally

Dunno if it'll work but I did this:

* Used Mozilla Common Voice to help build a dataset for the whole world
* Downloaded my data containing a couple of...

Automatic data translator

Even if we aren't interested in auto-translating data, this approach could be pretty useful for translating the UI. Like a commit hook that automatically adds missing values to all the...
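A rough sketch of the "add missing values" step such a hook might run, under several assumptions that are not in the comment: that UI strings live in flat JSON files, one per locale, and that some `translate` callable is available. Both function names are hypothetical, and the default `translate` simply copies the base-language text.

```python
import json
from pathlib import Path
from typing import Callable, Dict


def fill_missing(
    base: Dict[str, str],
    target: Dict[str, str],
    translate: Callable[[str], str] = lambda text: text,
) -> Dict[str, str]:
    """Return a copy of `target` with every key from `base` present.

    Existing translations are never overwritten; only missing keys are
    filled, using `translate` (assumed; identity by default).
    """
    merged = dict(target)
    for key, text in base.items():
        if key not in merged:
            merged[key] = translate(text)
    return merged


def sync_locale_file(
    base_path: Path,
    target_path: Path,
    translate: Callable[[str], str] = lambda text: text,
) -> bool:
    """Fill missing keys in one locale file; return True if it changed."""
    base = json.loads(base_path.read_text(encoding="utf-8"))
    target = (
        json.loads(target_path.read_text(encoding="utf-8"))
        if target_path.exists()
        else {}
    )
    merged = fill_missing(base, target, translate)
    if merged == target:
        return False
    target_path.write_text(
        json.dumps(merged, ensure_ascii=False, indent=2, sort_keys=True) + "\n",
        encoding="utf-8",
    )
    return True
```

A commit hook could call `sync_locale_file` for each locale file and re-stage any file for which it returns `True`. Machine-filled values would probably want a marker so that human translators can review them later.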