sg-wbi

15 comments by sg-wbi

I have a copy of the dataset. I'll make it local for now, since no license information is provided.

https://github.com/bigscience-workshop/biomedical/tree/master/bigbio/biodatasets/cas

The idea moving forward would be to attach specific tags to specific tasks. This way we can have a test for this information, e.g. the "MULTIPLE_CHOICE" tag should be available...
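A minimal sketch of what such a test could look like. Only the `MULTIPLE_CHOICE` tag comes from the comment above; the `Tag` enum, the `REQUIRED_TAGS` mapping, the task name `question_answering`, and the `missing_tags` helper are all hypothetical names chosen for illustration, not the project's actual code.

```python
from enum import Enum


class Tag(Enum):
    # Only MULTIPLE_CHOICE is named in the comment; the enum itself is an assumption.
    MULTIPLE_CHOICE = "MULTIPLE_CHOICE"


# Hypothetical mapping from a task name to the tags a dataset for that task must declare.
REQUIRED_TAGS = {
    "question_answering": {Tag.MULTIPLE_CHOICE},
}


def missing_tags(task, declared):
    """Return the tags required for `task` that are absent from `declared`."""
    return REQUIRED_TAGS.get(task, set()) - set(declared)
```

A test would then assert that `missing_tags(...)` is empty for every dataset; tasks with no entry in the mapping are treated as having no required tags.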

One more thing: during the process I was tempted to create `SOCIAL_MEDIA` and `CLINICAL` tags, but I think we should have a separate metadata attribute reserved for "domain"/"source".
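One way to picture the separation, as a sketch only: task tags stay in one field and the domain lives in its own attribute. The `Domain` enum and the `DatasetMetadata` class are hypothetical; only the `SOCIAL_MEDIA` and `CLINICAL` values come from the comment.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    # The two values mentioned in the comment; the enum is an assumption.
    SOCIAL_MEDIA = "SOCIAL_MEDIA"
    CLINICAL = "CLINICAL"


@dataclass(frozen=True)
class DatasetMetadata:
    # Hypothetical container: task tags and domain are kept as distinct attributes.
    tasks: tuple
    domain: Domain


meta = DatasetMetadata(tasks=("MULTIPLE_CHOICE",), domain=Domain.CLINICAL)
```

Keeping `domain` out of the task tags means a check on task tags never has to special-case source information.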

> the text classification task has MeSH codes but the NER task does not. This is because the MeSH codes are assigned as "document" (global) tags and are used for...

Thank you for checking out my comments @giyaseddin! I am trying to inspect the dataset with:

```python
from datasets import load_dataset

ds = load_dataset("biodatasets/medquad/medquad.py", "medquad_source")
```

...

I may not understand all the implications of this, but to my eyes the easiest solution would be to make `name` required. I actually talked with @hakunanatasha about...

> but my preference is to give users the quickest access to information, and let them explore more details if they need it. Ok, this is why required `name`...

Hey @mcullan, thank you very much for your contribution. First of all, as you found out, we are dealing with a very nasty dataset here, which will take some effort...

Please do not forget to remove the `requirements.txt` file from the PR. See #213 for reference.