Open-Assistant
WikiData Crawler Notebook
A notebook that generates Q&A pairs automatically based on the contents of the WikiData knowledge graph to accelerate prompt generation.
Added README.md with step-by-step instructions as well as the Jupyter Notebook. Does not require an API key.
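To illustrate the idea for reviewers, here is a minimal sketch of turning knowledge graph triples into Q&A pairs with question templates. It is not the notebook's actual code: the `TEMPLATES` mapping, the `make_qa_pairs` helper and the triple format are assumptions made for this example. Only the WikiData property IDs (P36 for capital, P569 for date of birth) are real.

```python
# Hypothetical question templates keyed by WikiData property ID.
# The notebook's real templates and data structures may differ.
TEMPLATES = {
    "P36": "What is the capital of {subject}?",  # P36 = capital
    "P569": "When was {subject} born?",          # P569 = date of birth
}


def make_qa_pairs(triples, templates=TEMPLATES):
    """Turn (subject_label, property_id, value_label) triples into Q&A pairs.

    Triples whose property has no template are skipped.
    """
    pairs = []
    for subject, prop, value in triples:
        template = templates.get(prop)
        if template is None:
            continue  # no question template for this property
        pairs.append((template.format(subject=subject), value))
    return pairs


if __name__ == "__main__":
    # Hand-written sample triples, not crawler output.
    sample = [("France", "P36", "Paris"), ("France", "P9999", "ignored")]
    for question, answer in make_qa_pairs(sample):
        print(question, "->", answer)
```

Keeping one template per property makes it easy to add other languages later by swapping the template dictionary.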
@sedthh can you add a "Run in Colab" button to the top of the notebook, like the one here: https://github.com/LAION-AI/Open-Assistant/blob/main/notebooks/example/example.ipynb
Is the cache.csv file required to run the notebook, or is it an output? If it is a required input (and not too large), can you make it available via a URL so someone can run the whole notebook in Colab if they want to?
@andrewm4894 thanks for the heads up, I will follow the example.ipynb and update the notebook.
I can see how the addition of the cache.csv is confusing. It's optional and is only for caching previously downloaded graph nodes (many nodes, such as information on different units of measurement, are shared across multiple topics).
I am currently working on another crawler, but afterwards I will update this with the option to generate Q&A pairs in Hungarian too, and continue with the PR if that's ok.
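For anyone unsure what the optional cache does, the idea can be sketched roughly as below. This is an assumption-laden illustration, not the notebook's implementation: the `NodeCache` class, the two-column CSV layout (node ID, label) and the `fetch` callback are all hypothetical names chosen for the example.

```python
import csv
import os


class NodeCache:
    """Hypothetical CSV-backed cache of graph node labels (node_id, label)."""

    def __init__(self, path="cache.csv"):
        self.path = path
        self.labels = {}
        # The cache file is optional: start empty if it does not exist yet.
        if os.path.exists(path):
            with open(path, newline="", encoding="utf-8") as f:
                for row in csv.reader(f):
                    if len(row) == 2:
                        self.labels[row[0]] = row[1]

    def get(self, node_id, fetch):
        """Return the label for node_id, calling fetch(node_id) only on a miss."""
        if node_id not in self.labels:
            self.labels[node_id] = fetch(node_id)
            # Append the new entry so later runs can reuse it.
            with open(self.path, "a", newline="", encoding="utf-8") as f:
                csv.writer(f).writerow([node_id, self.labels[node_id]])
        return self.labels[node_id]
```

Here `fetch` stands in for whatever function downloads a node from WikiData. Nodes shared across topics are then downloaded once and read from the file afterwards, and deleting the file only costs extra downloads.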
Hi @sedthh, are you still planning on making further updates to this PR? The data looks nice. If you think this is ready, I'd be happy to review it as-is so we can get it merged soon.
Thanks @olliestanley, I will finish it up in the coming days. I wasn't sure if it was up to standard, so I switched my attention to other tasks.
:x: pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md
Did the minor fixes for English, but I am still unhappy with how this turned out due to the limitations / ambiguity of the knowledge graph representation.
I think we can merge this. Even if the data has some limitations, it may still be useful.