WikiData Crawler Notebook

Open sedthh opened this issue 2 years ago • 6 comments

A notebook that generates Q&A pairs automatically based on the contents of the WikiData knowledge graph to accelerate prompt generation.

Added README.md with step-by-step instructions as well as the Jupyter Notebook. Does not require an API key.
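For readers who have not opened the notebook, the general idea of turning knowledge-graph facts into Q&A pairs can be sketched as follows. This is an illustration only, not the notebook's code: the `Triple` type, the `TEMPLATES` table and `triples_to_qa` are hypothetical names, and the facts are passed in by hand rather than downloaded from WikiData.

```python
from typing import Dict, List, Tuple

# A (subject label, property label, value label) fact, as a knowledge graph stores it.
Triple = Tuple[str, str, str]

# Hypothetical question templates keyed by property label (assumption, not from the notebook).
TEMPLATES: Dict[str, str] = {
    "capital": "What is the capital of {subject}?",
    "date of birth": "When was {subject} born?",
}


def triples_to_qa(triples: List[Triple]) -> List[Dict[str, str]]:
    """Turn each triple that has a matching template into a question/answer pair."""
    pairs = []
    for subject, prop, value in triples:
        template = TEMPLATES.get(prop)
        if template is None:
            continue  # no template for this property, so skip the fact
        pairs.append({"question": template.format(subject=subject), "answer": value})
    return pairs


if __name__ == "__main__":
    for pair in triples_to_qa([("Hungary", "capital", "Budapest")]):
        print(pair["question"], pair["answer"])
```

Facts whose property has no template are skipped silently, which keeps the sketch short but also hints at why ambiguous properties are hard to phrase as natural questions.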

sedthh avatar Feb 02 '23 16:02 sedthh

@sedthh can you add a run in colab button to top of notebook like in here: https://github.com/LAION-AI/Open-Assistant/blob/main/notebooks/example/example.ipynb

Is the cache.csv file required to run the notebook, or is it an output? If it is required as input (and is not too large), can you make it available via a URL so that someone can run the whole notebook in Colab if they want to?

andrewm4894 avatar Feb 03 '23 15:02 andrewm4894

@andrewm4894 thanks for the heads up, I will follow the example.ipynb and update the notebook. I can see how the addition of the cache.csv is confusing. It is optional and is only used to cache previously downloaded graph nodes (many of the nodes, such as those describing units of measurement, are shared across multiple topics).
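To make the role of an optional cache file concrete, here is a minimal sketch under stated assumptions. It is not the notebook's implementation: `NodeCache`, its two-column CSV layout (node ID, label) and the `fetch` callback are hypothetical, and the callback stands in for whatever call actually downloads a node.

```python
import csv
import os
from typing import Callable, Dict


class NodeCache:
    """Hypothetical CSV-backed cache that maps a node ID to its label."""

    def __init__(self, path: str = "cache.csv") -> None:
        self.path = path
        self.nodes: Dict[str, str] = {}
        # The file is optional: start with an empty cache when it does not exist.
        if os.path.exists(path):
            with open(path, newline="", encoding="utf-8") as f:
                for row in csv.reader(f):
                    if len(row) == 2:
                        self.nodes[row[0]] = row[1]

    def get(self, node_id: str, fetch: Callable[[str], str]) -> str:
        """Return the cached label, calling fetch only for nodes not seen before."""
        if node_id not in self.nodes:
            self.nodes[node_id] = fetch(node_id)
        return self.nodes[node_id]

    def save(self) -> None:
        """Write the cache back to disk so a later run can reuse it."""
        with open(self.path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(self.nodes.items())
```

Because shared nodes are looked up once and then served from the dictionary, deleting the file only costs extra downloads on the next run; it does not change the output.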

I am currently working on another crawler, but afterwards I will update this with the option to generate Q&A pairs in Hungarian too, and continue with the PR if that's ok.

sedthh avatar Feb 04 '23 18:02 sedthh

Hi @sedthh, are you still planning to make further updates to this PR? The data looks nice. If you think it is ready, I'd be happy to review it as-is so we can get it merged soon.

olliestanley avatar Feb 20 '23 21:02 olliestanley

Thanks @olliestanley, I will finish it up in the coming days. I wasn't sure if it was up to standard, so I switched my attention to other tasks.

sedthh avatar Feb 20 '23 21:02 sedthh

:x: pre-commit failed. Please run pre-commit run --all-files locally and commit the changes. Find more information in the repository's CONTRIBUTING.md

github-actions[bot] avatar Feb 21 '23 14:02 github-actions[bot]

Did the minor fixes for English, but I am still unhappy with how this turned out due to the limitations / ambiguity of the knowledge graph representation.

sedthh avatar Feb 21 '23 15:02 sedthh

I think we can merge this. Even if the data has some limitations, it may still be useful.

olliestanley avatar Feb 24 '23 17:02 olliestanley