ai-deadlines
Add scraping of deadlines and conference rankings
I added automated scraping of deadlines from WikiCFP and of conference rankings from Core.
Currently the output consists of two files (they are actually YAML, but GitHub only accepts `.txt` files for upload):
- `conference_new_candidates.txt`: contains new deadlines
- `conference_update_candidates.txt`: contains updates to existing deadlines (updated information is currently marked with `(NEW)`)
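To illustrate how an update candidate could be produced, here is a minimal sketch. It assumes each deadline entry is a flat dict (as it would be after loading the YAML) and that a changed field is simply suffixed with `(NEW)`. The functions `mark_updates` and `has_updates` and the field names used with them are hypothetical; this is not the code in this PR.

```python
from typing import Any, Dict

# Marker text as described in the PR; the exact output format is an assumption.
NEW_MARKER = "(NEW)"


def mark_updates(existing: Dict[str, Any], scraped: Dict[str, Any]) -> Dict[str, Any]:
    """Build a (hypothetical) update candidate from an existing and a scraped entry.

    Every field whose scraped value differs from the existing one (including
    fields the existing entry lacks) is replaced by the scraped value with
    NEW_MARKER appended. Scraped fields that are None are ignored, and the
    input dicts are not modified.
    """
    candidate = dict(existing)  # shallow copy so the master data stays untouched
    for key, new_value in scraped.items():
        if new_value is None:
            continue  # nothing was scraped for this field
        if existing.get(key) != new_value:
            candidate[key] = f"{new_value} {NEW_MARKER}"
    return candidate


def has_updates(candidate: Dict[str, Any]) -> bool:
    """True if any field in the candidate carries the NEW_MARKER suffix."""
    return any(
        isinstance(value, str) and value.endswith(NEW_MARKER)
        for value in candidate.values()
    )
```

Appending the marker to the value keeps the candidate file plain YAML that is easy to review in a diff, at the cost of turning every marked value into a string; the marker has to be stripped again before an entry is merged into the master data.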
Some open points:
- [ ] WikiCFP data is licensed under CC BY-SA 3.0, so if we use it, all deadline information derived from it would have to be published under the same license
- [ ] I could not find a license for the data scraped from Core
- [ ] The scraping can run inside the Docker container, so we could use GitHub Actions to automatically commit suggestions, i.e. the two files mentioned above. This would also make it easy for people from the community to check the data. What do you think?
- [ ] What is the "best" output format? I thought extra `yaml` files might be a good start, but maybe there are other ideas?
- [ ] Any ideas on how best to match the conference master data with the scraping results?
- [ ] Recent changes have been overwritten; `conferences.yml` needs to be checked before merging
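On matching scraped results against the conference master data, one simple option is fuzzy string matching on normalized titles, restricted to entries of the same year. The sketch below uses Python's standard `difflib.SequenceMatcher`; the `title`/`year` field names, the `normalize` rules and the 0.85 threshold are assumptions for illustration, not part of this PR.

```python
import re
from difflib import SequenceMatcher
from typing import Any, Dict, List, Optional

# Hypothetical threshold: best scores below this are treated as "no match".
DEFAULT_THRESHOLD = 0.85


def normalize(name: str) -> str:
    """Lowercase a conference name, drop 4-digit years and collapse punctuation."""
    name = name.lower()
    name = re.sub(r"\b(?:19|20)\d{2}\b", " ", name)  # "icml 2021" -> "icml  "
    name = re.sub(r"[^a-z0-9]+", " ", name)          # punctuation -> spaces
    return " ".join(name.split())                    # collapse whitespace


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(
    scraped: Dict[str, Any],
    master: List[Dict[str, Any]],
    threshold: float = DEFAULT_THRESHOLD,
) -> Optional[Dict[str, Any]]:
    """Return the master entry of the same year whose title is most similar
    to the scraped title, or None if no entry reaches the threshold."""
    scored = [
        (similarity(scraped["title"], entry["title"]), entry)
        for entry in master
        if entry.get("year") == scraped.get("year")
    ]
    if not scored:
        return None
    # max() keeps the first entry on ties, so master-file order breaks ties.
    score, entry = max(scored, key=lambda pair: pair[0])
    return entry if score >= threshold else None
```

An acronym such as `ICML` will not match a full name like `International Conference on Machine Learning` under this scheme, so in practice it may be worth comparing against several name fields and flagging low-confidence matches for manual review.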
Cheers Alex
@a-nau thanks for this PR. I will review it by the end of the week.