bibsearch
Adding a crawl command
Adding a command to crawl bib entries from a URL (links to pdf files and bib files).
I couldn't check that the ACL Anthology (or any other downloaded source) is preferred over arXiv and Semantic Scholar for papers found there, because I couldn't get fts5 to work in my environment (so searching by title is disabled).
Thanks for this, will review soon!
Hi @vered1986. Sorry for the delay, we have been busy with "real work" these last few days :-D We have looked into the PR. If we understood correctly, the crawl command performs these operations:
- Crawls the web page given as argument, looking for links to bib and pdf files.
- Adds the bib files to the DB.
- For pdf files, looks in different sources (ACL Anthology, arXiv, etc.) to obtain the corresponding bib entries.
Please confirm that this is in fact what is happening and that we are not completely off-track. We like the approach; it can be pretty useful, e.g. for crawling a personal or group publication list.
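For readers following along, the first step described above (finding links to bib and pdf files on a page) could be sketched roughly as follows. This is only an illustration using the Python standard library, not the code from the PR; the names `LinkCollector` and `collect_links` are hypothetical, and fetching the page itself is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects links to .bib and .pdf files from <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.bib_links = []
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the URL of the crawled page.
        url = urljoin(self.base_url, href)
        # Look only at the path, so query strings do not hide the extension.
        path = urlparse(url).path.lower()
        if path.endswith(".bib"):
            self.bib_links.append(url)
        elif path.endswith(".pdf"):
            self.pdf_links.append(url)


def collect_links(html, base_url):
    """Return (bib_links, pdf_links) found in the given HTML text."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.bib_links, parser.pdf_links
```

In this sketch the bib links would then be downloaded and added to the DB directly, while each pdf link would have to be resolved to a bib entry by querying the other sources.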
Wrt. fts5, bibsearch should now work much better if you don't have support for it (although installing fts5 is still recommended). Which system do you use?
Thanks @davvil! No problem, and yes, that's exactly what it does :)
Thanks, I'll try it again without fts5. I'm running it on Ubuntu 16.04. I tried it both in my conda environment (couldn't figure out how to install a custom pysqlite with conda) and in a new python environment following the instructions here. I managed to install sqlite with fts5 support, but not to link my python environment to that custom sqlite. I guess there is a way to do it, but in any case it's suboptimal to have to install a new python environment to get this to work. I also tried Linuxbrew, which didn't work smoothly.
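In case it helps anyone debugging the same problem, here is a small sketch for checking whether the sqlite library that a given python environment is actually linked against supports fts5. It is not part of bibsearch; the function name `has_fts5` and the probe table name are made up for illustration.

```python
import sqlite3


def has_fts5():
    """Return True if the linked SQLite library can create an fts5 table."""
    conn = sqlite3.connect(":memory:")
    try:
        # Creating an fts5 virtual table fails with OperationalError
        # when the fts5 module is not compiled in or not loaded.
        conn.execute("CREATE VIRTUAL TABLE fts5_probe USING fts5(title)")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    # sqlite_version is the version of the library in use by this interpreter,
    # which may differ from a separately installed custom sqlite.
    print("SQLite library version:", sqlite3.sqlite_version)
    print("fts5 available:", has_fts5())
```

Running this inside each environment (conda, the fresh python environment, etc.) shows whether that interpreter picked up the custom sqlite build or is still using its bundled one.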