bibsearch

Adding a crawl command

Open vered1986 opened this issue 6 years ago • 4 comments

Adding a command to crawl bib entries from a URL (links to pdf files and bib files).

I couldn't test whether the ACL Anthology (or any other downloaded source) is preferred for papers taken from arXiv and Semantic Scholar, because I couldn't get fts5 to work in my environment (so searching by title is disabled).

vered1986 avatar Apr 17 '18 09:04 vered1986

Thanks for this, will review soon!

mjpost avatar Apr 18 '18 19:04 mjpost

Hi @vered1986. Sorry for the delay, we have been busy with "real work" over the last few days :-D We have looked into the PR. If we understood correctly, the crawl command performs these operations:

  • Crawls the web page given as an argument, looking for links to bib and pdf files.
  • Adds the bib files to the DB.
  • For pdf files, looks in different sources (ACL Anthology, arXiv, etc.) to obtain the corresponding bib entries.
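
The first step above (finding links to bib and pdf files on a page) could be sketched roughly as follows. This is only an illustration using the Python standard library, not the code from the PR; the names `LinkCollector` and `find_bib_and_pdf_links` are made up for this example, and fetching the page, adding bib files to the DB, and looking up pdf files in other sources are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_bib_and_pdf_links(html, base_url):
    """Return (bib_links, pdf_links) as absolute URLs found in the HTML."""
    collector = LinkCollector()
    collector.feed(html)

    bib_links, pdf_links = [], []
    for href in collector.hrefs:
        # Resolve relative links against the URL of the crawled page.
        url = urljoin(base_url, href)
        # Classify by the file extension of the URL path only,
        # so query strings and fragments do not interfere.
        path = urlparse(url).path.lower()
        if path.endswith(".bib"):
            bib_links.append(url)
        elif path.endswith(".pdf"):
            pdf_links.append(url)
    return bib_links, pdf_links
```

Classifying by extension is an assumption made for the sketch; links that serve a pdf or bib file without the matching extension would be missed.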

Please confirm that this is in fact what is happening and that we are not completely off-track. We like the approach; it can be pretty useful, e.g. for crawling a personal or group publication list.

davvil avatar Apr 24 '18 19:04 davvil

Wrt. fts5, bibsearch should now work much better if you don't have support for it (although installing fts is recommended). What system do you use?

davvil avatar Apr 24 '18 19:04 davvil

Thanks @davvil! No problem, and yes, that's exactly what it does :)

Thanks, I'll try it again without fts5. I'm running it on Ubuntu 16.04. I tried it both in my conda environment (couldn't figure out how to install a custom pysqlite with conda) and in a new python environment following the instructions here. I managed to install sqlite with fts5 support, but not to link my python environment to that custom sqlite. I guess there is a way to do it, but in any case it's suboptimal to have to install a new python environment to get this to work. I also tried Linuxbrew, which didn't work smoothly.
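
For anyone hitting the same problem, one way to see which SQLite library a given python environment is actually linked against, and whether it has fts5, is a small probe like the one below. It is a sketch that assumes only the standard `sqlite3` module; the function name `has_fts5` and the table name `fts5_probe` are made up for this example and are not part of bibsearch.

```python
import sqlite3


def has_fts5():
    """Return True if the SQLite library used by this Python supports fts5."""
    conn = sqlite3.connect(":memory:")
    try:
        # Creating an fts5 virtual table fails with OperationalError
        # when the extension was not compiled into the linked SQLite.
        conn.execute("CREATE VIRTUAL TABLE fts5_probe USING fts5(title)")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    # sqlite_version is the version of the SQLite library in use at runtime,
    # which may differ from a custom build installed elsewhere on the system.
    print("SQLite library version:", sqlite3.sqlite_version)
    print("fts5 available:", has_fts5())
```

Running this inside each environment (conda, a fresh virtualenv, etc.) shows whether that environment picked up the custom sqlite build or is still using a different one.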

vered1986 avatar Apr 25 '18 06:04 vered1986