arxiv-sanity-preserver
Feature Request: Expanding to all of arXiv
How feasible would it be to expand to all categories in arXiv?
Per #33, you mention that it's important to keep communities small so that "top papers" are still relevant. Couldn't this still be maintained by having users specify, as part of their account, which subcategories they work in? Top papers for a user would then apply some sort of cross-category normalization to account for multiple communities of different sizes. Maybe we could also crowdsource the clustering of categories into research areas and have those preset (as is currently done for ML).
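To make the normalization idea concrete, here is a rough sketch, not anything arxiv-sanity actually does. It assumes each paper has a single category and a raw popularity score (e.g. number of library saves); the function name `normalized_top_papers` and the dict fields are made up for illustration. Each paper is ranked by its percentile within its own category, so a small community's best paper competes on equal terms with a large community's best paper.

```python
from collections import defaultdict


def normalized_top_papers(papers, user_categories, top_n=5):
    """Rank papers by within-category percentile instead of raw score.

    papers: list of dicts with hypothetical keys 'id', 'category', 'score'.
    user_categories: set of category names the user follows.
    """
    # Group the papers from the user's categories by category.
    by_cat = defaultdict(list)
    for p in papers:
        if p["category"] in user_categories:
            by_cat[p["category"]].append(p)

    ranked = []
    for items in by_cat.values():
        # Ascending sort, so the highest-scoring paper gets percentile 1.0.
        items.sort(key=lambda p: p["score"])
        n = len(items)
        for rank, p in enumerate(items):
            ranked.append(((rank + 1) / n, p["id"]))

    # Highest percentile first; ties are broken by id for determinism.
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return [pid for _, pid in ranked[:top_n]]


papers = [
    {"id": "lg-a", "category": "cs.LG", "score": 100},
    {"id": "lg-b", "category": "cs.LG", "score": 90},
    {"id": "lg-c", "category": "cs.LG", "score": 80},
    {"id": "lg-d", "category": "cs.LG", "score": 70},
    {"id": "co-a", "category": "math.CO", "score": 5},
    {"id": "co-b", "category": "math.CO", "score": 3},
]
print(normalized_top_papers(papers, {"cs.LG", "math.CO"}, top_n=3))
```

With raw scores, the math.CO papers would never make the list; with percentiles, the top paper of each community ties for first place. Percentile ranking is just one possible choice of normalization.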
Would love to see this platform become widely adopted!
Moreover, any chance of real-time fetching from arXiv? It seems to take a day or two for a paper to appear on the site.
Thank You.
AFAIR, papers are mostly released in bulk by arXiv once per day, so downloading more often than that isn't really necessary.
Take this one for instance:
https://arxiv.org/abs/1701.04018 (cs.CV)
Published a few days ago, but still not on arXiv Sanity - http://www.arxiv-sanity.com/1701.04018.
Any way to make the fetching part more robust?
@Moredread @RoyiAvital this is because, in the current terrible state, I have to manually ssh into the box that runs arxiv sanity, run an authentication script and enter a password, or my credentials expire after ~3 days. And sometimes I forget. I can't find a way to automate this right now, but I'm working on switching AS to a longer-term solution anyway.
@karpathy, I see.
Well, you do wonders with this site, so don't see it as a complaint :-).
Thank You.
@karpathy can you just do something like this?
0 * * * * . /opt/deep_arxiv/config.sh; python3 /opt/deep_arxiv/scripts/arxiv_paper_fetch.py >> /opt/deep_arxiv/crontab.log
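As a rough idea of what such a scheduled script might do, here is a minimal sketch. It is not the actual `arxiv_paper_fetch.py`, whose contents are not shown in this thread. It assumes the public arXiv API Atom feed at `export.arxiv.org`; the helper names `build_query_url` and `new_paper_ids` are made up for illustration. The point is that an hourly run stays cheap if it only keeps the IDs it has not seen before.

```python
import re
import urllib.parse
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace used by the feed
API_URL = "http://export.arxiv.org/api/query"


def build_query_url(category, max_results=100):
    """Build an arXiv API query for the newest submissions in one category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def new_paper_ids(atom_xml, known_ids):
    """Return IDs in the feed that are not already in known_ids, in feed order."""
    root = ET.fromstring(atom_xml)
    fresh = []
    for entry in root.iter(ATOM + "entry"):
        url = entry.findtext(ATOM + "id", default="")
        # e.g. http://arxiv.org/abs/1701.04018v1 -> 1701.04018
        raw = url.rsplit("/abs/", 1)[-1]
        pid = re.sub(r"v\d+$", "", raw)
        if pid and pid not in known_ids and pid not in fresh:
            fresh.append(pid)
    return fresh
```

The feed itself could be downloaded with `urllib.request.urlopen(build_query_url("cs.CV")).read()`, with `known_ids` loaded from whatever store the site already uses. Since arXiv asks API clients to rate-limit their requests, an hourly job should keep the number of queries per run small.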
Would people find it useful to have arxiv-sanity also keep track of older papers (from before the project was started)? The value would be that one could add such papers to their library to better tailor their recommendations.
Not sure how far back arxiv-sanity currently goes...
👋🏽 Hey friends - my friends and I built filtr.pub as a fun side project to fill some of the gaps in arXiv Sanity Preserver. We gather data from everything in CS and Stat (along with things like citation counts from Google Scholar, papers with code links, etc.), and also have additional functionality like search queries, custom date ranges, following custom keywords, etc. We have daily jobs that sync the latest data :)
If you're interested, please check it out! It's still in the really early stages, but we'd love some feedback :)
@jeetmehta please stop the SPAM... thanks...
Will do - sorry! :)