Number of papers referencing Julia is way too low.

Open oscardssmith opened this issue 2 years ago • 11 comments

Google Scholar lists 3750 articles citing the main Julia paper (https://scholar.google.com/scholar?cites=12373977815425691465&as_sdt=40000005&sciodt=0,22&hl=en), and Semantic Scholar shows 38000 papers with Julia as a keyword since 2012. Of the first 10 pages of results, all appear to be Julia papers.

Also, GitHub shows 14000 repositories with Julia code (https://github.com/search?q=language%3AJulia&type=Repositories&ref=advsearch&l=Julia&l=).

I'm also pretty sure the number of downloads is wrong, given that https://www.hpcwire.com/2021/01/13/julia-update-adoption-keeps-climbing-is-it-a-python-challenger/ lists 9 million downloads in 2020.

oscardssmith avatar Aug 27 '22 20:08 oscardssmith

Yes, I apologize. I just started adding those.

One of my top priorities this week is to improve the papers, books, repo, jobs and file importers.

I think these will be really helpful (when they are actually accurate) :)

breck7 avatar Aug 27 '22 21:08 breck7

No problem! This looks like a really good resource. I just wanted to bring it to your attention because I looked at it and was pretty sure something was up. The 85 jobs also seems really low to me, but I don't actually have any data there.

oscardssmith avatar Aug 27 '22 21:08 oscardssmith

(ah yeah, jobs too) 👍

breck7 avatar Aug 27 '22 21:08 breck7

Okay, a fix for repo counts is live: https://pldb.com/languages/julia.html

(still working on the other issues)

breck7 avatar Aug 29 '22 16:08 breck7

Thanks for fixing these so quickly!

oscardssmith avatar Aug 29 '22 16:08 oscardssmith

Although the number seems to be wrong in the other direction now. The page says 54k, but GitHub says 14k (although it also says the search timed out, so I'm not sure what the right number is).

oscardssmith avatar Aug 29 '22 16:08 oscardssmith

I saw that discrepancy as well. The page shows the number coming from the raw API. I think GitHub applies some filtering to the raw search results on its site (or just does a partial search). They seem to have that defined as "available repository results":

[Image: Screen Shot 2022-08-29 at 8 03 33 AM]

I think I'm showing the correct number, but maybe the link text could be better so that it's not confusing when people follow that link.
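
For anyone who wants to check the raw number themselves, here is a minimal sketch of reading it from the GitHub REST search endpoint. This is an illustration only, not necessarily how the PLDB importer does it: the helper names (`build_search_url`, `parse_repo_count`, `fetch_repo_count`) are hypothetical. The response fields `total_count` and `incomplete_results` are part of the GitHub search API response, and `incomplete_results` is set when the search timed out, so a count should be treated with caution when it is true.

```python
import json
import urllib.parse
import urllib.request
from typing import Tuple

# GitHub REST endpoint for repository search.
SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(language: str) -> str:
    """Build the search URL for all repositories in a given language."""
    # per_page=1 keeps the response small; only the counters are needed.
    query = urllib.parse.urlencode({"q": f"language:{language}", "per_page": 1})
    return f"{SEARCH_URL}?{query}"


def parse_repo_count(payload: dict) -> Tuple[int, bool]:
    """Return (total_count, incomplete_results) from a search response."""
    total = int(payload["total_count"])
    # incomplete_results is True when the search timed out on GitHub's side.
    incomplete = bool(payload.get("incomplete_results", False))
    return total, incomplete


def fetch_repo_count(language: str) -> Tuple[int, bool]:
    """Query the API (unauthenticated, so rate limits apply) and parse it."""
    request = urllib.request.Request(
        build_search_url(language),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_repo_count(json.load(response))
```

Calling `fetch_repo_count("Julia")` returns the raw count together with the timeout flag, which may help explain a gap between the API number and what the search page displays.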

breck7 avatar Aug 29 '22 18:08 breck7

Also, for job numbers, Indeed is showing roughly 600 (although I don't know how accurate a count that is). https://www.indeed.com/jobs?q=julia+programming&redirected=1&vjk=5cafb23b1e86ae0c

oscardssmith avatar Sep 03 '22 04:09 oscardssmith

Thanks @oscardssmith! Okay, I am going to clean up the importer code now. I've got someone trying to write a model to better detect false positives (and false negatives) for things like book titles and paper titles. Hopefully that will get up to speed this week and perform well.

breck7 avatar Sep 03 '22 14:09 breck7

Great! Thanks for all the work on this. It's a really cool resource.

oscardssmith avatar Sep 03 '22 16:09 oscardssmith

Hello, can I contribute to this issue?

Mamolets avatar Feb 14 '24 17:02 Mamolets

If anyone wants to work on proper importing of Academic Papers for languages, I'll leave this issue open.

breck7 avatar May 19 '24 18:05 breck7

Actually I'm going to close this (in keeping with the new convention of closing all issues that are of type "we need to add more data").

breck7 avatar May 20 '24 14:05 breck7