joss
# Count reviewers by language, and the reviewers' sectors by language
In case you were ever curious: here is a breakdown of the JOSS reviewer data, taken from the public reviewer list.
*** The 20 most "best known" languages...
- python ( 68.74 %)
- r ( 27.52 %)
- c++ ( 18.85 %)
- c ( 13.91 %)
- matlab ( 8.3 %)
- java ( 7.26 %)
- fortran ( 5.76 %)
- javascript ( 4.79 %)
- julia ( 4.71 %)
- bash ( 3.07 %)
- go ( 2.02 %)
- perl ( 1.65 %)
- c# ( 1.57 %)
- rust ( 1.5 %)
- php ( 1.5 %)
- ruby ( 1.27 %)
- sql ( 1.12 %)
- scala ( 0.9 %)
- haskell ( 0.82 %)
- cuda ( 0.75 %)
*** The 20 most "known" languages...
- python ( 79.43 %)
- r ( 33.88 %)
- c++ ( 31.41 %)
- c ( 27.3 %)
- matlab ( 17.88 %)
- java ( 16.45 %)
- javascript ( 12.86 %)
- fortran ( 10.62 %)
- julia ( 8.45 %)
- bash ( 6.36 %)
- perl ( 4.49 %)
- php ( 3.89 %)
- c# ( 3.66 %)
- go ( 3.14 %)
- rust ( 2.99 %)
- ruby ( 2.84 %)
- sql ( 2.24 %)
- scala ( 2.09 %)
- html ( 1.72 %)
- haskell ( 1.5 %)
*** The 4 most common sectors for the 10 most "known" languages...
python : machine learning, bioinformatics, physics, statistics,
r : bioinformatics, machine learning, statistics, genomics,
c++ : machine learning, bioinformatics, physics, statistics,
c : machine learning, bioinformatics, astrophysics, statistics,
matlab : machine learning, image processing, statistics, physics,
java : machine learning, bioinformatics, software engineering, data science,
javascript : machine learning, bioinformatics, data science, statistics,
fortran : physics, astrophysics, computational fluid dynamics, computational chemistry,
julia : machine learning, statistics, physics, data science,
bash : bioinformatics, genomics, machine learning, computational biology,
Generated with the following code (Julia):
# Source: reviewer database of JOSS at https://docs.google.com/spreadsheets/d/1PAPRJ63yq9aPC1COLjaQp8mHmEq3rZUzwUYxTulyu78/edit#gid=856801822
using OdsIO
# Loading data..
dataFile = "joss_reviewers_20200724.ods"
db = ods_read(dataFile,range=((4,2),(1340,9)))
# removing email
db = hcat(db[:,1:2],db[:,5:end])
# replacing "nothing"....
# ..with empty string in the first three columns...
for r in eachrow(db)
    for cidx in 1:3
        r[cidx] = isnothing(r[cidx]) ? "" : r[cidx]
    end
end
# ..and with zero in the number of reviews...
for r in eachrow(db)
    for cidx in 4:6
        r[cidx] = isnothing(r[cidx]) ? 0 : r[cidx]
    end
end
# Converting first 3 columns to string and last 3 to integers
db = convert(Array{Union{String,Int64},2},db)
# Cleaning..
for r in eachrow(db)
    for cidx in 1:3
        # turn '/', '(', ')', newlines and "and" into commas
        # (note: "and" is matched as a substring, so it also splits words that contain it)
        r[cidx] = replace(r[cidx], r"[/()\n]|and" => ",") |> strip |> lowercase
        r[cidx] = replace(r[cidx],", " => ',') # to avoid empty data
        r[cidx] = replace(r[cidx]," ," => ',') # to avoid empty data
        r[cidx] = replace(r[cidx], r",$" => "") # remove ending comma
    end
end
# Establishing vocabularies
vocLangs = Set{String}()
vocActivities = Set{String}()
for (ridx,r) in enumerate(eachrow(db))
    ##if ridx > 20 break end
    for cidx in 1:2
        #=
        debug = strip.(split(r[cidx],','))
        for l in debug
            if l == ""
                println(l)
                println(ridx)
                println(cidx)
            end
        end
        =#
        if r[cidx] == "" continue end
        push!(vocLangs,strip.(split(r[cidx],','))...)
    end
    for cidx in 3:3
        if r[cidx] == "" continue end
        push!(vocActivities,strip.(split(r[cidx],','))...)
    end
end
vocLangs = collect(vocLangs)
vocActivities = collect(vocActivities)
# Map each language / activity to its position in the vocabulary
langIdx = Dict{String,Int64}(l => id for (id,l) in enumerate(vocLangs))
actIdx = Dict{String,Int64}(a => id for (id,a) in enumerate(vocActivities))
nLangs = length(vocLangs)
nActs = length(vocActivities)
nRecords = size(db,1)
preferredLangCount = zeros(Int64,nLangs)
competentLangCount = zeros(Int64,nLangs)
actCountByLang = zeros(Int64,nLangs,nActs)
# Let's count!
for r in eachrow(db)
    plangs = strip.(split(r[1],','))
    olangs = strip.(split(r[2],','))
    langs = union(Set(plangs),Set(olangs))
    acts = strip.(split(r[3],','))
    [preferredLangCount[langIdx[l]] += 1 for l in plangs if l != ""]
    [competentLangCount[langIdx[l]] += 1 for l in langs if l != ""]
    [actCountByLang[langIdx[l],actIdx[a]] += 1 for l in langs, a in acts if l != "" && a != ""]
end
# Let's report:
n = 20
println("*** The $n most \"best known\" languages...")
sortIdx = reverse(sortperm(preferredLangCount))[1:n]
[println("- $(rpad(vocLangs[i],12))\t ( $(round(100*preferredLangCount[i]/nRecords,digits=2)) %)") for i in sortIdx]
n = 20
println("*** The $n most \"known\" languages...")
sortIdx = reverse(sortperm(competentLangCount))[1:n]
[println("- $(rpad(vocLangs[i],12))\t ( $(round(100*competentLangCount[i]/nRecords,digits=2)) %)") for i in sortIdx]
n = 10
n2 = 4
println("*** The $n2 most common sectors for the $n most \"known\" languages...")
sortIdx = reverse(sortperm(competentLangCount))[1:n]
for i in sortIdx
    lang = vocLangs[i]
    sortIdxActs = reverse(sortperm(actCountByLang[i,:]))[1:n2]
    print("$(rpad(lang,12)): \t")
    [print("$(vocActivities[j]), ") for j in sortIdxActs]
    print("\n")
end
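The cleaning and counting steps of the script can also be sketched in a more compact, dictionary-based form. This is only an illustration under stated assumptions, not the script that produced the numbers above: `tokenize` and `countByRow` are hypothetical helper names, the input rows are made up, and "and" is matched only as a whole word (`\band\b`), which differs from the substring replacement used in the script. Unlike the script, it also counts a language at most once per reviewer in both tallies.

```julia
# Hypothetical helper (not part of the original script): turn a free-text
# field into a vector of clean, lowercase tokens.
function tokenize(field::AbstractString)
    s = lowercase(field)
    # '/', '(', ')', newlines and the standalone word "and" act as separators.
    s = replace(s, r"[/()\n]|\band\b" => ",")
    tokens = [String(strip(t)) for t in split(s, ',')]
    return filter(!isempty, tokens)
end

# Hypothetical helper: count how many rows mention each token,
# counting a token at most once per row.
function countByRow(rows)
    counts = Dict{String,Int}()
    for row in rows
        for t in Set(row)
            counts[t] = get(counts, t, 0) + 1
        end
    end
    return counts
end

# Toy input, made up for illustration (NOT taken from the JOSS list):
# (preferred languages, other languages) for three imaginary reviewers.
toy = [("Python and R", "C++"),
       ("python", "R/Fortran"),
       ("Julia", "python")]

preferred = countByRow(tokenize(p) for (p, _) in toy)
known     = countByRow(vcat(tokenize(p), tokenize(o)) for (p, o) in toy)

# Report the share of reviewers knowing each language, most common first.
for (lang, c) in sort(collect(known), by = x -> (-x[2], x[1]))
    println(rpad(lang, 12), round(100 * c / length(toy), digits = 2), " %")
end
```

With whole-word matching, a sector such as "random forests" or "land use" stays intact instead of being split at the embedded "and".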
✨ thanks @sylvaticus! / cc @diehlpk who has been looking at the breakdown of languages of papers we've reviewed too.
OK, it would be interesting to add these to the paper and compare them with the programming languages used in the repos.
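One way such a comparison could be set up, as a rough sketch only: put both breakdowns into dictionaries keyed by language and rank the languages by the gap between the two shares. `shareGap` is a hypothetical helper name, the reviewer shares are the top three "known" values from the report above, and the repository shares are placeholder numbers, not real JOSS statistics. Note also that the reviewer shares add up to more than 100 % because a reviewer can list several languages, so the two tables are not strictly like for like.

```julia
# Hypothetical helper: percentage-point gap (reviewers minus repos) per language,
# sorted by decreasing absolute gap. Languages missing on one side count as 0.
function shareGap(reviewerShare::Dict{String,Float64}, repoShare::Dict{String,Float64})
    langs = union(keys(reviewerShare), keys(repoShare))
    gap = Dict(l => get(reviewerShare, l, 0.0) - get(repoShare, l, 0.0) for l in langs)
    return sort(collect(gap), by = x -> -abs(x[2]))
end

# Reviewer shares (%): top three "known" languages from the report above.
reviewerShare = Dict("python" => 79.43, "r" => 33.88, "c++" => 31.41)

# PLACEHOLDER values for illustration only, NOT real JOSS paper statistics.
repoShare = Dict("python" => 50.0, "r" => 25.0, "c++" => 10.0)

gaps = shareGap(reviewerShare, repoShare)
for (lang, g) in gaps
    println(rpad(lang, 12), round(g, digits = 2), " pp")
end
```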