spyql
spyql copied to clipboard
Key-value aggregations and lookups
This PR closes #59.
First, it introduces dict_agg
for key-value aggregations. Example:
$ cat codes.csv (git)-[key-value]
MCC,MCC (int),MNC,MNC (int),ISO,Country,Country Code,Network
289,649,88,2191,ge,Abkhazia,7,A-Mobile
289,649,68,1679,ge,Abkhazia,7,A-Mobile
289,649,67,1663,ge,Abkhazia,7,Aquafon
412,1042,01,31,af,Afghanistan,93,AWCC
412,1042,50,1295,af,Afghanistan,93,Etisalat
412,1042,30,783,af,Afghanistan,93,Etisalat
452,1106,04,79,vn,Vietnam,84,Viettel
452,1106,02,47,vn,Vietnam,84,VinaPhone
543,1347,299,665,wf,Wallis and Futuna,,Failed Calls
543,1347,01,31,wf,Wallis and Futuna,,Manuia
421,1057,999,2457,ye,Yemen,967,Fix Line
421,1057,04,79,ye,Yemen,967,HITS/Y Unitel
421,1057,01,31,ye,Yemen,967,Sabaphone
421,1057,03,63,ye,Yemen,967,Yemen Mob. CDMA
645,1605,01,31,zm,Zambia,260,Airtel
645,1605,299,665,zm,Zambia,260,Failed Calls
$ spyql "
SELECT dict_agg(MCC, Country) AS json
FROM csv TO json
" < codes.csv > code_country.json
$ jq . code_country.json
{
"289": "Abkhazia",
"412": "Afghanistan",
"452": "Vietnam",
"543": "Wallis and Futuna",
"421": "Yemen",
"645": "Zambia"
}
Second, it introduces the kv
function to allow loading JSON objects of key-value lookups, emulating left equi joins. Example:
$ cat towers_mini.csv
radio,mcc,net,area,cell,unit,lon,lat,range,samples,changeable,created,updated,averageSignal
UMTS,262,2,776,165186,0,13.329163,52.46521,1000,2,1,1286645042,1295769656,0
GSM,262,3,1075,5651,0,13.329163,52.46521,1000,2,1,1286645042,1295769656,0
UMTS,262,2,801,165123,0,13.295516967773,52.494735717773,1000,1,1,1286864565,1286864565,0
UMTS,262,2,801,165121,0,13.301697,52.499886,1000,4,1,1286884439,1297735807,0
GSM,624,2,6,13883,0,9.704361,4.063911,1588,8,1,1352031870,1374212876,0
GSM,624,2,6,34381,0,9.709009,4.066368,2484,13,1,1352031870,1380389330,0
GSM,452,1,10068,5293,0,105.8031463623,20.959854125977,1000,2,1,1459692342,1488347104,0
GSM,645,3,130,3282,0,28.070755004883,-14.902267456055,1000,1,1,1459743588,1459743588,0
GSM,645,3,1,32221,0,28.252716,-15.439224,1000,2,1,1369353852,1369353852,0
GSM,645,3,1,33932,0,28.255325,-15.436752,1000,5,1,1369353888,1369353888,0
GSM,645,3,1,31662,0,28.258072,-15.434555,1000,5,1,1369353960,1369353960,0
$ spyql "SELECT mcc, kv('code_country.json', mcc) AS country, count_agg(1) AS n_recs FROM csv GROUP BY 1, 2 ORDER BY 1 TO pretty" < towers_mini.csv
mcc country n_recs
----- --------- --------
262 4
452 Vietnam 1
624 2
645 Zambia 4