spyql icon indicating copy to clipboard operation
spyql copied to clipboard

Key-value aggregations and lookups

Open dcmoura opened this issue 2 years ago • 0 comments

This PR closes #59.

First, it introduces dict_agg for key-value aggregations. Example:

$ cat codes.csv                                                  (git)-[key-value] 
MCC,MCC (int),MNC,MNC (int),ISO,Country,Country Code,Network
289,649,88,2191,ge,Abkhazia,7,A-Mobile
289,649,68,1679,ge,Abkhazia,7,A-Mobile
289,649,67,1663,ge,Abkhazia,7,Aquafon
412,1042,01,31,af,Afghanistan,93,AWCC
412,1042,50,1295,af,Afghanistan,93,Etisalat
412,1042,30,783,af,Afghanistan,93,Etisalat
452,1106,04,79,vn,Vietnam,84,Viettel
452,1106,02,47,vn,Vietnam,84,VinaPhone
543,1347,299,665,wf,Wallis and Futuna,,Failed Calls
543,1347,01,31,wf,Wallis and Futuna,,Manuia
421,1057,999,2457,ye,Yemen,967,Fix Line
421,1057,04,79,ye,Yemen,967,HITS/Y Unitel
421,1057,01,31,ye,Yemen,967,Sabaphone
421,1057,03,63,ye,Yemen,967,Yemen Mob. CDMA
645,1605,01,31,zm,Zambia,260,Airtel
645,1605,299,665,zm,Zambia,260,Failed Calls

$ spyql "
    SELECT dict_agg(MCC, Country) AS json 
    FROM csv TO json
  " < codes.csv > code_country.json

$ jq . code_country.json                                        
{
  "289": "Abkhazia",
  "412": "Afghanistan",
  "452": "Vietnam",
  "543": "Wallis and Futuna",
  "421": "Yemen",
  "645": "Zambia"
}

Second, it introduces the kv function to allow loading JSON objects of key-value lookups, emulating left equi joins. Example:

$ cat towers_mini.csv 
radio,mcc,net,area,cell,unit,lon,lat,range,samples,changeable,created,updated,averageSignal
UMTS,262,2,776,165186,0,13.329163,52.46521,1000,2,1,1286645042,1295769656,0
GSM,262,3,1075,5651,0,13.329163,52.46521,1000,2,1,1286645042,1295769656,0
UMTS,262,2,801,165123,0,13.295516967773,52.494735717773,1000,1,1,1286864565,1286864565,0
UMTS,262,2,801,165121,0,13.301697,52.499886,1000,4,1,1286884439,1297735807,0
GSM,624,2,6,13883,0,9.704361,4.063911,1588,8,1,1352031870,1374212876,0
GSM,624,2,6,34381,0,9.709009,4.066368,2484,13,1,1352031870,1380389330,0
GSM,452,1,10068,5293,0,105.8031463623,20.959854125977,1000,2,1,1459692342,1488347104,0
GSM,645,3,130,3282,0,28.070755004883,-14.902267456055,1000,1,1,1459743588,1459743588,0
GSM,645,3,1,32221,0,28.252716,-15.439224,1000,2,1,1369353852,1369353852,0
GSM,645,3,1,33932,0,28.255325,-15.436752,1000,5,1,1369353888,1369353888,0
GSM,645,3,1,31662,0,28.258072,-15.434555,1000,5,1,1369353960,1369353960,0

$ spyql "SELECT mcc, kv('code_country.json', mcc) AS country, count_agg(1) AS n_recs FROM csv GROUP BY 1, 2 ORDER BY 1 TO pretty" < towers_mini.csv
  mcc  country      n_recs
-----  ---------  --------
  262                    4
  452  Vietnam           1
  624                    2
  645  Zambia            4

dcmoura avatar Sep 19 '22 23:09 dcmoura