crux-top-lists
`202205.csv` contains only 859188 records instead of 1M
Hello,
Thank you for maintaining this repository and cached versions of crux-top-list.
`202205.csv` contains only 859188 records instead of the usual 1M. Could the corresponding list be regenerated and updated here, or is the data also missing from Google's BigQuery dataset?

    >>> import pandas as pd
    >>> df = pd.read_csv("202205.csv")
    >>> df
                                           origin     rank
    0                          http://iporntv.net     1000
    1       https://eldenring.wiki.fextralife.com     1000
    2                 https://m.lightinthebox.com     1000
    3                          https://ssc.nic.in     1000
    4                  https://ja.m.wikipedia.org     1000
    ...                                       ...      ...
    859183    https://www.vulcaodaborracha.com.br  1000000
    859184                     https://www.vub.be  1000000
    859185     https://www.virginianaturalgas.com  1000000
    859186         https://www.virtualregatta.com  1000000
    859187                https://zamosc.lento.pl  1000000

    [859188 rows x 2 columns]
    >>> df.groupby("rank").nunique()
             origin
    rank
    1000        904
    10000      7806
    100000    76566
    1000000  773912

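
For reference, here is a rough sketch of how many origins are missing per bucket. It assumes the rank buckets are cumulative (top 1k, top 10k, and so on), so a complete list would hold 1,000 / 9,000 / 90,000 / 900,000 origins in the four buckets; that expectation is my assumption, not something I verified against BigQuery. The helper names are just illustrative.

```python
# Unique origins per rank bucket, copied from the groupby output above.
observed = {1_000: 904, 10_000: 7_806, 100_000: 76_566, 1_000_000: 773_912}


def expected_bucket_sizes(ranks):
    """Assumption: bucket N holds the origins ranked after the previous bucket, up to N."""
    sizes = {}
    previous = 0
    for rank in sorted(ranks):
        sizes[rank] = rank - previous
        previous = rank
    return sizes


def shortfall(observed_counts):
    """Return expected minus observed origins for each rank bucket."""
    expected = expected_bucket_sizes(observed_counts)
    return {rank: expected[rank] - observed_counts[rank] for rank in sorted(observed_counts)}


missing = shortfall(observed)
for rank, count in missing.items():
    print(f"rank {rank:>7}: {observed[rank]:>6} present, {count:>6} missing")
print("total present:", sum(observed.values()))
print("total missing:", sum(missing.values()))
```

Under that assumption every bucket comes up short by roughly 10 to 15 percent, not only the 1M bucket, and the per-bucket gaps add up to the 140812 missing records.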
Thanks!