
URLs file is messy

Open prabhant opened this issue 1 year ago • 1 comments

Hi, I found a few issues with the URL file https://thor.robots.ox.ac.uk/datasets/pass//pass_urls.txt

The file has 2879648 entries, of which 1440060 are unique URLs. The number of unique hashes in pass_metadata.csv is 1439588.

len(pass_meta['hash'].unique())
1439588
len(set(url_arr))
1440060
len(url_arr)
2879648
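To see how the duplicates are distributed (for example, whether most URLs simply appear twice), a minimal sketch like the one below might help. The function name `summarize_urls` and the toy input are hypothetical, made up for illustration; it assumes the URL file has one URL per line.

```python
from collections import Counter

def summarize_urls(lines):
    """Summarize duplication in an iterable of URL lines (hypothetical helper)."""
    # Strip whitespace and skip blank lines.
    urls = [ln.strip() for ln in lines if ln.strip()]
    # Number of times each distinct URL occurs.
    counts = Counter(urls)
    # Histogram: occurrence count -> number of distinct URLs with that count.
    histogram = Counter(counts.values())
    return {
        "total": len(urls),
        "unique": len(counts),
        "repeats": dict(histogram),
    }

# Toy input standing in for the real file contents.
sample = ["http://a/1.jpg\n", "http://a/2.jpg\n", "http://a/1.jpg\n", "\n"]
summary = summarize_urls(sample)
print(summary)
```

On a local copy of the file, this could be called as `summarize_urls(open("pass_urls.txt"))`, where the local path is an assumption. The `repeats` histogram would show whether the extra entries come from a uniform duplication or from a handful of heavily repeated URLs.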

Is the URL file not a one-to-one mapping of pass_metadata.csv?

prabhant avatar May 11 '23 14:05 prabhant