uap-core
Get real data set for performance testing
I think I'll be able to provide you with a data set. Give me a little time; I'll have to prepare the data set and then double-check that it can be released to the public.
Very cool @mspiegel, looking forward to seeing that.
Here's a link to a line-delimited file containing ~100k User-Agents we've seen at Bitly. There's no significance to the position of a particular UA string within the file - it's been shuffled.
We're totally cool with these being incorporated into the project!
Here is a file of roughly 200k user agents seen yesterday on an ad network: https://s3.amazonaws.com/cmds/useragents.txt.gz
This is a list of 384177 unique mobile UAs I've encountered over the last few years: http://whichbrowser.net/data/useragents.txt
And here is more data: http://whichbrowser.net/data/index.html
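The files shared so far are line-delimited (one UA per line), some plain text and one gzip-compressed. A minimal Python sketch for loading either kind and timing a parser over the result might look like this. `load_user_agents` and `benchmark` are hypothetical helpers, and `parse` stands in for whichever parser entry point is being measured (uap-python or another port); no specific library API is assumed. Decoding with `errors="replace"` is also an assumption, in case a dump contains bytes that are not valid UTF-8.

```python
import gzip
import time
from typing import Callable, Iterable, List


def load_user_agents(path: str) -> List[str]:
    """Read a line-delimited UA file, transparently handling .gz files."""
    opener = gzip.open if path.endswith(".gz") else open
    # Text mode; undecodable bytes are replaced rather than raising.
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        # Drop the line terminator and skip blank lines.
        return [line.rstrip("\r\n") for line in fh if line.strip()]


def benchmark(parse: Callable[[str], object], user_agents: Iterable[str]) -> float:
    """Call `parse` once per UA and return throughput in UAs per second.

    `parse` is a placeholder for the parser under test (hypothetical).
    """
    uas = list(user_agents)
    start = time.perf_counter()
    for ua in uas:
        parse(ua)
    elapsed = time.perf_counter() - start
    return len(uas) / elapsed if elapsed > 0 else float("inf")
```

For example, `benchmark(parse_fn, load_user_agents("useragents.txt.gz"))`, where `parse_fn` is whatever function is being profiled. Since the Bitly file is already shuffled, running the same list through several parsers gives a like-for-like comparison without worrying about ordering effects on caches.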
Thanks a lot for providing this data. Unfortunately, I don't have the cycles right now to do anything with it. So unless someone is interested in covering the cost of doing this research, I suggest that whoever can commit the time just takes it and runs with it.
We have compiled a list of ~600K UAs that we have seen go through our system.
Here are the two files: ua.txt contains the raw user-agent strings, and ua_parsed.csv contains the same strings parsed with uap-python. The CSV has an extra (last) column called error that was used internally; it can be ignored for testing purposes.
https://storage.googleapis.com/xpqv/ua.txt https://storage.googleapis.com/xpqv/ua_flat.csv
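Since the parsed CSV carries a trailing internal error column, a small loader that discards it keeps the rest usable as expected results, for instance to diff against a fresh parse. This is a sketch under stated assumptions: it assumes the first row is a header and that error is always the last column, as described above; the actual column names are not given in the thread, so none are hard-coded, and `load_parsed_rows` is a hypothetical helper.

```python
import csv
from typing import Dict, List


def load_parsed_rows(path: str) -> List[Dict[str, str]]:
    """Load the parsed-UA CSV, dropping the trailing internal `error` column.

    Assumes (not confirmed by the data's authors) that the first row is a
    header and that `error` is always the last column.
    """
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader)[:-1]  # drop the header cell for `error`
        # Drop the last cell of every data row too; skip empty rows.
        return [dict(zip(header, row[:-1])) for row in reader if row]
```

Using the `csv` module rather than splitting on commas matters here, because UA strings themselves often contain commas (for example "KHTML, like Gecko").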
If anyone is interested in further exploring this dataset, I also exported the parsed data into BigQuery so that we could gather internal statistics, and I'm happy to share that dataset with anyone who wants to explore the results further.
PS: ~360K of these are Facebook Browser UAs (it seems Facebook varies its user agent based on multiple factors, even for the same version of the FB Browser). PPS: This was done with the latest installation of UAP.
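One rough way to reproduce the Facebook share and see how many distinct UA strings exist per app version is sketched below. It relies on an assumption: that Facebook's in-app browser marks its UA with `FBAN/...` and/or `FBAV/<version>` tokens. This is a heuristic for exploring the data, not uap-core's own detection logic, and `facebook_breakdown` is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Heuristic (assumption): Facebook in-app browser UAs carry FBAN/ or FBAV/
# tokens, and FBAV/<version> holds the app version.
FB_MARKER = re.compile(r"\bFB(?:AN|AV)/")
FB_VERSION = re.compile(r"FBAV/([\d.]+)")


def facebook_breakdown(user_agents: Iterable[str]) -> Tuple[int, Dict[str, int]]:
    """Return (number of Facebook UAs, distinct UA strings per FBAV version)."""
    variants: Dict[str, Set[str]] = defaultdict(set)
    fb_count = 0
    for ua in user_agents:
        if not FB_MARKER.search(ua):
            continue
        fb_count += 1
        match = FB_VERSION.search(ua)
        # Group by app version; UAs without an FBAV token go under "unknown".
        variants[match.group(1) if match else "unknown"].add(ua)
    return fb_count, {version: len(uas) for version, uas in variants.items()}
```

A version that maps to many distinct strings is a direct measure of the per-version variation described above, and those groups are good candidates for checking that the parser returns the same family and version for each of them.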