Get real data set for performance testing

Open tobie opened this issue 10 years ago • 7 comments

tobie avatar Nov 29 '14 21:11 tobie

I think I'll be able to provide you with a data set. Give me a little time: I'll have to prepare the data set and then double-check that it can be released to the public.

mspiegel avatar Jan 12 '15 21:01 mspiegel

Very cool, @mspiegel, looking forward to seeing that.

russellwhitaker avatar Jan 30 '15 18:01 russellwhitaker

Here's a link to a line-delimited file containing ~100k User-Agents we've seen at Bitly. There's no significance to the position of a particular UA string within the file - it's been shuffled.

We're totally cool with these being incorporated into the project!

davemarchevsky avatar Apr 28 '15 18:04 davemarchevsky

Here is a file of roughly 200k user agents seen yesterday on an ad network: https://s3.amazonaws.com/cmds/useragents.txt.gz

patmmccann avatar Jul 23 '15 17:07 patmmccann

This is a list of 384177 unique mobile UAs I've encountered over the last few years: http://whichbrowser.net/data/useragents.txt

And here is more data: http://whichbrowser.net/data/index.html

NielsLeenheer avatar Nov 05 '15 11:11 NielsLeenheer

Thanks a lot for providing this data. Unfortunately, I don't have the cycles right now to do anything with those. So unless there's someone that would be interested in covering the cost of doing this research, I suggest whoever can commit just runs off with it and does it.

tobie avatar Nov 06 '15 12:11 tobie

We have compiled a list of ~600K UAs that we have seen go through our system.

The following 2 files are available:
ua.txt -> Contains the raw user agent strings.
ua_parsed.csv -> Parsed with ua-python (there is an extra last column called error that was used internally, but it can be ignored for testing purposes).

https://storage.googleapis.com/xpqv/ua.txt
https://storage.googleapis.com/xpqv/ua_flat.csv

If anyone is interested in exploring this dataset further: I exported the parsed data into BigQuery so that we could gather internal statistics, and I'm happy to share that dataset with anyone who wants to dig into the results.

PS: ~360K are Facebook browser UAs (it seems they vary the user agent based on multiple factors, even for the same version of the FB browser).

PPS: This was done with a recent installation of UAP.

shashank- avatar Oct 21 '16 01:10 shashank-
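
For anyone picking this up: the files shared in this thread are plain line-delimited lists of User-Agent strings (one of them gzipped), so a rough timing harness is short. The sketch below is only an illustration, not part of uap-core. The names `load_user_agents`, `benchmark`, and `placeholder_parse` are made up for this example, and `placeholder_parse` is a trivial stand-in that should be swapped for whichever real parser is being measured.

```python
import gzip
import time
from typing import Callable, Dict, Iterable, List


def load_user_agents(path: str) -> List[str]:
    """Read one User-Agent per line, skipping blank lines.

    Files ending in .gz are decompressed on the fly.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\r\n") for line in fh if line.strip()]


def benchmark(parse: Callable[[str], object],
              user_agents: Iterable[str]) -> Dict[str, float]:
    """Time a single pass of `parse` over every User-Agent string."""
    uas = list(user_agents)
    start = time.perf_counter()
    for ua in uas:
        parse(ua)
    elapsed = time.perf_counter() - start
    rate = len(uas) / elapsed if elapsed > 0 else float("inf")
    return {"count": len(uas), "seconds": elapsed, "per_second": rate}


def placeholder_parse(ua: str) -> str:
    # Hypothetical stand-in for a real parser: returns the text
    # before the first "/" (for example "Mozilla").
    return ua.split("/", 1)[0]
```

Usage would look something like `benchmark(placeholder_parse, load_user_agents("useragents.txt.gz"))` after downloading one of the files locally. A single pass is noisy, so repeating the run and comparing the best times is probably more meaningful than any one number.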