[Statistics] Many user-agents are marked as unknown

Open skofman1 opened this issue 8 years ago • 3 comments

We need to analyze which user agents are not identified, and update the list.

skofman1 avatar Jan 05 '18 21:01 skofman1

This commit shows how to add a user-agent to the list.

loic-sharma avatar Jan 09 '18 00:01 loic-sharma

> This commit shows how to add a user-agent to the list.

For anyone else who wanders into this thread in the future, this YAML file seems to have been moved to the NuGetGallery repo.

And for anyone else confused about why they see clients listed on NuGet.org that aren't represented in this YAML, the script that consumes this YAML has a fallback that uses a generic list of well-known user agents.

(I was almost going to suggest that updating that list by updating ua-parser might help resolve this issue, but it turns out that list has not been updated in a very long time other than a handful of unreleased changes.)
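To make the fallback behaviour described above concrete, here is a minimal sketch. It is not the gallery's actual code: the two pattern lists, the client names, and the `identify_client` helper are all hypothetical stand-ins, with `CUSTOM_PATTERNS` playing the role of the YAML entries and `GENERIC_PATTERNS` playing the role of the generic well-known list.

```python
import re

# Hypothetical stand-in for the NuGet-specific entries kept in the YAML file.
CUSTOM_PATTERNS = [
    (re.compile(r"NuGet Command Line/(\d+)\.(\d+)"), "NuGet Command Line"),
]

# Hypothetical stand-in for the generic list of well-known user agents.
GENERIC_PATTERNS = [
    (re.compile(r"Firefox/(\d+)"), "Firefox"),
]


def identify_client(user_agent):
    """Return a client name, trying custom patterns first, then the generic list."""
    for patterns in (CUSTOM_PATTERNS, GENERIC_PATTERNS):
        for regex, name in patterns:
            if regex.search(user_agent):
                return name
    # Nothing matched in either list, so the client is reported as unknown.
    return "Unknown"
```

Under these assumptions, a client such as Firefox gets a name even though the custom list has no entry for it, which would be consistent with seeing clients on the stats page that are missing from the YAML.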

PathogenDavid avatar Apr 02 '25 18:04 PathogenDavid

Thought I had after leaving that comment: Could all the unknown clients be from the Chinese CDN? I don't fully understand its purpose, but the user agent parser has logic which rewrites the regular expressions in the YAML file to use + in place of spaces:

https://github.com/NuGet/NuGetGallery/blob/4b37d4d6bba949d81768f914bf99ea14e31168db/python/StatsLogParser/loginterpretation/useragentparser.py#L34-L46

However, no similar logic exists for the previously mentioned fallback. Presumably the rewrite exists because the logs from the Chinese CDN are in a different format. I don't know if the statistics include both CDNs or if the Chinese gallery is isolated from the main one, but it might explain the high number of unknown UAs on some packages.
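Here is a rough sketch of what that kind of rewrite could look like, and why a pattern that is not rewritten would miss a user agent logged with `+` in place of spaces. This is an illustration only, not the code at the link above: the `to_plus_pattern` helper, the sample pattern, and the sample log value are all made up for the example.

```python
import re


def to_plus_pattern(pattern):
    """Rewrite a regex so that literal spaces match '+' instead (hypothetical helper)."""
    # Normalise escaped spaces first so the backslash is not left dangling,
    # then swap every space for an escaped plus sign.
    return pattern.replace(r"\ ", " ").replace(" ", r"\+")


original = r"NuGet Command Line/(\d+)"       # made-up YAML-style pattern
rewritten = to_plus_pattern(original)        # NuGet\+Command\+Line/(\d+)

plus_encoded_ua = "NuGet+Command+Line/4.9.2"  # made-up log value using '+' for spaces

matches_rewritten = re.search(rewritten, plus_encoded_ua) is not None
matches_original = re.search(original, plus_encoded_ua) is not None
```

In this sketch `matches_rewritten` is true and `matches_original` is false, so any fallback pattern containing a space would fail against `+`-encoded logs unless it received the same treatment.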

(I don't have a horse in this race, just thought I'd share what I noticed. I was just looking into where the names on the stats page come from.)

PathogenDavid avatar Apr 02 '25 18:04 PathogenDavid