django-wurfl
BaseDevice._match_user_agent fails with updated iPad UserAgent
An updated iPad (1) user-agent reports as: Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5
The latest wurfl.xml (http://sourceforge.net/projects/wurfl/files/WURFL/) does not contain that exact user-agent. However, the official APIs (Tera-WURFL for the DB implementation, or really any of them) can still match it against the iPad, which is listed in wurfl.xml as Mozilla/5.0 (iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7D11, because of the way they do partial matches. In the newest Tera-WURFL this can be found in UserAgentMatchers/AppleUserAgentMatcher.php, in public function recoveryMatch(). Notice that it doesn't simply try to match against the beginning of the string but rather checks for certain key elements.
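To illustrate what "checks for certain key elements" could look like, here is a rough Python sketch. It is not a port of Tera-WURFL's recoveryMatch(); the rule table, the function name, and the device ids are all placeholders I made up for the example.

```python
# Illustrative only: NOT Tera-WURFL's actual recovery logic.
# Each rule is (tokens that must all appear in the UA, placeholder device id).
# Order matters: the iPad entry in wurfl.xml also contains "iPhone OS",
# so the iPad rule has to be checked before the iPhone rule.
RECOVERY_RULES = [
    (("iPad",), "example_ipad"),
    (("iPod",), "example_ipod_touch"),
    (("iPhone",), "example_iphone"),
]

NEW_IPAD_UA = (
    "Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS X; en-us) "
    "AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 "
    "Mobile/8J2 Safari/6533.18.5"
)
WURFL_IPAD_UA = (
    "Mozilla/5.0 (iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) "
    "AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7D11"
)


def recovery_match(ua):
    """Return the placeholder id of the first rule whose tokens all occur in ua."""
    for tokens, device_id in RECOVERY_RULES:
        if all(token in ua for token in tokens):
            return device_id
    return None
```

With rules like these, both the old and the updated iPad strings resolve to the same device regardless of the OS version embedded in the middle of the string. The cost is that the rule table has to be maintained by hand alongside wurfl.xml.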
django-wurfl, on the other hand, tries to match against the first third of the UA string and then uses Levenshtein distance to determine which match is the best to use. In this case, the first third of the iPad UA string is "Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS", which still doesn't match anything. (I was able to get a match by making it fall back to a sixth of the UA string, i.e., "Mozilla/5.0 (iPad; U; C".)
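For anyone who wants to reproduce this without a database, here is a minimal in-memory sketch of the behaviour described above. It assumes the fragment match is a plain prefix match; the function names and the small KNOWN_UAS list are stand-ins for illustration, not django-wurfl's actual code or data.

```python
NEW_IPAD_UA = (
    "Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS X; en-us) "
    "AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 "
    "Mobile/8J2 Safari/6533.18.5"
)
WURFL_IPAD_UA = (
    "Mozilla/5.0 (iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) "
    "AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7D11"
)
# Stand-in for the user agents loaded from wurfl.xml.
KNOWN_UAS = [
    WURFL_IPAD_UA,
    "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X; en-us) "
    "AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 "
    "Mobile/8A293 Safari/6531.22.7",
]


def levenshtein(a, b):
    """Edit distance between a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def match_by_third(ua, known_uas):
    """Prefix-match on the first third of ua, then pick the closest candidate."""
    prefix = ua[:len(ua) // 3]
    candidates = [k for k in known_uas if k.startswith(prefix)]
    if not candidates:
        return None
    return min(candidates, key=lambda k: levenshtein(ua, k))
```

Calling match_by_third(NEW_IPAD_UA, KNOWN_UAS) returns None here, because the OS version sits inside the first third of the string and no known entry shares that prefix.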
The fix is unfortunately non-trivial - the official WURFL APIs solve this by having per-user-agent classes that handle the attempts at matching, instead of just trying to match against a fragment of the UA string. If django-wurfl is to be an actual replacement for the official WURFL APIs, however, this problem needs to be addressed.
I've submitted a pull request (https://github.com/clement/django-wurfl/pull/5) with at least a partial solution. Instead of only trying to match against the first third of the UA string, it tries the first third, then, if there are no matches, the first quarter, then the first fifth, and so on. There are two sanity checks: if the fragment becomes shorter than 5 characters, it stops without matching, and as soon as it gets one or more matches, it stops without trying even shorter fragments of the UA string.
This doesn't have the robust type-specific checks that Tera-WURFL does, but it at least enables us to match against an updated iPad.
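The idea in the pull request can be sketched roughly like this. This is a standalone approximation working on an in-memory list, not the code from the PR itself; the function names, the min_prefix_len parameter, and the KNOWN_UAS list are assumptions made for the example.

```python
NEW_IPAD_UA = (
    "Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS X; en-us) "
    "AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 "
    "Mobile/8J2 Safari/6533.18.5"
)
WURFL_IPAD_UA = (
    "Mozilla/5.0 (iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) "
    "AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7D11"
)
# Stand-in for the user agents loaded from wurfl.xml.
KNOWN_UAS = [
    WURFL_IPAD_UA,
    "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X; en-us) "
    "AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 "
    "Mobile/8A293 Safari/6531.22.7",
]


def levenshtein(a, b):
    """Edit distance between a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def match_with_fallback(ua, known_uas, min_prefix_len=5):
    """Try the first 1/3 of ua, then 1/4, 1/5, ... until something matches."""
    divisor = 3
    while True:
        prefix = ua[:len(ua) // divisor]
        # Sanity check: give up once the fragment is too short to mean anything.
        if len(prefix) < min_prefix_len:
            return None
        candidates = [k for k in known_uas if k.startswith(prefix)]
        if candidates:
            # Stop at the first fragment length that matches; do not go shorter.
            return min(candidates, key=lambda k: levenshtein(ua, k))
        divisor += 1
```

For the updated iPad string, the thirds, quarters, and fifths all fail, and the first hit comes at one sixth ("Mozilla/5.0 (iPad; U; C"), which is enough to land on the iPad entry from wurfl.xml. The loop always terminates because the fragment keeps shrinking until it falls below min_prefix_len.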
Yes, I've been planning to write advanced user-agent matching for a while now, even though I'm not sure I want to follow the way TeraWURFL implements it.
I will take a shot at it this weekend, to see if I can get something working.
Cool, let me know if you need another pair of eyes or hands on it.
I definitely am not a fan of the way they are doing it in TeraWURFL, as it requires maintaining not just your wurfl.xml but also your user-agent parsing - i.e., any time a new or updated device has a somewhat different user-agent string that is in the same "vein" as others in its brand or product line, you'd have to edit the user-agent parsing to add the specifics. If you're already going to have that data in wurfl.xml, why duplicate it in your code, you know?
On a side note, I have no idea why the latest wurfl.xml doesn't include the current iPad 1 user-agent. The XML was released 6/11/11, so it seems odd to me that the iPad would just be missing from the list... I guess they are assuming that the API will be doing complex type-specific user-agent matching, so there is no need to have every possible user agent in the database?
Any progress on this item? This project seems kind of dead. Hopefully not, because there's not another good wurfl processing library for django I know of, and the DeviceAtlas stuff all costs money. :/