django-wurfl

BaseDevice._match_user_agent fails with updated iPad UserAgent

Status: Open. vijaykramesh opened this issue 13 years ago • 4 comments

An updated iPad 1 reports the following user-agent: Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5

The latest-and-greatest wurfl.xml (http://sourceforge.net/projects/wurfl/files/WURFL/) does not contain that exact user-agent. However, the official APIs (Tera-WURFL for the DB implementation, or really any of them) can still match it against the iPad, which wurfl.xml lists as Mozilla/5.0 (iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7D11, because of the way they do partial matching. In the newest Tera-WURFL this lives in UserAgentMatchers/AppleUserAgentMatcher.php, public function recoveryMatch(). Notice that it doesn't simply try to match against the beginning of the string; instead it checks for certain key elements.

django-wurfl, on the other hand, tries to match against the first third of the UA string and then uses Levenshtein distance to decide which of the matches is best. In this case, the first third of the iPad UA string is "Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS", which still doesn't match anything. (I was able to get a match by making it fall back to a sixth of the UA string, i.e., "Mozilla/5.0 (iPad; U; C".)
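To make the failure concrete, here is a minimal sketch of the prefix-then-Levenshtein strategy described above. It is not django-wurfl's actual code: it assumes the device database is an in-memory list of UA strings and that the prefix test is a plain `str.startswith()` (the real project works against its Django models), and the names `levenshtein` and `match_by_third` are hypothetical. Run against the two iPad strings from this issue, it reproduces the miss.

```python
# Sketch of "match the first third, then rank by Levenshtein distance".
# Assumption: known UAs are an in-memory list, not django-wurfl's models.


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (each edit costs 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def match_by_third(ua, known_uas):
    """Return the closest known UA sharing the first third of `ua`, else None."""
    prefix = ua[: len(ua) // 3]
    candidates = [k for k in known_uas if k.startswith(prefix)]
    if not candidates:
        return None
    return min(candidates, key=lambda k: levenshtein(ua, k))


IPAD_DB_UA = ("Mozilla/5.0 (iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) "
              "AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7D11")
IPAD_NEW_UA = ("Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS X; en-us) "
               "AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 "
               "Mobile/8J2 Safari/6533.18.5")

print(IPAD_NEW_UA[: len(IPAD_NEW_UA) // 3])  # Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS
print(match_by_third(IPAD_NEW_UA, [IPAD_DB_UA]))  # None
```

The 139-character UA gives a 46-character prefix, and the wurfl.xml entry diverges right after "CPU " ("iPhone OS" vs "OS"), so no candidate survives the prefix filter and Levenshtein ranking never runs.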

The fix is unfortunately non-trivial: the official WURFL APIs solve this with per-user-agent classes that handle the matching attempts, instead of just trying to match against a fragment of the UA string. If django-wurfl is to be an actual replacement for the official WURFL APIs, however, this problem needs to be addressed.
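For comparison, the per-user-agent-class approach, combined with the "check for key elements" style of recovery mentioned for recoveryMatch(), might look roughly like the sketch below. Everything in it is hypothetical and illustrative: the class names, the token checks, and the fallback device IDs are invented for this example and are not Tera-WURFL's actual code or real WURFL device IDs.

```python
# Hypothetical per-family matcher classes; NOT Tera-WURFL's implementation.
# Device IDs returned below are placeholders, not real WURFL IDs.


class UserAgentMatcher:
    """Base class: each subclass decides whether it handles a UA, and how."""

    def can_handle(self, ua: str) -> bool:
        raise NotImplementedError

    def recovery_match(self, ua: str):
        """Fallback when there is no exact match; returns a device ID or None."""
        return None


class AppleMatcher(UserAgentMatcher):
    def can_handle(self, ua):
        return any(token in ua for token in ("iPad", "iPhone", "iPod"))

    def recovery_match(self, ua):
        # Look for key tokens anywhere in the string instead of a fixed prefix.
        # "iPad" is checked first because iPad UAs may also say "iPhone OS".
        if "iPad" in ua:
            return "hypothetical_ipad_id"
        if "iPod" in ua:
            return "hypothetical_ipod_id"
        return "hypothetical_iphone_id"


class CatchAllMatcher(UserAgentMatcher):
    def can_handle(self, ua):
        return True  # last resort; inherits recovery_match() -> None


# Order matters: the first matcher that claims a UA handles it.
MATCHERS = [AppleMatcher(), CatchAllMatcher()]


def lookup(ua, exact_db):
    """Try an exact lookup first, then let the first claiming matcher recover."""
    if ua in exact_db:
        return exact_db[ua]
    for matcher in MATCHERS:
        if matcher.can_handle(ua):
            return matcher.recovery_match(ua)
    return None
```

The drawback raised later in this thread is visible here: the token rules live in code, separately from wurfl.xml, so every new device family means editing the matchers.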

vijaykramesh, Jun 20 '11 21:06

I've submitted a pull request (https://github.com/clement/django-wurfl/pull/5) with at least a partial solution. Instead of only trying to match against 1/3 of the UA string, it tries 1/3, then, if there are no matches, 1/4, then 1/5, and so on, with some sanity checks: if the string becomes shorter than 5 characters it gives up without matching, and as soon as it gets a match (or matches) it stops without trying even shorter versions of the UA string.
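The fallback loop described in the pull request can be sketched as follows. This is an illustration of the described behavior, not the PR's code: it again assumes an in-memory list of known UAs with `str.startswith()` as the prefix test, and `match_with_fallback` and its `min_len` parameter are hypothetical names.

```python
# Sketch of the "1/3, then 1/4, then 1/5, ..." prefix fallback.
# Assumption: known UAs are an in-memory list of strings.


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (each edit costs 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def match_with_fallback(ua, known_uas, min_len=5):
    """Shrink the prefix until something matches or it gets too short."""
    divisor = 3
    while True:
        prefix = ua[: len(ua) // divisor]
        if len(prefix) < min_len:
            return None  # sanity check: prefix too short to be meaningful
        candidates = [k for k in known_uas if k.startswith(prefix)]
        if candidates:
            # Stop at the longest prefix that matches anything, then rank.
            return min(candidates, key=lambda k: levenshtein(ua, k))
        divisor += 1


IPAD_DB_UA = ("Mozilla/5.0 (iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) "
              "AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7D11")
IPAD_NEW_UA = ("Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS X; en-us) "
               "AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 "
               "Mobile/8J2 Safari/6533.18.5")

print(match_with_fallback(IPAD_NEW_UA, [IPAD_DB_UA]) == IPAD_DB_UA)  # True
```

For the updated iPad, the prefixes at 1/3, 1/4 and 1/5 (46, 34 and 27 characters) all miss; the 1/6 prefix "Mozilla/5.0 (iPad; U; C" is the first that matches, which agrees with the manual experiment in the original report. Because the loop always ends once the prefix drops below `min_len`, it terminates even when nothing matches.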

This doesn't have the robust type-specific checks that Tera-WURFL has, but it at least lets us match an updated iPad.

vijaykramesh, Jun 21 '11 14:06

Yes, I've been planning to write an advanced user-agent matcher for a while now, even though I'm not sure I want to follow the way TeraWURFL implements it.

I will take a shot at it this weekend, to see if I can get something working.

clement, Jun 21 '11 15:06

Cool, let me know if you need another pair of eyes or hands on it.

I'm definitely not a fan of the way TeraWURFL does it, since it requires maintaining not just your wurfl.xml but also your user-agent parsing: any time a new or updated device has a somewhat different user-agent string in the same "vein" as others in its brand or product line, you'd have to edit the user-agent parsing to add the specifics. If you already have that data in wurfl.xml, why duplicate it in your code, you know?

On a side note, I have no idea why the latest wurfl.xml doesn't include the current iPad 1 user-agent. The XML was released 6/11/11, so it seems odd that the iPad would just be missing from the list. I guess they assume the API will do complex type-specific user-agent matching, so there's no need to have every possible user agent in the database?

vijaykramesh, Jun 21 '11 15:06

Any progress on this item? This project seems kind of dead. Hopefully not, because there's no other good WURFL processing library for Django that I know of, and the DeviceAtlas stuff all costs money. :/

rbdcti, Jan 10 '12 22:01