uadetector Performance improvements


Open: PavelCibulka opened this issue 9 years ago • 1 comment

I've looked into why user agent parsing is slow. Here are some suggestions:

Robot detection could be done with just a HashMap<String, Robot>. Note that there is no regexp involved here. In AbstractUserAgentStringParser.examineAsBrowser():

    for (final Robot robot : data.getRobots()) {
        if (robot.getUserAgentString().equals(builder.getUserAgentString())) {
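A minimal sketch of the map-based lookup, assuming robots are matched by exact string equality as in the loop quoted above. `SimpleRobot` and `RobotIndex` are hypothetical stand-ins written for this example and are not part of uadetector's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for uadetector's Robot type; only what the lookup needs.
final class SimpleRobot {
    private final String userAgentString;
    private final String name;

    SimpleRobot(String userAgentString, String name) {
        this.userAgentString = userAgentString;
        this.name = name;
    }

    String getUserAgentString() { return userAgentString; }
    String getName() { return name; }
}

// Hypothetical index: built once per data set, then queried per request.
final class RobotIndex {
    private final Map<String, SimpleRobot> byUserAgent = new HashMap<>();

    RobotIndex(List<SimpleRobot> robots) {
        for (SimpleRobot robot : robots) {
            // Keep the first entry if two robots share a user agent string.
            byUserAgent.putIfAbsent(robot.getUserAgentString(), robot);
        }
    }

    // Returns the robot with exactly this user agent string, or null if none.
    SimpleRobot find(String userAgentString) {
        return byUserAgent.get(userAgentString);
    }
}
```

The index replaces a scan over all robots with a single hash lookup per user agent string; it would have to be rebuilt whenever the underlying data is reloaded.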

Lazy OS detection. The OS is not always needed.

Lazy device detection. Same here: the device is not always needed.
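One way lazy detection could look, as a sketch only: the expensive step is wrapped in a memoizing supplier and runs on first access. `Lazy`, `LazyAgentInfo`, and the toy `detectOs` rule are assumptions made for illustration; uadetector's real result type and OS matching are different.

```java
import java.util.function.Supplier;

// Hypothetical memoizing wrapper: computes the value once, on first get().
final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> compute;
    private T value;
    private boolean computed;

    Lazy(Supplier<T> compute) { this.compute = compute; }

    @Override
    public synchronized T get() {
        if (!computed) {
            value = compute.get();
            computed = true;
        }
        return value;
    }
}

// Hypothetical result object that defers OS detection until it is requested.
final class LazyAgentInfo {
    private final Lazy<String> operatingSystem;
    int osDetections = 0; // counts how often the expensive step actually ran

    LazyAgentInfo(String userAgentString) {
        this.operatingSystem = new Lazy<>(() -> detectOs(userAgentString));
    }

    String getOperatingSystem() { return operatingSystem.get(); }

    // Placeholder for the real (regex-based) OS detection.
    private String detectOs(String ua) {
        osDetections++;
        return ua.contains("Windows") ? "Windows" : "unknown";
    }
}
```

Callers that only read the browser name never pay for OS or device matching; the trade-off is that the result object has to keep the user agent string and a reference to the data it was parsed with.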

The whole regular expression loop. This is probably good for development and maintenance, but not so great for performance. Here is an idea: define an enum of cheap tests, give each browser pattern an EnumSet of the tests it depends on, and check that set before running the regexp. Example:

EnumTest1: the user agent starts with the string "Mozilla". If this returns false, don't test any regexp that starts with /^Mozilla.

EnumTest2: the user agent starts with the string "M". If this returns true, don't test any regexp that starts with /^ but not with /^M.

There are 631 <browser_reg> entries: 150 start with /^Mozilla, and 246 start with /^ but not with /^M. These two checks can be implemented without any change to uasdata.
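The two prefix checks could be prototyped roughly as follows. This is a sketch under assumptions: `UaCheck` and `GuardedPattern` are hypothetical names, the set of required checks is assigned to each pattern by hand, and prefixes are compared case-insensitively because the patterns are compiled case-insensitively. A pattern such as /^[A-Z]/ starts with /^ but can still match a string beginning with "M", so automatic classification would have to be conservative.

```java
import java.util.EnumSet;
import java.util.regex.Pattern;

// Hypothetical cheap tests, evaluated once per user agent string.
enum UaCheck {
    STARTS_WITH_MOZILLA {
        @Override boolean test(String ua) { return ua.regionMatches(true, 0, "Mozilla", 0, 7); }
    },
    DOES_NOT_START_WITH_M {
        @Override boolean test(String ua) { return !ua.regionMatches(true, 0, "M", 0, 1); }
    };

    abstract boolean test(String ua);

    // Runs every check once and returns the ones that passed.
    static EnumSet<UaCheck> evaluate(String ua) {
        EnumSet<UaCheck> passed = EnumSet.noneOf(UaCheck.class);
        for (UaCheck check : values()) {
            if (check.test(ua)) {
                passed.add(check);
            }
        }
        return passed;
    }
}

// Hypothetical pairing of a browser regex with the checks it depends on.
final class GuardedPattern {
    private final Pattern pattern;
    private final EnumSet<UaCheck> required;

    GuardedPattern(String regex, EnumSet<UaCheck> required) {
        this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        this.required = required;
    }

    // Skips the regex entirely when a required check did not pass.
    boolean matches(String ua, EnumSet<UaCheck> passed) {
        return passed.containsAll(required) && pattern.matcher(ua).find();
    }
}
```

The checks are evaluated once per user agent string and the resulting EnumSet is reused for every pattern, so the guard for each pattern is a cheap set comparison and many of the regex matches can be skipped.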

There could also be a list of words that the UA string has to contain. Split the UA string into a HashSet of words and check this rule before running the regexp. This would be fast. Example:

/mozilla.*AppleWebKit.*NetFrontLifeBrowser\/([0-9.]+)/si
requiredWords: mozilla, AppleWebKit, NetFrontLifeBrowser
test: if (words.containsAll(requiredWords))

This would probably need a new field for required words in uasdata.
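A rough sketch of the required-words check, assuming words are the lowercase alphanumeric runs of the UA string. `WordGuardedPattern` and `tokenize` are hypothetical names, and the required words are supplied by hand here, since uasdata has no such field today.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical pairing of a browser regex with the words it cannot match without.
final class WordGuardedPattern {
    // Anything that is not a lowercase letter or digit separates two words.
    private static final Pattern SEPARATORS = Pattern.compile("[^a-z0-9]+");

    private final Pattern pattern;
    private final Set<String> requiredWords = new HashSet<>();

    WordGuardedPattern(String regex, List<String> requiredWords) {
        this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        for (String word : requiredWords) {
            // Lowercase so the word check is case-insensitive like the regex.
            this.requiredWords.add(word.toLowerCase(Locale.ROOT));
        }
    }

    // Splits a user agent string into its set of lowercase words; done once per string.
    static Set<String> tokenize(String ua) {
        Set<String> words = new HashSet<>();
        for (String word : SEPARATORS.split(ua.toLowerCase(Locale.ROOT))) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // Runs the regex only if every required word is present.
    boolean matches(String ua, Set<String> words) {
        return words.containsAll(requiredWords) && pattern.matcher(ua).find();
    }
}
```

A required word only works as a filter if it always appears as a whole token. A regexp can match "mozilla" inside a longer run of letters, whereas the word set would not contain it, so the required words would need to be chosen with that in mind.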

Regards, Pavel

Apr 09 '15 08:04 PavelCibulka

@PavelCibulka sounds good. Would you do a Pull Request that prototypes your proposed changes?

Apr 11 '15 08:04 arouel
