cucumber-jvm
cucumber-jvm copied to clipboard
Single step expression is compiled into regex more than once (thus it makes test execution slower)
It seems that all step expressions are re-compiled into regexes before each scenario.
Steps to reproduce the behavior:
- Given I have feature file like:
Feature: Whatever
Scenario: S1
Given foo
Scenario: S2
Given foo
- and I implement foo step e.g.
@Given("foo")
public void foo() {}
- When I execute the feature file
- Then
io.cucumber.core.stepexpression.StepExpressionFactory.createExpressionmethod is executed twice with "foo" expression (because there are two scenarios) - therefore "foo" is compiled into regex twice
Expected behavior
"foo" is compiled only once.
Context & Motivation
I noticed that if I remove all bodies of my steps from sources it still takes considerable amount of time to run my tests (e.g., 2.2 seconds for 186 scenarios). Using Java Flight Recorder I collected some profiling data and it seems that a lot of time is being spent in io.cucumber.core.stepexpression.StepExpressionFactory.createExpression (47% in my case according to flame graph - see below). When I examined that method in debugger I noticed that it's being called more than once with same expressionString parameter.
Screenshots

Your Environment
- Versions used 6.1.2
- Linux, Gradle, Kotlin
My Java/cucumber-jvm knowledge is pretty low, but I managed to implement a quickfix/hack for this by introducing expression cache in crateExpression method, and it reduced my tests execution time from 2.2s to 1.5s.
Heya thanks for reporting this. A solution may be a bit more complicated then adding a cache. The type registry from which cucumber expressions are compiled is stateful.
It is possible that same expression string results in a different regex. For example the expression {double} will result in different expressions when used in English and French features.Another issue is that different scenario executions may register different parameter types and as a result may generate different regexps from an expression. While not recommend it can't be ruled out. This means that the cache key would be fairly complex and depend on the state of the parameter type registry which is non-trivial and not exposed.
I do have some long term plans to add some immutability to the runner and glue but it needs major change to the way backends discover and register glue. So I don't think this will be fixed any time soon.
Some improvements may also come from implementing a proper parser for cucumber expressions rather the rewriting using regexes (see https://github.com/cucumber/cucumber/pull/771) but that too is a long term project.
Thanks for quick and insightful reply @mpkorstanje. Feel free to close this issue if it's not going to be fixed anytime soon.
PS: Thanks for Cucumber guys! It's pleasure to work with.
Your welcome!
I'll keep the issue around as a reminder to benchmark this in the future.
If it were me,
- Find all calls to
Pattern.compile(String). - Replace with
Patterns.compile(String). - Implement like so:
public static void Patterns {
private static final Map<String, Pattern> store = new ConcurrentHashMap<>();
private Patterns() {}
public static Pattern compile(String pattern) {
return store.computeIfAbsent(pattern, Pattern::compile);
}
public static Pattern compile(String pattern, int flags) {
// ... look at flags, prepend to string pattern as (?x), delegate
// alternatively, have a HashKey class contain both the pattern and the flags
}
}
Once cucumber-java8 is removed Cucumber will only have to create regular expressions once for each language used. That way we can avoid a static map and all the problems that come with that.
I can confirm that all step expressions are recreated at each test scenario (for one of my project with about 400 test scenarios, expressions are recreated about 400 times). Also I understand the issue of caching with respect to the parameter type registry.
Thus, I tested a cached version of my proposed ExpressionFactory improvement https://github.com/cucumber/cucumber-expressions/issues/185 / https://github.com/cucumber/cucumber-expressions/pull/187 . This cached version takes the parameter type registry into account (i.e. Map<ParameterTypeRegistry, Map<String, Expression>>). I measured the duration of my ~400 test scenarios over 10 runs and got on average:
- without caching: 12s 851ms
- with caching: 12s 923ms
I can conclude that adding caching made no significant improvement. Thus, I'm not sure having such caching is the way to go to improve the performance.
For completeness, there is the code with caching:
public final class ExpressionFactory {
private static final Pattern PARAMETER_PATTERN = Pattern.compile("((?:\\\\){0,2})\\{([^}]*)\\}");
private static final Map<ParameterTypeRegistry, Map<String, Expression>> expressionsByRegistry = new HashMap<>();
private final ParameterTypeRegistry parameterTypeRegistry;
private final Map<String, Expression> expressions;
public ExpressionFactory(ParameterTypeRegistry parameterTypeRegistry) {
this.parameterTypeRegistry = parameterTypeRegistry;
expressions = expressionsByRegistry.computeIfAbsent(this.parameterTypeRegistry, key -> new HashMap<>());
}
public Expression createExpression(String expressionString) {
Expression expression = expressions.get(expressionString);
if (expression!=null) {
return expression;
}
int length = expressionString.length();
int lastCharIndex = 0;
char firstChar = 0;
char lastChar = 0;
if (length > 0) {
lastCharIndex = length - 1;
firstChar = expressionString.charAt(0);
lastChar = expressionString.charAt(lastCharIndex);
}
if (firstChar == '^' || lastChar == '$') {
// java regexp => create from regexp
expression = this.createRegularExpressionWithAnchors(expressionString);
} else if (firstChar == '/' && lastChar == '/') {
// script style regexp => create from regexp
expression = new RegularExpression(Pattern.compile(expressionString.substring(1, lastCharIndex)), this.parameterTypeRegistry);
} else {
// otherwise create cucumber style expression
expression = new CucumberExpression(expressionString, this.parameterTypeRegistry);
}
expressions.put(expressionString, expression);
return expression;
}
private RegularExpression createRegularExpressionWithAnchors(String expressionString) {
try {
return new RegularExpression(Pattern.compile(expressionString), this.parameterTypeRegistry);
} catch (PatternSyntaxException var3) {
if (PARAMETER_PATTERN.matcher(expressionString).find()) {
throw new CucumberExpressionException("You cannot use anchors (^ or $) in Cucumber Expressions. Please remove them from " + expressionString, var3);
} else {
throw var3;
}
}
}
}
I can conclude that adding caching made no significant improvement. Thus, I'm not sure having such caching is the way to go to improve the performance.
Currently we have to have to rebuild all Cucumber expressions for each scenario because cucumber-java8 defines step definitions at test execution time (actually, after the test has already started). This means that each scenario execution can contain a different set of cucumber expressions. If/once cucumber-java8 is removed all we can cache expressions by language.
But https://github.com/cucumber/cucumber-jvm/issues/2174 and https://github.com/cucumber/cucumber-jvm/issues/2279 will take a long time.