[BUG] v4.1 and v4 hang on Coverage and never complete
Checklist
- [x] I checked the FAQ section of the documentation
- [x] I looked for similar issues in the issue tracker
- [x] I am using the latest version of Schemathesis
Describe the bug
Schemathesis v4.1 and v4.0.26 both hang at the Coverage phase and never complete. v3.39.16 completes the tests for my private API without issues.
I have an auth hook that authenticates via OAuth2 and retrieves a token for accessing the API. I used Claude AI to help analyze my code for compatibility with v4, and it looks fine.
Since I'm working on a private API, I can't paste it here. I've read that you removed the verbose CLI flag because it didn't do anything. How do I diagnose the cause? Or could you add a verbose flag in the future that shows in real time why Schemathesis may be hanging?
Hi! Right now there is no feasible way to debug it from the outside :(
However, I suspect the issue could come from the "pattern" keyword when it is combined with maxLength or minLength values that are, e.g., in the hundreds. Some regexes require additional filtering of the generated data, so when a regex uses backtracking it can be slow, especially on the large strings that such minLength/maxLength combos can produce, particularly when they check the upper bound. By any chance, do you have such keyword combos?
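To illustrate (a toy sketch, not the actual Schemathesis generator): when "pattern" is combined with a large minLength/maxLength, a generate-then-filter strategy rejects almost every candidate, so generation can appear to hang. The pattern below is hypothetical, not taken from any real schema.

```python
import random
import re
import string

# Hypothetical constraint combo: an 8-character pattern next to a
# minLength in the hundreds means random candidates can never satisfy both.
pattern = re.compile(r"^[A-Z]{3}-\d{4}$")
random.seed(0)

attempts = 10_000
hits = sum(
    1
    for _ in range(attempts)
    if pattern.fullmatch("".join(random.choices(string.printable, k=300)))
)
print(hits)  # 0: every 300-character candidate is rejected
```

Real generators are smarter than this, but the filtering cost grows the same way when the pattern and the length bounds pull in different directions.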
I can check the OpenAPI / Swagger JSON file for these keywords. I noticed that when the test first starts up, the timer updates frequently. As the test continues, it takes longer and longer for the timer to update. Using Ctrl-C to break out of execution doesn't work; it's still stuck in the execution loop. Not sure if that's enough of a clue.
Is it possible to add a future feature for this scenario: if the code detects that an endpoint has taken longer than a minute to test, it returns to the command prompt and displays a log of what it has done up to that point? That may help figure out why v4 and v4.1 are getting stuck.
For now I'll have to stick with v3.39, which works great for the internal APIs.
@Stranger6667 I did a search in my json file and did not find any maxLength or minLength syntax.
@j7an And what about "pattern"? This is the main reason why certain schemas are skipped in the test corpus in this repo. From your description, it is my main assumption; another one would be a recursive $ref, but that would affect the fuzzing phase as well.
> Is it possible to add a future feature for the scenario: if the code detects an endpoint has taken longer than a minute to test, it'll return to the command prompt and display a log of what it has done up to that point? That may help figuring out why v4 and v4.1 are getting stuck.
It would be a useful feature indeed! What is suspicious to me is that the timer freezes :(
Hard to say without the schema. If it helps, I can sign an NDA and review the root cause privately.
> For now I'll have to stick with v3.39 which works great for the internal APIs.
Do you also use the coverage phase on v3? It was behind a feature flag, the EXPERIMENTAL_COVERAGE_PHASE env var. If you didn't use the coverage phase, you can try v4 with --phases=examples,fuzzing,stateful, which will disable the coverage phase.
@Stranger6667 There's no "pattern", but there's extensive use of $ref in the JSON (about 370 in my search). I did not use the EXPERIMENTAL_COVERAGE_PHASE flag in v3.39, but I'll try disabling coverage using the phases flag in v4.1 and see if that helps.
Also, the JSON file uses OpenAPI (Swagger) specification v2.0, if that helps. I'm not sure how much backward compatibility v4.1 has with OpenAPI v2.0.
Thanks.
@Stranger6667 I used Augment to analyze my Swagger.json file and Schemathesis. This is what it says is the reason Schemathesis is freezing.
Detailed Technical Analysis: Schemathesis Coverage Phase Freeze
Based on my analysis of the problematic swagger.json file and the Schemathesis source code, I can provide you with a comprehensive technical analysis of why the coverage phase freezes.
1. Root Cause: Combinatorial Explosion
The coverage phase freezes due to a combinatorial explosion caused by the endpoint having 54 parameters. Here's the breakdown:
Parameter Analysis:
- 40 string parameters (no format constraints)
- 4 string parameters with `date-time` format
- 3 string parameters with `uuid` format
- 2 array parameters with `uuid` format
- 4 integer parameters
- 1 boolean parameter
Coverage Generation Per Parameter: Each parameter generates approximately 6 test values:
- 1-2 positive values (valid examples)
- 4-5 negative values (boundary violations, type mismatches, format violations)
Mathematical Problem: The coverage phase attempts to generate test cases for each parameter individually, resulting in:
- 54 parameters × ~6 values each = ~324 individual parameter test cases
- But the real issue is the sequential processing and memory overhead of managing all these generators simultaneously
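The arithmetic above can be checked directly (parameter counts and the per-parameter estimate are taken from the analysis itself):

```python
# Parameter counts taken from the analysis above
param_counts = {
    "unconstrained string": 40,
    "date-time string": 4,
    "uuid string": 3,
    "uuid array": 2,
    "integer": 4,
    "boolean": 1,
}
total_params = sum(param_counts.values())
values_per_param = 6  # ~1-2 positive + 4-5 negative values per parameter
total_cases = total_params * values_per_param
print(total_params, total_cases)  # 54 324
```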
2. Specific Problematic Code Sections
```python
def _iter_coverage_cases(...):
    generators: dict[tuple[str, str], Generator[coverage.GeneratedValue, None, None]] = {}
    # Creates a generator for EACH of the 54 parameters
    for parameter in operation.iter_parameters():
        gen = coverage.cover_schema_iter(...)
        generators[(location, name)] = gen
    # Then iterates through ALL values for EACH parameter
    for (location, name), gen in generators.items():
        iterator = iter(gen)
        while True:  # This is the infinite loop!
            try:
                value = next(iterator)
                # Generate test case for this parameter value
            except StopIteration:
                break
```
3. Why These Schema Patterns Trigger the Problem
Problematic Schema Elements:
- UUID format parameters (5 total): `{ "type": "string", "format": "uuid" }`
  Generates: valid UUID + 5 negative values (wrong type, wrong format, etc.)
- Date-time format parameters (4 total): `{ "type": "string", "format": "date-time" }`
  Generates: valid datetime + 5 negative values
- Array parameters with UUID items (2 total): `{ "type": "array", "format": "uuid", "items": {"type": "string"}, "collectionFormat": "multi" }`
  Generates: valid array + multiple negative arrays
- Unconstrained string parameters (40 total): `{ "type": "string" }`
  Each generates: empty string, boundary lengths, invalid types
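A toy sketch of this fan-out (not Schemathesis's actual `cover_schema_iter`; the function and value choices are illustrative only) shows how a single parameter schema expands into positive and negative test values:

```python
import uuid

def toy_cover_schema(schema: dict):
    """Toy sketch of how one parameter schema fans out into test values."""
    if schema.get("type") == "string" and schema.get("format") == "uuid":
        yield ("positive", str(uuid.uuid4()))  # valid UUID
        yield ("negative", "not-a-uuid")       # format violation
        yield ("negative", 42)                 # type violation
        yield ("negative", None)               # null value
        yield ("negative", "")                 # empty string
        yield ("negative", "123e4567")         # truncated UUID

values = list(toy_cover_schema({"type": "string", "format": "uuid"}))
print(len(values))  # 6 values for a single parameter
```

Multiply this by dozens of parameters and the total case count grows quickly, which is the combinatorial pressure the analysis describes.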
4. Exact Schemathesis Code Causing the Freeze
The freeze occurs in the coverage value generation loop in `coverage.py`:

```python
def cover_schema_iter(ctx: CoverageContext, schema: dict | bool, seen: HashSet | None = None):
    # For each constraint in the schema, generate boundary values
    for key, value in schema.items():
        if key == "format":
            yield from _negative_format(ctx, schema, value)
        elif key == "type":
            yield from _negative_type(ctx, value, seen, schema)
        # ... many more constraint types
```
The issue is that with 54 parameters, the system creates 54 separate generators that each produce multiple values, and the main loop processes them sequentially without any limits or timeouts.
5. Memory and Performance Impact
Memory Issues:
- 54 active generators held in memory simultaneously
- Each generator maintains state for boundary value calculation
- HashSet deduplication for each parameter grows over time
CPU Issues:
- Regex compilation for pattern validation (repeated for each value)
- JSON Schema validation for each generated value
- String format validation (UUID, date-time) for each negative test case
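One of the CPU costs above is easy to sidestep: compile format-validation regexes once instead of re-parsing the pattern for every generated value. A minimal sketch (the `UUID_RE` name and pattern are illustrative, not from the Schemathesis codebase):

```python
import re

# Compiled once at module load, reused for every generated value
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)

def is_uuid(value: str) -> bool:
    # Reusing the compiled pattern avoids repeated regex compilation
    return UUID_RE.match(value) is not None

print(is_uuid("123e4567-e89b-12d3-a456-426614174000"))  # True
print(is_uuid("not-a-uuid"))  # False
```

(Python's `re` module does cache recently compiled patterns, so the real-world gain depends on how many distinct patterns are in play.)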
6. Potential Workarounds and Fixes
Immediate Workarounds:
1. Reduce parameter count:

   ```shell
   # Test with fewer parameters by filtering the schema
   uv run st run --phases coverage swagger.json --url http://localhost:8080 --max-examples 10
   ```

2. Use only positive mode:

   ```shell
   # Skip negative value generation
   uv run st run --phases examples,fuzzing swagger.json --url http://localhost:8080
   ```

3. Limit coverage scope:

   ```shell
   # Test specific endpoints only
   uv run st run --phases coverage swagger.json --url http://localhost:8080 --endpoint-filter="/model/models/{model_id}"
   ```
Potential Code Fixes:
- Add parameter limit: modify `_iter_coverage_cases` to limit the number of parameters processed
- Add timeout: implement a per-parameter generation timeout
- Batch processing: process parameters in smaller batches instead of all at once
- Smart sampling: instead of exhaustive coverage, use statistical sampling for large parameter sets
7. Conclusion
The Schemathesis coverage phase freezes because it attempts exhaustive boundary value testing on an API endpoint with 54 parameters. The current implementation has no safeguards against this combinatorial explosion, making it unsuitable for APIs with large numbers of parameters.
The issue is not a bug but rather a design limitation where the coverage phase prioritizes completeness over performance, making it impractical for complex APIs like the API in your swagger.json file.
Recommendation: For APIs with many parameters, use the examples and fuzzing phases instead of coverage, or consider splitting the API into smaller, more focused endpoints.
@j7an Thanks!
Yes, the number of parameters is indeed the root cause. There was a PR that made this generation work on demand, but I closed it as there were no significant changes in memory consumption in my benchmarks. I will revisit it with a schema that has a larger number of parameters. Right now, all values are generated first and then used.
I think there is even a case in the test corpus with such a large number of parameters, but I didn't add it to my to-revisit list, and now it is just skipped, similar to pathological "pattern" cases.
Other notes in the memory & performance section don't have such an effect on the overall process. The core issue is combinatorial explosion.
To mitigate this, I think the following would be worthwhile to implement:
- Use `max_examples` in the coverage phase to limit the total number of cases. Though, I'm not sure if the current default value (100 cases) is sufficient.
- Avoid storing test cases before usage; they should be generated on demand
- Maybe estimate the number of test cases upfront and avoid generating too many (e.g., run fewer iterations in some inner loops)
- Improve deduplication & reuse some generated values
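A minimal sketch of the first two ideas combined, chaining per-parameter generators lazily under a global cap (all names here are hypothetical, not the actual Schemathesis internals):

```python
import itertools
from typing import Iterator

def cover_values(name: str) -> Iterator[str]:
    # Hypothetical per-parameter generator yielding boundary values lazily
    for kind in ("valid", "empty", "wrong-type", "too-long", "bad-format", "null"):
        yield f"{name}:{kind}"

def iter_coverage_cases(params: list[str], max_examples: int) -> Iterator[str]:
    # Chain the per-parameter generators lazily and stop at a global cap,
    # instead of materializing every value for every parameter up front
    all_values = itertools.chain.from_iterable(cover_values(p) for p in params)
    return itertools.islice(all_values, max_examples)

cases = list(iter_coverage_cases([f"p{i}" for i in range(54)], max_examples=100))
print(len(cases))  # 100 instead of 54 * 6 = 324
```

Because everything downstream of `islice` is a generator, no value is computed past the cap, so runtime stays bounded even when the theoretical case count explodes.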
Is it correct that the schema has 40 unconstrained string parameters? I think with the parameter description above, it should be easy for me to reproduce the issue.
I'll check how much generation time such a schema (or a reasonably smaller but still slow version of it) takes and will work from there.
Thanks again for sharing this
Yes, I believe 40 unconstrained string parameters is the correct number. Thanks for looking into it.