hypothesis Major performance regression between Hypothesis 6.114.1 and 6.115.0

Hi,

The initial problem was reported here: https://github.com/schemathesis/schemathesis/issues/2507, with some guesses in the last comments.

With Hypothesis 6.114.1, the run of schemathesis on the WeeChat API takes 43 seconds. With Hypothesis 6.115.0 and the same schemathesis / WeeChat versions, it takes 40-50 minutes.

I don't know how to troubleshoot this, so let me know if you need more information or tests on my side.

Thanks!

Oct 14 '24 14:10 flashcode

Hmm. I'm quite surprised that https://github.com/HypothesisWorks/hypothesis/pull/4133/ could have this effect; it's a pretty minimal change to our pretty-printer. Can you share a self-contained test script which shows this slowdown?

Oct 14 '24 19:10 Zac-HD

It's possible we missed a case in #4063 and are repr'ing the entirety of an api schema somewhere, which could be quite expensive indeed. A 50x slowdown is pretty wild though.

Oct 14 '24 19:10 Liam-DeVoe

Sorry, I don't know Hypothesis well enough to share a self-contained test script which shows this slowdown, but I can easily reproduce the issue via schemathesis, as described in the linked issue.

Oct 20 '24 06:10 flashcode

@Stranger6667 I see you have a workaround in https://github.com/schemathesis/schemathesis/pull/2514, do you have any ideas for where we might have missed something in #4063?

Oct 20 '24 17:10 Zac-HD

Hi Zac,

I do! :) Here. bits is used only in the except branches, but it is built with nicerepr unconditionally.

In this case, maybe call repr_call only if xfail_example_reprs is not empty?
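A minimal sketch of that idea, assuming nothing about the real Hypothesis internals: expensive_repr and run_example are hypothetical stand-ins, and the point is only that the repr is built inside the except branch, where it is actually used.

```python
calls = {"repr": 0}  # counts how often the expensive repr is built


def expensive_repr(args):
    """Hypothetical stand-in for a costly pretty-printing call."""
    calls["repr"] += 1
    return ", ".join(repr(a) for a in args)


def run_example(test, args):
    """Run test(*args); build the repr only if the test fails."""
    try:
        test(*args)
    except AssertionError as err:
        # The repr is needed only for the failure report, so compute it here.
        return f"failed on ({expensive_repr(args)}): {err}"
    return None


def check_positive(x):
    assert x > 0, "x must be positive"


ok = run_example(check_positive, (1,))    # passes, no repr built
bad = run_example(check_positive, (-1,))  # fails, repr built once
```

With this shape, passing examples never pay for pretty-printing.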

To find it, I ran the following script under py-spy (at this branch):

from schemathesis import cli

cli.run(["https://example.schemathesis.io/openapi.json"])

It includes cases when pretty printing is immediately needed, but also when pretty-printed reprs are unused.

Flamegraph

py-spy record -o profile.svg -- python pretty_print.py

[Image: profile]

In Schemathesis, test cases hold references to API endpoints, and those hold references to the API schema, which gives the end user a nicer API in the pytest integration (e.g. case.call and others). The regular __repr__ does not include those references, but pretty.py does its own introspection, so it walks over them, and the resulting output can be pretty large. Right now, I think the easiest way to resolve the issue is to define a custom _repr_pretty_ on the Schemathesis side, as this use case seems pretty specific to the current Schemathesis internal structure.
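A rough sketch of such a hook, assuming the IPython-style _repr_pretty_(self, p, cycle) protocol. Case, Schema, and RecordingPrinter are hypothetical stand-ins here, not the real Schemathesis or Hypothesis classes:

```python
class Schema:
    """Hypothetical stand-in for a large API schema."""

    def __init__(self, raw):
        self.raw = raw


class Case:
    """Hypothetical test case holding a reference to the whole schema."""

    def __init__(self, method, path, schema):
        self.method = method
        self.path = path
        self.schema = schema  # large object we do not want printed

    def _repr_pretty_(self, p, cycle):
        # p is the printer; cycle is True when the object refers to itself.
        if cycle:
            p.text("Case(...)")
        else:
            p.text(f"Case(method={self.method!r}, path={self.path!r})")


class RecordingPrinter:
    """Tiny stand-in for a pretty-printer; it only supports text()."""

    def __init__(self):
        self.parts = []

    def text(self, s):
        self.parts.append(s)


printer = RecordingPrinter()
case = Case("GET", "/users", Schema({"paths": {"/users": {}}}))
case._repr_pretty_(printer, cycle=False)
output = "".join(printer.parts)
```

Because the hook never touches self.schema, the printed form stays small no matter how large the schema is.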

But a more general issue is that reprs can be arbitrarily large, which may cause a noticeable performance hit. There are some safeguards as far as I remember, but I am not sure whether they can also be applied in these cases.
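For illustration only, this is one way to bound the size of a repr using the standard library's reprlib module. It is not what Hypothesis does; it just shows the kind of safeguard meant here:

```python
import reprlib

limiter = reprlib.Repr()
limiter.maxlist = 3  # show at most 3 list items, then "..."

big = list(range(10_000))
short = limiter.repr(big)  # bounded output regardless of len(big)
```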

Oct 22 '24 06:10 Stranger6667

Hi,

Thanks for your help, the performance issue is now solved! However, I'm facing another issue with version 6.115.6, so I opened a separate issue: #4151.

Oct 30 '24 08:10 flashcode