django-rest-framework
django-rest-framework copied to clipboard
Speed up ProhibitSurrogateCharactersValidator
Description
I've noticed that this validator is using a per-character loop. Replacing it with a regex results in a pretty significant speedup. Here are results from my benchmark:
| String length | Old implementation time (sec) | New implementation time (sec) |
|---|---|---|
| 1 | 2.833e-07 | 1.765e-07 |
| 10 | 5.885e-07 | 2.030e-07 |
| 100 | 3.598e-06 | 4.144e-07 |
| 1000 | 3.329e-05 | 2.463e-06 |
| 10000 | 0.0003338 | 2.449e-05 |
| 100000 | 0.003338 | 0.0002284 |
| 1000000 | 0.03333 | 0.002278 |
| 10000000 | 0.3389 | 0.02377 |
| 100000000 | 3.250 | 0.2365 |
For large strings, the speedups are more than an order of magnitude.
For the record, here's the benchmark:
import timeit
from rest_framework.validators import ProhibitSurrogateCharactersValidator
validator = ProhibitSurrogateCharactersValidator()
for i in range(0, 9):
length = 10 ** i
string = "a" * length
timer = timeit.Timer("validator(string)", globals=globals())
loops, duration = timer.autorange()
print(f"{length}\t{duration / loops}")
Okay, so this is a good exercise case...
It's a neat and properly defined little PR that's clearly an improvement. We'd need to be wilfully obtuse to say no. Why might we choose to reject even this properly targeted improvement?
Well... because having a steadfast "no" policy is really uncomplicated, and means we don't have continual PR pressure & creep, which on balance just results in unnecessary busy-work and risk.
I'd suggest we need to be more clear in our language in CONTRIBUTING.md, and just issue a really clear no on essentially all code PRs here. If we're unambiguous about this then we don't have to second guess ourselves.
Suggested language might. be...
REST framework is considered complete. We may accept...
- Pull requests improving the documentation, or adding third party packages to the documentation.
- Pull requests that resolve a CVE'd security issue.
- Pull requests that are strictly required if updated Django versions don't pass some part of our existing test suite.
All other pull requests and issues will be closed.
Yes, there's clearly areas where there's potential performance impacts. There's also plenty of areas where there's loosely defined behaviour. However being really clear that this is essentially a finished project (excepting the points above) would probably be a net benefit.
(The alternative is similar to above, but we allow ourselves some leeway with a "We'll only consider pull requests that meet an unambiguously high bar of quality". 🤔)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I have rebased this in case you change your mind. If not, just close the PR.
FWIW, the current wording in CONTRIBUTING.md and elsewhere says that
We may accept pull requests that track the continued development of Django versions, but would prefer not to accept new features or code formatting changes.
Since this is neither a new feature nor a code formatting change, it seems like it should be in scope. Of course, it's your project, so I understand if you want to reject it anyway.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.