CVE-2022-42964 ReDOS vulnerability in GaussianInput

Open drew-parsons opened this issue 2 years ago • 3 comments

Describe the bug: CVE-2022-42964, a ReDoS vulnerability, has been reported in the GaussianInput.from_string method.

An exponential ReDoS (Regular Expression Denial of Service) can be triggered in the pymatgen PyPI package, when an attacker is able to supply arbitrary input to the GaussianInput.from_string method.

The report was made at https://research.jfrog.com/vulnerabilities/pymatgen-redos-xray-257184/ and documented by Debian at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024017 (see also https://security-tracker.debian.org/tracker/CVE-2022-42964 )

To Reproduce Steps to reproduce the behavior:

  1. Save the following test code as CVE-2022-42964.py:
import time
from pymatgen.io.gaussian import GaussianInput

def str_and_from_string(i):
    ans = """#P HF/6-31G(d) SCF=Tight SP

H4 C1

0 1
"""
    vulnerable_input = ans + 'C'+'0' * i + '!'+'\n'
    GaussianInput.from_string(vulnerable_input)

for i in range(1000):
    start = time.time()
    str_and_from_string(i)
    print(f"{i}: Done in {time.time() - start}")
  2. Run python3 CVE-2022-42964.py
  3. The output shows exponentially growing execution time for what should be a trivial, constant-time loop

Expected behavior: Parsing strings of the kind in this example should take about the same time, on the order of milliseconds or less, in each iteration.

Screenshots

$ python3 CVE-2022-42964.py
0: Done in 0.0006997585296630859
1: Done in 4.506111145019531e-05
2: Done in 3.814697265625e-05
3: Done in 4.291534423828125e-05
4: Done in 5.6743621826171875e-05
5: Done in 6.365776062011719e-05
6: Done in 5.555152893066406e-05
7: Done in 7.033348083496094e-05
8: Done in 0.00010371208190917969
9: Done in 0.00017571449279785156
10: Done in 0.0003418922424316406
11: Done in 0.0006191730499267578
12: Done in 0.0012633800506591797
13: Done in 0.002537250518798828
14: Done in 0.005010366439819336
15: Done in 0.009590387344360352
16: Done in 0.01953911781311035
17: Done in 0.03992795944213867
18: Done in 0.07311630249023438
19: Done in 0.13045120239257812
20: Done in 0.2530491352081299
21: Done in 0.5362303256988525
22: Done in 1.0537843704223633
23: Done in 2.012873888015747
24: Done in 4.074865102767944
25: Done in 8.38607931137085
26: Done in 17.248133182525635
27: Done in 38.30663585662842
28: Done in 79.40008401870728
...

Desktop:

  • OS: Debian Linux
  • Linux Version 6.0.8-1 (debian unstable)

Additional context: Python 3.10.8, pymatgen 2022.11.7

drew-parsons • Nov 28 '22 11:11

Looks like this stems from ^(\w+)* in https://github.com/materialsproject/pymatgen/blob/7a51c9b3e993ba5b6cc57bbdf5c293feb045611e/pymatgen/io/gaussian.py#L93

which you can explode with "0" * 100 + "!".

I'm not a Gaussian user, but is this part of the regex necessary? Can the input be satisfied with ^(\w)*...?
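
For anyone wondering why ^(\w+)* blows up: when the overall match fails, a backtracking engine can end up trying every way of cutting the run of word characters into consecutive \w+ chunks. The sketch below only counts those cuts; it does not call the regex engine, and the helper name count_splits is made up for illustration. A real engine may prune some of these paths, but the doubling per extra character is consistent with the timings in the report above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_splits(n: int) -> int:
    """Count the ways to cut a run of n characters into consecutive non-empty chunks."""
    if n == 0:
        return 1  # nothing left to cut: exactly one (empty) way
    # The first chunk takes k characters (1..n); the remainder is cut recursively.
    return sum(count_splits(n - k) for k in range(1, n + 1))


# Each extra character doubles the number of candidate splits (2 ** (n - 1)).
for n in (1, 5, 10, 20, 28):
    print(n, count_splits(n))
```

At n = 28 that is already 2 ** 27 candidate splits, which is roughly where the reproduction script starts taking over a minute per call.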

ScottNotFound • Nov 28 '22 20:11

@ScottNotFound I don't think it is possible to change ^(\w+)* to ^(\w)*.

This regex is for example looking for lines such as

C1
C2  1   CC
H3  1   CH1  2  asp2
H4  1   CH1  2  asp2  3  180.
H5  2   CH2  1  asp3  3  D1
H6  2   CH2  1  asp3  5  D2

The line starts with an element symbol, which can optionally be followed by a number. If only chemical element symbols were valid here, the regex could have been [a-zA-Z]{1,2}. But, as in the above example, you can write C1 or C.

Using the following regex on line 93 for the class attribute _zmat_patt should be OK. The tests in test_gaussian.py still pass.

_zmat_patt = re.compile(r"^(\w+)([\s,]+(\w+)[\s,]+(\w+)){0,3}[\-\.\s,\w]*$")

It looks like it fixes the vulnerability:

1: Done in 0.0005118846893310547
21: Done in 6.198883056640625e-05
41: Done in 5.817413330078125e-05
61: Done in 7.319450378417969e-05
81: Done in 9.799003601074219e-05
101: Done in 0.00012993812561035156
121: Done in 0.0001690387725830078
141: Done in 0.00022125244140625
161: Done in 0.00026917457580566406
181: Done in 0.00033211708068847656
201: Done in 0.0004401206970214844
221: Done in 0.0005009174346923828
241: Done in 0.0005931854248046875
261: Done in 0.0006520748138427734
281: Done in 0.0007491111755371094
301: Done in 0.0008258819580078125
321: Done in 0.00090789794921875
341: Done in 0.001276254653930664
361: Done in 0.0011610984802246094
381: Done in 0.0013689994812011719
401: Done in 0.0014448165893554688
421: Done in 0.001538991928100586
441: Done in 0.0017888545989990234
461: Done in 0.0019121170043945312
481: Done in 0.0019729137420654297
501: Done in 0.002460002899169922
521: Done in 0.0028901100158691406
541: Done in 0.002777099609375
561: Done in 0.0030379295349121094
581: Done in 0.0029418468475341797
601: Done in 0.0033109188079833984
621: Done in 0.003364086151123047
641: Done in 0.0034112930297851562
661: Done in 0.003716707229614258
681: Done in 0.0038809776306152344
701: Done in 0.004051923751831055
721: Done in 0.004559993743896484
741: Done in 0.004487037658691406
761: Done in 0.0049211978912353516
781: Done in 0.00495600700378418
801: Done in 0.00517582893371582
821: Done in 0.005485057830810547
841: Done in 0.005793094635009766
861: Done in 0.005961894989013672
881: Done in 0.007217884063720703
901: Done in 0.006846904754638672
921: Done in 0.007014751434326172
941: Done in 0.00799417495727539
961: Done in 0.007646322250366211
981: Done in 0.008347272872924805
1001: Done in 0.008255958557128906
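
A quick standalone check of the proposed pattern, using only the re module: this is a minimal sketch, not part of pymatgen's test suite. ZMAT_PATT is a local copy of the regex proposed above, and the names ZMAT_LINES and matches are hypothetical helpers. It checks that the example Z-matrix lines from this comment are still accepted and that the input from the report is rejected.

```python
import re

# Local copy of the pattern proposed in this thread (not imported from pymatgen).
ZMAT_PATT = re.compile(r"^(\w+)([\s,]+(\w+)[\s,]+(\w+)){0,3}[\-\.\s,\w]*$")

# Example Z-matrix lines quoted earlier in this comment.
ZMAT_LINES = [
    "C1",
    "C2  1   CC",
    "H3  1   CH1  2  asp2",
    "H4  1   CH1  2  asp2  3  180.",
    "H5  2   CH2  1  asp3  3  D1",
    "H6  2   CH2  1  asp3  5  D2",
]


def matches(line: str) -> bool:
    """Return True if the line is accepted by the proposed pattern."""
    return ZMAT_PATT.match(line) is not None


for line in ZMAT_LINES:
    print(f"{line!r}: {matches(line)}")

# Same shape as the input from the report; the trailing "!" is not allowed,
# so the pattern should reject it.
print(matches("C" + "0" * 1000 + "!"))
```

The timings listed above still grow with input length, though far more slowly than with the original pattern.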

gVallverdu • Mar 23 '23 14:03

@gVallverdu FYI, Debian has included your proposed fix since June 2023: https://salsa.debian.org/debichem-team/pymatgen/-/commit/dcba4226dfc59789070bd1f7aa40b953e7722651

stevebeattie • Jul 17 '24 17:07