OpenFermion icon indicating copy to clipboard operation
OpenFermion copied to clipboard

Fix #1119: avoid unsafe approach to deserializing data

Open mhucka opened this issue 3 months ago • 0 comments

CodeQL security scan report #567 flagged a data loading operation in src/openfermion/utils/operator_utils.py as being usafe due because it uses a user-provided value. The warning is about lines 282-283, involving the code

with open(file_path, 'rb') as f:
    data = marshal.load(f)

Deserializing untrusted data using any deserialization framework that allows the construction of arbitrary serializable objects is easily exploitable and in many cases allows an attacker to execute arbitrary code. Even before a deserialized object is returned to the caller of a deserialization method a lot of code may have been executed, including static initializers, constructors, and finalizers. Automatic deserialization of fields means that an attacker may craft a nested combination of objects on which the executed initialization code may have unforeseen effects, such as the execution of arbitrary code.

The problem here is that loading serialized Python objects using marshal is inherently unsafe, because the format is not designed to be secure against malicious or corrupted data. Unfortunately, we probably can't simply switch to a safer serialization method (e.g., JSON or protobufs) because users may already have saved files in this format. We should maintain backward compatibility with people's existing files.

This PR adds more checks on the data read by load_operator(), including checks on maximum file size.

Note: the maximum file size values are hardwired as constants near the top of this file. I don't have much evidence for what would be reasonable max values, so made some guesses based on other files found in the repository. Someone with longer experiencing using OpenFermion should double-check the values.

mhucka avatar Sep 24 '25 23:09 mhucka