OmegaConf is vulenrable to ddos via specially crafted yaml input (billion laughs attack)
OmegaConf is vulnerable to Billion laughs attack. Specially crafted yaml input files can use yaml anchors to create a yaml input that is fast to load but very slow to traverse because it contains billions of nodes.
This is considered medium priority and will be addressed in a future major version of OmegaConf. Most potential fixes are breaking.
Details
pyyaml is creating references to the same object when it loads an anchor, which is why it can do load such files quickly.
Example input:
lol1: &lol1 "lol"
lol2: &lol2 [*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1]
lol3: &lol3 [*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2]
lol4: &lol4 [*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3]
lol5: &lol5 [*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4]
lol6: &lol6 [*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5]
lol7: &lol7 [*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6]
lol8: &lol8 [*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7]
lol9: &lol9 [*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8]
lol10: &lol10 [*lol9,*lol9,*lol9,*lol9,*lol9,*lol9,*lol9,*lol9,*lol9]
Such input creates billions of yaml nodes. OmegaConf will take an unreasonably long time to instantiate an object from it.
User-land solutions
1. Use interpolations instead of yaml anchors
Avoid yaml anchors and use interpolations instead.
The following config uses interpolations to achieve a similar behavior:
lol1: "lol"
lol2: ["${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}"]
lol3: ["${lol2}","${lol2}","${lol2}","${lol2}","${lol2}","${lol2}","${lol2}","${lol2}","${lol2}"]
lol4: ["${lol3}","${lol3}","${lol3}","${lol3}","${lol3}","${lol3}","${lol3}","${lol3}","${lol3}"]
lol5: ["${lol4}","${lol4}","${lol4}","${lol4}","${lol4}","${lol4}","${lol4}","${lol4}","${lol4}"]
lol6: ["${lol5}","${lol5}","${lol5}","${lol5}","${lol5}","${lol5}","${lol5}","${lol5}","${lol5}"]
lol7: ["${lol6}","${lol6}","${lol6}","${lol6}","${lol6}","${lol6}","${lol6}","${lol6}","${lol6}"]
lol8: ["${lol7}","${lol7}","${lol7}","${lol7}","${lol7}","${lol7}","${lol7}","${lol7}","${lol7}"]
lol9: ["${lol8}","${lol8}","${lol8}","${lol8}","${lol8}","${lol8}","${lol8}","${lol8}","${lol8}"]
lol10: ["${lol9}","${lol9}","${lol9}","${lol9}","${lol9}","${lol9}","${lol9}","${lol9}","${lol9}"]
Note that the config is still too big for actual traversal but since it's evaluated lazily it's possible to use it if it's done carefully.
2. Validate object created by pyyaml
If the config cannot be trusted, a check can be made before instantiating OmegaConf from a dict (instead of from a file) with a function similar to this:
def check_no_refs(obj, memo=None):
if memo is None:
memo = set()
if isinstance(obj, dict):
c = 0
for k, v in obj.items():
c = c + check_no_refs(v, memo)
return c
if isinstance(obj, list):
c = 0
for v in obj:
c = c + check_no_refs(v, memo)
return c
else:
idv = id(obj)
if idv in memo:
raise RuntimeError("duplicate item added")
memo.add(idv)
return 1
with io.open(os.path.abspath("1.yaml"), "r", encoding="utf-8") as f:
obj = yaml.load(f)
check_no_refs(obj)
# Create cfg after checking dict returned by pyyaml
cfg = OmegaConf.create(obj)
Possible future OmegaConf solutions
The following are possible strategies to handle it inside OmegaConf.
- Deprecate yaml anchors in favor of interpolations (detect usage of anchors and issue a warning telling the user to use interpolations instead
- Automatically convert detected anchors to interpolation, this is potentially tricky
- Limit config loaded to yaml files/strings to a maximum number of nodes and issue an error (10k or even 50k nodes seems like plenty).
- Provide an API allowing safe usage by rejecting all anchors. default behavior is TBD.
Possible future OmegaConf solutions
The following are possible strategies to handle it inside OmegaConf.
- Deprecate yaml anchors in favor of interpolations (detect usage of anchors and issue a warning telling the user to use interpolations instead
- Automatically convert detected anchors to interpolation, this is potentially tricky
- Limit config loaded to yaml files/strings to a maximum number of nodes and issue an error (10k or even 50k nodes seems like plenty).
- Provide an API allowing safe usage by rejecting all anchors. default behavior is TBD.
I like option 3; it feels safer and easier to implement than 1 and 2.
R.e. option 1:
- As far as I can tell, pyyaml doesn't have a setting to disable anchors (see e.g. https://github.com/yaml/pyyaml/issues/37), so we'd have to roll our own anchor detection logic.
- I can recreate the attack with pure python (i.e. this is not just a yaml problem):
# lol_attack.py
lol1 = "lol"
lol2 = [lol1, lol1, lol1, lol1, lol1, lol1, lol1, lol1, lol1]
lol3 = [lol2, lol2, lol2, lol2, lol2, lol2, lol2, lol2, lol2]
lol4 = [lol3, lol3, lol3, lol3, lol3, lol3, lol3, lol3, lol3]
lol5 = [lol4, lol4, lol4, lol4, lol4, lol4, lol4, lol4, lol4]
lol6 = [lol5, lol5, lol5, lol5, lol5, lol5, lol5, lol5, lol5]
lol7 = [lol6, lol6, lol6, lol6, lol6, lol6, lol6, lol6, lol6]
lol8 = [lol7, lol7, lol7, lol7, lol7, lol7, lol7, lol7, lol7]
lol9 = [lol8, lol8, lol8, lol8, lol8, lol8, lol8, lol8, lol8]
lol10 = [lol9, lol9, lol9, lol9, lol9, lol9, lol9, lol9, lol9]
# The below lines will fill up computer memory:
# from omegaconf import OmegaConf
# cfg = OmegaConf.create(lol10)
R.e. option 2: Still vulnerable if the user calls OmegaConf.resolve(cfg). Also, as @omry said above, possibly tricky to implement.
The problem with having a hard limit on number of nodes is that it could break for legitimate use cases suddenly based on user input.
Imagine a system in production that suddenly blows up because someone fed a slightly larger config.
Config composition can also create very large config via small ones, making this even more likely.
I am not worried about users calling resolve(). It usage is much more rare than loading yaml files.
4 is also interesting. Maybe have a static function for setting anchors policy and have it reject them by default. (detected by seeing the same object in the dictionary). Low level can raise a DuplicateNodeError and the function loading yaml can link to to github issue (or a comment on it) explaining the issue and what the options are.
something like: Duplicate node detected, see github.com/.../ for more information
Where the page would indicate that anchors are inferior because they don't work cross files and are vulnerable to this attack and suggest converting to interpolations, or, as an alternative - disable the check vis something like OmegaConf.enable_unsafe_anchors(true).
4 is also interesting. Maybe have a static function for setting anchors policy and have it reject them by default.
So the idea is to introduce some global state that keeps track of the current policy, right? Or perhaps the user could set a flag throw_on_duplicate_nodes=True in their call to OmegaConf.{create,load,structured}?
Yes. If we are introducing the ability to specify policy as a file header, we also need the ability to completely disable the functionality for people who cannot trust their input configs.
class AnchorsPolicy(Enum):
# Reject anchors, allow for header to change behavior. Default behavior
HEADER = 1
# Always allow anchors
UNSAFE = 2
# Never allow anchors. Reject anchors even in file header attempts to enable it.
RESTRICTED = 3
OmegaConf.setAnchorsPolicy(policy: AnchorsPolicy)