Protect against stack overflow in parser
At the moment, most (if not all) Fluent parsers are implemented with recursive descent, which makes them vulnerable to stack overflow. If a Fluent file comes from an untrusted source, a malicious user can bring the parser down with a resource as small as `a = {{{{{…500 times…{{{{{` (0.5 KB!). The impact varies across environments: in some, the runtime raises an exception that can be recovered from; in others, the whole process is terminated.
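To make the scale concrete, here is a hedged sketch (Python; the message id and the count mirror the example above) of how cheap such an input is to construct, and what typically happens next:

```python
# ~0.5 KB of opening braces: each "{" begins another inline placeable,
# so a naive recursive-descent parser makes one more nested call per
# character of input.
malicious = "a = " + "{" * 500

# What happens when this is parsed is environment-specific. CPython,
# for example, guards its own stack: with the default recursion limit
# of 1000, a pure-Python parser raises RecursionError, which a caller
# can catch. A parser written in C, C++, or Rust may instead overflow
# the native stack and terminate the whole process.
```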
I think parsing such extremely deeply nested constructs as Junk (even if they are otherwise well-formed) fits Fluent’s recovery strategy well.
Non-recursive parsers are unaffected by this issue, but it seems like a good idea for them to impose similar limits: although they are able to construct an AST in such pathological cases, that AST cannot then be processed recursively by other parts of the program.
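As a rough illustration of the recovery idea, here is a minimal, self-contained sketch in Python (all names and the limit are invented for illustration, not taken from any real Fluent implementation) of a depth guard in a recursive-descent placeable parser; exceeding the limit raises an error that the usual recovery machinery could turn into Junk:

```python
MAX_PLACEABLE_DEPTH = 100  # invented limit, purely illustrative


class ParseError(Exception):
    pass


class Parser:
    """Toy parser for the fragment `{ ... { $var } ... }`: just enough
    grammar to show where the depth guard sits; not a real Fluent parser."""

    def __init__(self, source):
        self.source = source
        self.pos = 0
        self.depth = 0  # current placeable nesting depth

    def parse_inline_placeable(self):
        self.depth += 1
        try:
            if self.depth > MAX_PLACEABLE_DEPTH:
                # Too deep: bail out before recursing further. A real
                # parser's recovery logic would skip to a safe point
                # and record the whole construct as Junk.
                raise ParseError("placeable nesting too deep")
            self.expect("{")
            expr = self.parse_inline_expression()
            self.expect("}")
            return ("Placeable", expr)
        finally:
            self.depth -= 1

    def parse_inline_expression(self):
        if self.peek() == "{":
            return self.parse_inline_placeable()
        if self.peek() == "$":
            self.pos += 1
            start = self.pos
            while self.pos < len(self.source) and self.source[self.pos].isalnum():
                self.pos += 1
            return ("VariableReference", self.source[start:self.pos])
        raise ParseError("expected an inline expression")

    def peek(self):
        return self.source[self.pos] if self.pos < len(self.source) else ""

    def expect(self, ch):
        if self.peek() != ch:
            raise ParseError("expected " + repr(ch))
        self.pos += 1


print(Parser("{{$x}}").parse_inline_placeable())  # nested but shallow: fine
try:
    Parser("{" * 500).parse_inline_placeable()    # the attack input: guarded
except ParseError as err:
    print("recovered:", err)
```

A non-recursive parser could maintain the same counter while pushing frames onto its explicit stack, which addresses the point above about downstream consumers.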
I spotted the following cycles in the grammar that need to be taken care of:
- `inline_placeable → InlineExpression → inline_placeable`
- `inline_placeable → SelectExpression → InlineExpression → inline_placeable`
- `InlineExpression → (FunctionReference | TermReference) → CallArguments → argument_list → Argument → InlineExpression`
- `Pattern → PatternElement → inline_placeable → SelectExpression → variant_list → (Variant | DefaultVariant) → Pattern`
There is also `block_placeable`, but it’s defined in terms of `inline_placeable`, so it likely won’t need special handling. Worth mentioning anyway.
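For concreteness, here are hedged sketches (Python string-building; the shapes follow the cycles above, though I haven’t validated them against a real parser) of inputs that drive three of these cycles arbitrarily deep:

```python
# Cycle 1: placeables nested directly inside placeables.
nested_placeables = "a = " + "{ " * 500 + "$x" + " }" * 500

# Cycle 3: a call argument that is itself a placeable wrapping another
# call. NUMBER is a Fluent builtin; the nesting, not the function, is
# the point here.
nested_arguments = "b = " + "{ NUMBER(" * 200 + "$x" + ") }" * 200

# Cycle 4: a select expression whose default variant's pattern contains
# another select expression, repeated.
pattern = "{ $n }"
for _ in range(200):
    pattern = "{ $n ->\n   *[other] " + pattern + "\n }"
nested_selects = "c = " + pattern
```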
I see the point, but I'm not sure that the answer lies in the syntax specification.
There might be something like implementation notes, where we suggest that parser implementers protect themselves. The practical constraints seem to be very implementation-specific, so I'm not sure we should have a hard-and-fast rule.
I see three ways here.
1. Explicitly specify that there are no limits. Implementations must accept any resource that can exist. The downside is that this will significantly complicate both implementations and the user code working with them (the AST cannot be processed recursively; every algorithm must emulate the call stack with, e.g., a dynamic array; see the traversal sketch after this list). Also, not all limits can easily be avoided (think of manipulating files exceeding 4 GB on a 32-bit machine). I don’t know whether this has any practical usefulness.
2. Specify minimal limits. Implementations must accept any resource that fits within the limits and may accept resources exceeding them. This clearly defines a subset of valid documents that are guaranteed to be understood by every implementation. One practical implication I can think of is machine-generated Fluent: when the limits are well-defined, generators can produce code that is guaranteed not to hit them. Of course, choosing the exact numbers is difficult.
3. Allow implementations to impose whatever restrictions they deem necessary. This is the route C++ has taken (Annex B):
   > Because computers are finite, C++ implementations are inevitably limited in the size of the programs they can successfully process. Every implementation shall document those limitations where known. This documentation may cite fixed limits where they exist, say how to compute variable limits as a function of available resources, or say that fixed limits do not exist or are unknown.
   >
   > The limits may constrain quantities that include those described below or others. The bracketed number following each quantity is recommended as the minimum for that quantity. However, these quantities are only guidelines and do not determine compliance.
   However, the validity of a document then becomes implementation-dependent (and even configuration-dependent).
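As referenced under option 1, here is a hedged sketch of what “emulating the call stack” looks like in user code: an iterative depth-first walk over a toy AST (the tuple node shape is an assumption for illustration), where a plain Python list stands in for the call stack:

```python
def iter_nodes(root):
    """Depth-first traversal without recursion: the explicit list plays
    the role of the call stack, so a pattern nested thousands of levels
    deep cannot overflow the native stack."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Assumed node shape: (node_type, *children); a real AST would
        # need its own child-enumeration logic here.
        stack.extend(child for child in node[1:] if isinstance(child, tuple))


deep = ("VariableReference", "x")
for _ in range(100_000):                 # far beyond any recursion limit
    deep = ("Placeable", deep)
print(sum(1 for _ in iter_nodes(deep)))  # 100001 nodes, no overflow
```

On the generator side (option 2), a documented limit would similarly become a precondition checked while emitting nested constructs.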
I guess that for a language as simple as Fluent, option 3 might be fine. So I agree: having an “Implementation Notes” document with recommendations and corner cases will likely be enough.