ton
ton copied to clipboard
FunC: pure -> impure calls should be forbidden or screaming
Calling impure functions from pure should at least emit a warning or be forbidden overall. Because pure function can be optimized out along with their impure function calls it can cause not-so-defined behaviour due to analyzer optimizing out pure function calls.
Logically, pure function should never call impure functions because they are not supposed to affect data in any way. Getters are pure and must not affect data, and message handlers are impure, and can affect data. That way, forbidding impure calls from pure functions should not introduce any problems but rather reinforce security on FunC language lowering the burden for developer to track all those call chains.
Even more, without this feature it is easy to mess up and forget the impure specifier on some function, that modifies data. With this check in place the compiler will kindly (or not so kindly) remind developer that the function should be impure because it is calling impure functions (affecting data).
Maybe the only possible exception to this rule should be calling throw* functions from getters or from functions called from getters (calling them from pure functions called ultimately by message handlers should still be disallowed).
Together with detection of unused variables (optimized out variables or functions) it may noticeably reinforce security of FunC language overall.
You are completely right from the theoretical perspective, and we considered enforcing this restriction at some point. However, we decided not to; maybe issuing a warning is a better idea, but it is not clear whether such warnings should be enabled by default.
To understand why, consider the following questions:
- Is
now()pure or impure? Do we want calls tonow()to be removed from code if their result is unused? - Can a pure function call
now()? What about a get-method? - Same questions about
get_seed(),set_seed()andrand(). How are they different fromnow()? - Same questions about
get_data()andset_data(), and functions containing calls to them. Should a function calling onlyget_data()be considered pure? Can we optimize out calls to such functions if their result is unused? - Are
throwandthrow_ifpure or impure? - Can a pure arithmetic function throw exceptions, such as the division by zero exception, for some values of its arguments?
- If a pure arithmetic function is allowed to throw exceptions for some values of its parameters, and we optimize out a call to this function because its result is unused, then the outer function will not throw an exception for some values of its parameters even if it is expected. Is this a correct behavior?
- Consider a function that checks a (pure) condition depending on its arguments and throws an exception if the condition fails. Should be this function pure or impure?
- Consider values of function types, such as
int -> int. Do they represent pure functions or impure functions? Should we introduce pure/impure modifiers for the function types? How would they work with automatic type deduction?
@ton-blockchain thank you for your interesting insight and ideas. Thinked about them for a while.
As I understood, actually, the problem lies in definition of the pure function. From the algorithmical and theoretical (arithmetical) point of view pure function's output should depend only on it's inputs, it is kind of isolated, not interacting with environment in any way. From this definition lies most of the outlined problems. From the practical point of view, and possibly widely considered among developers accounting for absense of documentation and more programmistic than algorithmic or mathematic approach, function in FunC shall be considered impure if it affects the environment, this way acquiring data from environment is allowed. Because if we consider the first theoretical definition getters are impossible to exist because they would have to query data from contract anyway - which is considered impure by that definition.
Now lets reconsider those questions with practical definition of impurity in mind:
now()is pure function because it does not affect environment, just queries data from C7- Irrelevant because of 1
get_seed()can be considered pure, but withset_seed()andrand()we have a problem. call of such function affects global nonpersistent variables (calling these would modify current random seed), therefore they are not pure 3+. This goes also for functions that modify global variables for the same reason 3?. This may actually cause new classification of functions - to be 100% safe and secure there may be pure/impure functions (that affect or do not affect persistent storage) and isolated functions that are true arithemtically pure functions with severe restrictions (data has to be passed in parameters including now, randoms, seeds, if needed)get_data()is therefore pure, andset_data()is the purest (pun intended) example of impure function that directly affects the persistent data, the pinnacle of impurity. it would be strange for developer to not use the result ofget_data()function, this definitely should be a warning.- This is the most difficult question. from practical point of view these functions are impure because they directly alter execution flow, therefore affecting environment. However, the function may crash by itself (for example when unpacking slice), therefore they may be special case of functions that may be called from pure functions but must not be optimized. However, this is a huge point for dicussion.
- To be perfectly pure function must not be able to throw errors, but rather returning some value, for example NaN if integer is expected (as for your example div/0), or null for objects, or whatever. It must be responsibility of the calling function to verify the result. Maybe it may be useful and interesting to implement try { } catch { } :).
- Outlined in 6. This is not completely correct behaviour.
- Considering 6, this function should ideally be considered impure.
- I have not yet scrached surface of function arguments / variables, but as I understand, each expr / symbol has impure flag that can be used. It may be useful to introduce such modifiers. About the automatic type deduction I have another interesting idea, which i will outline a bit later (may be useful for discrimination of such pure/impure callings with "pure" getters).
Now, reconsidering the actual purity concerning the situation if call is optimized out:
- Pure, it would not affect anythin if omitted
- Yes, because it is pure
get_seedcan be considered pure,set_seedandrandcannot, because they affect the global seed valueget_datacan be considered pure,set_datais absolutely impure- Impure because if call is optimized, then exception may not be thrown that changes execution flow
- No, because if optimized out flow logic changes
- Not a correct behaviour, function should rather return a value representing the error
- Impure, because optimizing out such call may change program flow
- Outlined below.
About resoluting the calls and determining information from calls: if it is not possible to explicitly define purity of the function, it may be considered pure if it does not call any impure functions.
Nevertheless I am completely certain that warning about optimized out function and operator calls (except for compiler-time applied) is really important, because it would not only warn developer about possible unexpected behaviour but also catch nasty bugs like copy-paste errors (wrong function) but also variable redefinition (inner scoped variable will be lost on scope leave and operations with it would be optimized, but it should be warning on its own) and missing impure flag when needed.
Maybe such proposition about filtering calls directly is relatively simple but too perfectionistic. The most useful way is actually to react to optimized out calls to operators and functions (that way detecting potential mistakes in assignments, redefinitions and impurity conditions of functions).
At the moment i really struggle with filtering out optimizations that accumulate result of const variable calculations and disable operators (something like 1 + 2 + 3 + 4 becomes 1 + 3 + 3 + 4 then 1 + 3 + 6 + 4 and then finally 1 + 3 + 6 + 10 with bold denoting active expression.
For example i had a small copy paste issue in my contest contract #2 that could be easily caught if compiler screamed out that the box~[flags=](bflags & Flag:!FRPending()); command assigned value to a box variable that was not used anymore in control flow until return ();.
Yes, warning about optimized function is very important. Original bug report "Silent compilation of vulnerable code" with detail description read here - https://github.com/plexar88/ico/blob/master/bugs.txt
Well, some time ago I found a way to separate builtin optimizations from unused variables and suggested an improvement in #229 that somewhat fixes this (while being a little too strict, but not restricting).
@plexar88 when using that new functionality, compiling would yield following warnings: At first level (-u, assignments only):
test.fc:24:8 Warning: unused variable assignment
data = begin_cell().store_uint(0xc0ffee,32).end_cell();
Here compiler warns you that result of that assignment is not used, meaning that something is wrong.
At second level (-uu, assignments and calls):
test.fc:24:22 Warning: unused call to begin_cell
data = begin_cell().store_uint(0xc0ffee,32).end_cell();
^
test.fc:24:22 Warning: unused call to store_uint
data = begin_cell().store_uint(0xc0ffee,32).end_cell();
^
test.fc:24:46 Warning: unused call to end_cell
data = begin_cell().store_uint(0xc0ffee,32).end_cell();
^
test.fc:24:8 Warning: unused variable assignment
data = begin_cell().store_uint(0xc0ffee,32).end_cell();
^
test.fc:25:12 Warning: unused call to fn3
fn3(data);
^
Here each unused call is warned about including the dreaded call to fn3.
I got a little messed up by this optimization too (set gas limit was optimized out because I forgot to declare asm impure, and had some copy-paste issues), after what i implemented such verification.
I also thought about that it may be reasonable to implement pure functions (while we already have impure functions also add pure modifier, considering other functions neutral) that cannot access any enviromental data at all, however that would require marking builtins as pure / impure and do not forget about the exceptions problem. Anyway does not seem to be top priority task.