ton icon indicating copy to clipboard operation
ton copied to clipboard

FunC: implicit functions (parameterless)

Open Skydev0h opened this issue 5 years ago • 4 comments

This PR implements an implicit keyword that allows to define functions that should not be followed by arguments tuple. This way it is possible to more beautifully implement something more closely resembling structures without awful () everywhere.

This much like native constants (autoapply functions) with the exception that it is possible to chain function call with variable (with calling . or modifying ~) and it would not expect any arguments tuple afterwards.

A tiny example:

int implasm() implicit asm "32 PUSHINT";

int explflag(int a) asm "1 PUSHINT AND";
int implflag(int a) implicit asm "2 PUSHINT OR";

int implfun(int this) implicit {
	return 32 * this;
}

(int, ()) ~implmod(int this) implicit {
	return (this + 33, ());
}

...

() main() {
	int test = 1;
	test~implmod;
	var data = begin_cell().store_uint(0xc0de, test.implfun).end_cell();
	if (fn1(data) | fn2(data)){
		data = begin_cell().store_uint(0xc0ffee, implasm)
			.store_uint(test.explflag(), test.implflag).end_cell();	
		fn3(data);
	}
	return ();
}

Skydev0h avatar Jan 30 '20 21:01 Skydev0h

A nice and simple commit. These implicit functions are somewhat similar to properties in Delphi and some other object-oriented languages. An interesting question is whether it would be also useful to convert x.property = RHS into x~property(RHS) and further into x = ~property(x, RHS).

ton-blockchain avatar Feb 04 '20 15:02 ton-blockchain

Interesting point. As of using those as properties some kind of scope restriction should be in place (for example, when making code for second contest my flags property of root scope clashed with flags property from inner Box object scope (not to be confused with native box), therefore I had to move flags of Box to beginning too so that they would have same index in tuple.

This shenanigan can be seen here, the def is commented because it was synchronized with root scope.

As of property-like syntax, I nearly attempted same thing when refactoring my contest entry to FunC-Extended syntax (the one containing all my proposed changes), therefore something like x.property= rhs is possible for simple rhs (somehow a singular simple ident is appended to parameters tuple even without brackets), but still looks like x.property= (rhs) for more complex ones like this. Note that = is part of function name, resolving var.func = rhs may be hard for expression parser (that is not AST).

I also square-bracketed those functions (therefore gets look like x.[property] and sets x.[property] = rhs or x.[property] = (complex rhs) because otherwise they would clash with local variable names (obviously). This again emphasises that scopes should be limited. I outline it in next idea.


Actually, honestly speaking this feature is beginning of some more interesting and complex idea, that is so highly demanded by developers. IDEA DESCRIPTION AND REASONING FOLLOWS.

I was (and am) thinking about implementing structures in FunC with help of tuples (did not come out about more optimal idea). The main features of such structs would be technically scope-limited functions (idents that are only resolved if invoked on tuple with attached structure type data, also those idents would have highest priority in that case) that can be used both as RHS (for getting) and LHS (for assigning).

Also I was thinking about some serialization specification (such as defining length and signedness of numbers in struct, setting slices as const-length or variable-length (with defining number of bits for length) and as inline or referenced, allowing including inline or as references another structs with defined serializator. That would allow to simply grab a structure from slice and push it back into a builder without complex code. There are some problems and questions with such implementation such as should serializer / deserializer be emitted as a separate function that should be called / inline_called / inline_ref_called, complexity of generated code (it is not a simple 1 2 3), complexity of serializing structs inside structs. Therefore I am more on the edge that serializing and deserializing is better done manually because it would provide more control of what is going on.

Nevertheless, the structures and being able to access their properties both as LHS and RHS in code may prove to be really useful. I used that tuple decomposition (decomposing storage from access) in my second contest code and it proved to be really useful: unless a field has to be inserted inbetween other fields changes to structure elements (such as type or size) required minimum change of other code. And passing around a single tuple rather than a huge amount of variables and returns proven to be useful as well. There may be some small penalty of using GETINDEX / SETINDEX compared to direct variable access, however at most times it is negligible and may be even more optimal than juggling tens of variables through the stack.

Actually I thinked about using tuples for variable storage when I seen compiler emitting a swap train to move some variables from tail to head, which was really disgusting. Maybe a single GETINDEX is more efficient than tens of swaps.

You can see some usage of this idea (with help of some fc-ext for some sugar, however initial version was created with vanilla FunC syntax) here and there. It also contains some highly concise and optimal pattern for working with bitmasks (called Flags in the code) with help of consts and implicit functions (in vanilla it uses simple asm functions which are more ugly and less optimal on some occasions, but keep the pattern).

Skydev0h avatar Feb 04 '20 18:02 Skydev0h

Well, the idea to implement better tuples and records (tuples with named fields) was indeed one of the things we considered. Some comments:

  1. The first stage is to implement better tuple types, such as [int,int,slice] (the type of all triples t, such that the first two components t_1 and t_2 are int, and t_3 is slice). Technically, such a type would be a subtype of the already-existing basic type tuple, but it would be better for the FunC type system to ignore this fact (otherwise automatic type inference would become impossible). Such tuples are not too complicated to implement, and could be modeled after the already-supported "tensor" or "pseudo-tuple" types such as (int,int,slice) (that are represented by three stack entries of corresponding types). Construction syntax such as var t = [2,3,s] and assignment-decomposition syntax such as var [x,y,s] = t could be supported for such tuples.
  2. The next step is to implement user-defined aliases to types (similar to typedef), for instance, type user_wallet = [int, int, slice]; would define user_wallet as a transparent alias to [int, int, slice].
  3. A more complicated thing to do is to make some of these aliases opaque outside of a block of definitions, so that functions in a library would still know that user_wallet is the same as [int, int, slice], but outside functions would treat user_wallet as a opaque type, especially during type inference. This would provide FunC programs with some degree of modularity and encapsulation.
  4. Another development would be to introduce named records, i.e. tuples in which some or all fields have some names. For instance, type user_wallet = [int user_id, int balance, slice]. Such records could be equivalent to the usual tuples, with the difference that (inline) projector functions .user_id : user_wallet -> int and .balance : user_wallet -> int, actually equivalent to at(0) and at(1), would be automatically defined. It would be also nice to make these methods "implicit" in the same sense as suggested in this pull request. Then one could write user_wallet w = ... ; return w.user_id + w.balance;, for example.
  5. However, such a development would be not so useful because it would effectively forbid different record types to have fields with the same names. For instance, if another record type user_info=[slice data, cell user_id] has a field user_id of type cell, then .user_id would have to be both a function user_wallet -> int (equivalent to at(0)) and a function user_info -> cell (equivalent to at(1)). In order to implement this properly, overloaded functions would have to be introduced, that have several implementations of different types (for example, the resulting type of .user_id would be user_wallet -> int | user_info -> cell, where | denotes the additive sum of two types), and the type checking algorithms would have to be replaced with more sophisticated versions with some amount of Prolog-style backtracking.

ton-blockchain avatar Feb 05 '20 14:02 ton-blockchain