Function Syntax RFC

Open x87 opened this issue 1 year ago • 84 comments

Goal

  • Concise syntax for SCM functions
  • Make SCM functions easier to use by hiding some low-level details

Considerations for function syntax

Pascal-style:

function a12(name: string): string
  • easier to find using a "function" keyword
  • param declaration is consistent with the "var" block syntax

~~C-style:~~

string a12(string name)

  • ~~consistent with inline var declaration~~

Pascal-style is more consistent with the rest of the language.

Rules

  • ~~Function must be declared before first usage~~ Functions are available anywhere within the current scope. Functions defined inside other functions are available anywhere in that function's body.

  • Function body starts with the "function" keyword followed by the function signature.

  • Function name ~~prefixed with @~~ is a valid label

  • The signature includes input parameters (if any) and their types, comma-separated.

  • Input parameters, when present, must be enclosed in "()". For zero-input functions, "()" is optional.

  • Input parameters are followed by a return type if the function returns anything.

  • Multiple return types are comma-separated, ~~and parenthesized.~~

  • Return type(s) can be prefixed with the optional keyword. A function with an optional result may return nothing.

  • Function body ends with the end keyword.

  • The return keyword immediately exits the function and returns control to the calling code. See Return Semantics.

    • Functions with optional return type(s) can use a blank return to bail out immediately while returning nothing.
    • There is a special logical return type. Its result can only be checked in IF..THEN and cannot be stored in a variable. A logical function returns true or false.
    • Functions that define return type(s) must return the same number of values. A logical function must return a condition flag. Optional functions may return nothing.
    • When you return some value(s), it is always a true condition for IF..THEN. An empty return is always a false condition. The value returned from a logical function defines the condition.
  • ~~Function can return one or more values using the following syntax: return <condition flag> <value1> <value2> ....~~

  • ~~return keyword in functions should not be confused with return keyword in gosubs. Function's return is always followed by some values, or true, or false using CLEO5's CLEO_RETURN_WITH~~

  • The end keyword serves as an implicit return ~~with the default values matching the function signature~~ ~~using RETURN (CLEO5 is required)~~ using CLEO_RETURN

  • ~~return false is a special case that can be used in any function. It sets the condition result to false and exits the function ignoring all output variables~~

  • ~~return true is a special case that can be used in function with no return type. It sets the condition result to true.~~
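The return rules above can be expressed as a small validity check. This is a hypothetical Python sketch, not part of the proposal. `Signature` and `check_return` are illustrative names. The sketch only covers explicit `return` statements (not the implicit `end`). It reports the condition result the caller would observe, or raises when a statement breaks the rules.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Signature:
    """Return part of a function signature (hypothetical model)."""
    types: Tuple[str, ...] = ()  # e.g. ("int", "float"); empty means no return type
    optional: bool = False       # return types prefixed with `optional`
    logical: bool = False        # `: logical`


def check_return(sig: Signature, values: Sequence[object]) -> bool:
    """Validate an explicit `return <values>` and give the caller's condition result.

    Raises ValueError if the statement breaks the rules.
    """
    n = len(values)
    if sig.logical:
        # a logical function must return exactly one condition flag
        if n != 1:
            raise ValueError("logical function must return exactly one value")
        return bool(values[0])
    if not sig.types:
        # no return type: only a blank return is allowed
        if n != 0:
            raise ValueError("function has no return type")
        return False  # an empty return is a false condition
    if n == 0:
        if sig.optional:
            return False  # optional functions may return nothing
        raise ValueError(f"must return {len(sig.types)} value(s)")
    if n != len(sig.types):
        raise ValueError(f"must return {len(sig.types)} value(s)")
    return True  # any returned value(s) give a true condition, even `return 0`
```

For example, `check_return(Signature(("int",)), [0])` is True, matching `return 0 // cleo_return_with true 0` in a14. By contrast, `check_return(Signature(("int", "int"), optional=True), [])` is False.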

Examples

Declaration

  • The function name serves as a label. It cannot duplicate an existing label.
  • Functions can be declared upfront to allow calls to be located before the function body (forward declarations, see below).
  • "end" represents ~~CLEO_RETURN_FAIL~~ ~~RETURN~~ CLEO_RETURN 0 ~~if there is no explicit cleo_return_* on the preceding line~~
{$CLEO}
function a0
end // implicit CLEO_RETURN 0

function a1(x: int)
end // implicit CLEO_RETURN 0

function a2(x: int, y: int)
end // implicit CLEO_RETURN 0

function a3(): int
end // implicit CLEO_RETURN 0

function a4(): int, int
end // implicit CLEO_RETURN 0

function a5(): int
    return 42 // explicit cleo_return_with true 42
end // implicit CLEO_RETURN 0

function a6(): int, int
    return 42 84 // explicit cleo_return_with true 42 84
end // implicit CLEO_RETURN 0

function a7(): int
    if 0@>0
    then
        return 42 // explicit cleo_return_with true 42
    end
end // implicit CLEO_RETURN 0

function a8()
    if 0@>0
    then
        return // explicit 0051: return
    end
end // implicit CLEO_RETURN 0

function a9(): logical
    return false // explicit cleo_return_with false
end // implicit CLEO_RETURN 0

function a10(): float
    if 0@>0
    then
        return 42.0 // explicit cleo_return_with true 42.0
    end
end // implicit CLEO_RETURN 0

function a11(): string
    if 0@>0
    then
        return 'test' // explicit cleo_return_with true 'test'
    end
end // implicit CLEO_RETURN 0

function a12(name: string): string
    return name // explicit cleo_return_with true 0@ 1@
end // implicit CLEO_RETURN 0

function a14(): int
  if 0@>0
  then return 1 // cleo_return_with true 1
  else return 0 // cleo_return_with true 0
  end
end  // implicit CLEO_RETURN 0


function a15(): int, float
  if 0@>0
  then return 1 2.0 // cleo_return_with true 1 2.0
  else return 0 0.0 // cleo_return_with true 0 0.0
  end
end // implicit CLEO_RETURN 0

Examples of logical/optional return types

function no_ret_ok
    return
end

function no_ret_error
//    return 1 // error, should not return a value
end

function ret_logical_ok: logical
    return 1
    return 0
    return 0@ == 0
end

function ret_logical_error: logical
//    return // error, should return 1 value
//    return 1 2 // error, should return 1 value
end

function ret_1_ok: int
    return 0
end

function ret_1_error: int
//    return // error, should return 1 value
//    return 1 2 // error, should return 1 value
end

function ret_2_ok: int, int
    return 0 0
end

function ret_2_error: int, int
//    return      // error, should return 2 integer values
//    return 1    // error, should return 2 integer values
end

function opt_1_ok: optional int
    return
    return 0
end

function opt_1_error: optional int
//    return 1 2 // error, should return 1 integer value
end

function opt_2_ok: optional int, int
    return
    return 1 2
end

function opt_2_error: optional int, int
//    return 1 // error, should return 2 integer values
//    return 1 2 3 // error, should return 2 integer values
end



if and
    ret_logical_ok()
    0@ = ret_1_ok()
    0@, 1@ = ret_2_ok()
    0@ = opt_1_ok()
    0@, 1@ = opt_2_ok()
then
    // ok
else
    // error
end

Calling functions

a0() // ambiguous: gosub or call? Needs lookahead
a1(5) // cleo_call a1 1 5
a2(5,6) // cleo_call a2 2 5 6

// single result
0@ = a3() // cleo_call a3 0 0@

// multiple results
0@, 1@ = a4() // cleo_call a4 0 0@ 1@

// use functions in initialization position
int x = a3() // cleo_call a3 0 0@
string name = a12("test") // cleo_call a12 1 "test" 0@ 1@

// logical functions
if a9()
then
...
end
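The call-site translations in the comments above follow one pattern: `cleo_call <name> <number of inputs> <inputs...> <outputs...>`. Below is a hypothetical Python sketch of that lowering; `lower_call` is an illustrative name. It makes three assumptions:

- the identifier is already known to be a function, so it does not resolve the `a0()` gosub/call ambiguity;
- arguments contain no commas or parentheses;
- typed declarations such as `int x = a3()` are out of scope.

```python
import re

# optional "out1, out2 =" prefix, then name(arg, ...)
_CALL = re.compile(
    r"^\s*(?:(?P<outs>[^=]+?)\s*=\s*)?"
    r"(?P<name>[A-Za-z_]\w*)\s*\((?P<args>[^()]*)\)\s*$"
)


def lower_call(stmt: str) -> str:
    """Translate `0@, 1@ = f(a, b)` into `cleo_call f 2 a b 0@ 1@`."""
    m = _CALL.match(stmt)
    if m is None:
        raise ValueError(f"not a function call: {stmt!r}")
    args = [a.strip() for a in m["args"].split(",") if a.strip()]
    outs = [o.strip() for o in m["outs"].split(",")] if m["outs"] else []
    return " ".join(["cleo_call", m["name"], str(len(args)), *args, *outs])
```

The input count is derived from the argument list, and output variables are appended after the inputs. This matches the comments for a1, a2, a3 and a4.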

Grammar

function := ["export" whitespace] "function" whitespace identifier [ "(" [ params ] ")" ] [ ":" return_types ]
params := identifier ":" type [ "," params ]

return_types := [ "optional" whitespace ] types
types := type [ "," types ]

var1 := [ "(" ] var [ ")" ]
var2 := "(" var "," vars ")"
vars := var [ "," vars ]

function_call := [ ( var1 | var2 ) "=" ] identifier "(" args ")"
args := ( var | const ) [ "," args ]
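A function header can also be checked with a small parser. This is a hypothetical Python sketch; `parse_header` and `FunctionHeader` are illustrative names. It follows the Rules list:

- "()" is optional for zero-input functions;
- return types are comma-separated without parentheses;
- return types may carry an `optional` prefix.

It does not validate type names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FunctionHeader:
    name: str
    params: List[Tuple[str, str]] = field(default_factory=list)  # (identifier, type)
    returns: List[str] = field(default_factory=list)
    optional: bool = False
    export: bool = False


_HEADER = re.compile(
    r"^\s*(?P<export>export\s+)?function\s+(?P<name>[A-Za-z_]\w*)\s*"
    r"(?:\((?P<params>[^)]*)\))?\s*"                        # "()" is optional
    r"(?::\s*(?P<opt>optional\s+)?(?P<rets>[^:]+?))?\s*$"   # ": [optional] t1, t2"
)


def parse_header(line: str) -> FunctionHeader:
    m = _HEADER.match(line)
    if m is None:
        raise ValueError(f"not a function header: {line!r}")
    params = []
    if m["params"] and m["params"].strip():
        for part in m["params"].split(","):
            ident, sep, typ = part.partition(":")
            if not sep or not ident.strip() or not typ.strip():
                raise ValueError(f"bad parameter: {part.strip()!r}")
            params.append((ident.strip(), typ.strip()))
    returns = [t.strip() for t in m["rets"].split(",")] if m["rets"] else []
    return FunctionHeader(m["name"], params, returns, bool(m["opt"]), bool(m["export"]))
```

For example, `parse_header("function opt_2_ok: optional int, int")` yields `optional=True` and `returns=["int", "int"]` with no parameters.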

x87, Sep 14 '23 14:09

TODO: make it compatible with #45

x87, Sep 14 '23 16:09

What about calling a function reference stored in a variable (like passing a callback function via a parameter, or functions stored in an array)? Currently, the function label in 0AB1 can be a variable.

MiranDMC, Sep 14 '23 19:09

What about functions that can take/return multiple types of variables? For example, numToText accepting an int or a float as the first param depending on the second boolean param. The same applies to the return type, where depending on the logic the return type might be int or float.

MiranDMC, Sep 14 '23 19:09

How will the condition result be handled? Your earlier idea with return and not return seems nice. It should also be possible to use a variable as the condition result, for situations where a function always has to perform some clean-up and then exit with a condition result.

MiranDMC, Sep 14 '23 19:09

> What about functions that can take/return multiple types of variables? For example, numToText accepting an int or a float as the first param depending on the second boolean param. The same applies to the return type, where depending on the logic the return type might be int or float.

Union types are out of scope for this RFC. There should be a separate function for each combination of types.

x87, Sep 14 '23 20:09

> How will the condition result be handled? Your earlier idea with return and not return seems nice. It should also be possible to use a variable as the condition result, for situations where a function always has to perform some clean-up and then exit with a condition result.

Any idea for the complete syntax?

x87, Sep 14 '23 20:09

> Any idea for the complete syntax?

Tough case. The condition result certainly cannot be on the right side of the return keyword, which leads to the idea of putting it on the left:

  • return 5@ - condition result true
  • not return 5@ - condition result false
  • 5@ return 5@ - condition result based on the value of 5@

MiranDMC, Sep 14 '23 20:09

> Tough case. The condition result certainly cannot be on the right side of the return keyword, which leads to the idea of putting it on the left:
> return 5@ - condition result true
> not return 5@ - condition result false
> 5@ return 5@ - condition result based on the value of 5@

This syntax is complicated and confusing. I think the last case should be handled outside of the function, or the boolean value should be returned as a regular value.

A function should have the condition result set to true at the start of the function. This would allow empty functions, or functions without ifs, to be used in conditions.

if fun1()
then
//  <---  fun1() is true
end

function fun1() // <--- condition result set to true by cleo_call

end // <--- return as is

If the function wants to explicitly change the condition result to false, we can use return false. Note that return 0 would be considered as a 'true' result, so return false and return 0 are not the same.

Then you are allowed to use regular tricks with conditional opcodes to alter the condition result:

function fun1()

  is_australian_game // changes condition_result to false
  is_pc_game // changes condition result to true

end

So, to summarize:

  • If you return any value, it is always a success. The condition result is true (unless altered by the last conditional opcode).
  • If you return false, it is always a failure. No output variables are modified.

We can use 8AB2 for the return false case. It collects all returned values, then skips the output variables in the caller, then sets the condition result to false.

0@=100
1@=200
2@=300
if (0@,1@,2@) = fun() // cleo_call @fun 0 0@ 1@ 2@
then
  print 0@ 1@ 2@ // here it prints 10 20 30
else
  print 0@ 1@ 2@ // here it prints 100 200 300
end

function fun(): (int, int, int)
  if not <cond>
  then
    return false // 8AB2: not cleo_return 3 0 0 0
  end
  
  return (10,20,30) // 0AB2: cleo_return 3 10 20 30
end
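The caller-side effect in this example can be modelled in a few lines. This is a hypothetical Python sketch; `call_fun` is an illustrative name. `results=None` stands for `return false` (8AB2, the negated cleo_return), a tuple stands for `return (values)`, and the caller's output variables are a plain list.

```python
from typing import List, Optional, Sequence


def call_fun(outputs: List[int], results: Optional[Sequence[int]]) -> bool:
    """Model of `if (0@,1@,2@) = fun()`; returns the condition result.

    results is None -> `return false`: outputs are left untouched, condition false
    otherwise       -> values are copied into outputs, condition true
    """
    if results is None:
        return False
    if len(results) != len(outputs):
        raise ValueError("caller must provide one variable per returned value")
    outputs[:] = list(results)
    return True


vars_ = [100, 200, 300]
print(call_fun(vars_, None), vars_)          # False [100, 200, 300]
print(call_fun(vars_, (10, 20, 30)), vars_)  # True [10, 20, 30]
```

The two printed lines correspond to the else-branch (100 200 300) and then-branch (10 20 30) of the script above.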

x87, Sep 15 '23 13:09

Um, I really don't like the fact that return false would behave differently than other cases, and what about the situation where a function is supposed to return one bool param? The syntax is getting ambiguous. The feature of leaving the function sounds nice, but it should not be performed with the return keyword. This seems to be an ideal case for 'break'.

Implicitly carrying over the condition state of the last executed opcode also doesn't seem right. Instead of easy functionality we get hidden, convoluted logic without clear rules enforcement. The function returns true by default, but a seemingly unrelated change will make it behave differently. In practice I always use a variable to carry a function's ok/failed state, as a condition result set in the middle of the function cannot be trusted to still be valid when exiting. In my opinion, the end user should not even be aware that the condition result exists in the background all the time.

Maybe instead of return we should only allow return_true and return_false as function ending keywords?

MiranDMC, Sep 15 '23 19:09

If a function returns one bool param, it fits the proposed logic.

Bool result

if 
  test() // cleo_call @test 0
then
  // true
else
  // false
end

function test(): bool
  if x 
  then
    return true // cleo_return 0
  else
    return false // not cleo_return 0
  end
end

Note that the bool return type does not require a variable to store the result.

IF and SET

if 
  0@ = test() // cleo_call @test 0 0@
then
  // 0@ is 1
else
  // 0@ is not set
end

function test(): int
  if x
  then
    return 1 // cleo_return 1 1
  else
    return false // not cleo_return 1 0  /// last zero is just a placeholder to match the function signature. CLEO does NOT set result to 0
  end
end

x87, Sep 15 '23 19:09

Currently there is no such thing as a bool type. Will it be reserved for the condition result only? It might be a good idea to declare whether a function sets the condition state; otherwise it will be true by default. Then the condition result is just one of the args in the return statement. Hm, doesn't it just boil down to treating the first return param as the condition result, where a value different from 0 is considered true? That seems reasonable and cuts extra complications.

func test() : (int, string, float)
    (...)
    return(true, "result", 3.0)
end

test() // allowed, discard return

if test() // allowed, check first return <> 0

if (0@, 1@, 2@) = test() // allowed, first as condition result

if (0@, 1@) = test() // error, not all return values used

if (_, 0@, _) = test() // allowed, first returned value as condition (as usual). Store just second argument
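The rules in this example can be sketched together in a hypothetical Python model:

- the first returned value is the condition;
- each target must match a returned value unless all values are discarded;
- `_` drops a value.

`bind_returns` is an illustrative name, and `memory` stands in for the script's variables.

```python
from typing import Dict, List, Sequence


def bind_returns(targets: List[str], values: Sequence[object],
                 memory: Dict[str, object]) -> bool:
    """Model of `if (t1, t2, ...) = test()` under the proposal in this comment."""
    # no targets at all means "discard all return values"; otherwise counts must match
    if targets and len(targets) != len(values):
        raise ValueError("not all return values used")
    for name, value in zip(targets, values):
        if name != "_":  # '_' discards the corresponding value
            memory[name] = value
    # condition result: first returned value <> 0 (true if nothing returned)
    return values[0] != 0 if values else True
```

With `values = (1, "result", 3.0)` (where `true` is 1), the targets `["_", "0@", "_"]` store only `"result"` into `0@`. The targets `["0@", "1@"]` raise an error.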

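Another idea to consider is allowing '_' in return calls, like return(false, _, _, _, -1).

I see one problem with the "// 0@ is not set" example:

0@ = true
(...)
0@ = isThingEnabled()

If isThingEnabled() fails, 0@ is not set and silently keeps its previous value (true).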

MiranDMC, Sep 15 '23 21:09

Not sure we need to overengineer it. Many functions are just calculations and they don't need to work with condition at all.

As you mentioned, there is no bool type, which is correct. It means you can't store the result of a bool function in a variable. It only works as a condition.

0@ = true
if isThingEnabled()
then
 0@ = true // or 1
else
 0@ = false // or 0
end

x87, Sep 15 '23 22:09

> Many functions are just calculations and they don't need to work with condition at all.

Then it simply returns a single argument (for example, a float). If used in a condition statement, then in typical fashion return[0] is tested for <> 0.

Introducing Boolean type will make everyone question why it is not possible to use it for var declarations or as input argument type.

Some corner cases:

func test() : (int, int) // 2 return values
func test() : (bool, int) // 1 return value now? Bit confusing
func test() : (int, bool) // hm, what now?
func test() : (bool, int, bool) // ???

Overengineering is when you have multiple rules to describe a simple thing.

I propose a single rule: the condition result of a function call is return param[0] <> 0, or true if there are no returns.

MiranDMC, Sep 15 '23 23:09

My idea was to use bool in functions that return no values. You can't mix bool with other types.

x87, Sep 15 '23 23:09

> I propose a single rule: the condition result of a function call is return param[0] <> 0, or true if there are no returns.

How do you express this with opcodes?

x87, Sep 15 '23 23:09

What do you mean with opcodes? I posted examples above.

func test() : (int, string, float)
   (...)
   return(true, "result", 3.0)
end

test() // allowed, discard all return values

if test() // check first return <> 0, discard all return values
then
   (...)
end

if (0@, 1@, 2@) = test() // check first return <> 0
then
   (...)
end

MiranDMC, Sep 15 '23 23:09

The _ I mentioned is just another feature proposal, inspired by what the C++ language recently received. It makes it possible to store multiple returns in a similar fashion to what you proposed, where _ is often used as an 'ignored' param.

Maybe it should be the keyword 'null' instead. Then it makes more sense to call a function with null for some params. Currently, 0AB1 accepts fewer parameters than expected; the missing ones get the default value 0. That's why I was complaining about the default legacy mode in main.scm.

MiranDMC, Sep 15 '23 23:09

> What do you mean with opcodes? I posted examples above.

Please rewrite this example using opcodes only, as if you had just decompiled the script.

x87, Sep 15 '23 23:09

:test
   (...)
   0AB2: cleo_return args 3 true "result" 3.0 // set CLEO condition result based on arg[0]

0AB1: @test args 0

if
   0AB1: @test args 0
then
   (...)
end

if 0AB1: @test args 0 result 0@ 1@ 2@
then
   (...)
end

MiranDMC, Sep 16 '23 00:09

  1. Your script will crash on the line 0AB1: @test args 0. 0AB1 should have enough variables to match 0AB2.
  2. You can't change the behavior of 0AB2 because it breaks the existing scripts.

Imagine there is a script:

{$CLEO .cs}
0000:
wait 1000
if
    0AB1: @test args 0 0@ 1@ 2@
then
    0ACE: show_formatted_text_box "Yes"
else
    0ACE: show_formatted_text_box "No"
end

0A93: terminate_this_custom_script


:test
059A:  return_false
0AB2: cleo_return args 3 1 2 3

Today, it shows the message "No", because the condition result was modified by a conditional opcode 059A (which is expected and fits the language). 0AB2 does not modify the result. With your proposal the behavior changes and it will now display "Yes".

Can you address these two concerns in your script (both high-level and low-level)?

x87, Sep 16 '23 00:09

  1. I propose adding support for discarding all return values of 0AB2, so the accepted scenarios would be: all parameters are used, or none. If any other param count is specified, it should result in an error message. I think currently, in case of a mismatch, 0AB2 just consumes the following opcodes in the script instead.

  2. Yes, it would need a change in the 0AB2 condition result behaviour. It was fixed only recently, so I don't know if anybody ever used it. Anyway, nobody says the function return has to be based on 0AB2. Recently there was also the idea of redesigning the return keyword into a universal, fit-all-cases function. A new return opcode could do that: manage the condition result and return values, plus work with GOSUB commands too.

MiranDMC, Sep 16 '23 00:09

> I propose adding support for discarding all return values of 0AB2, so the accepted scenarios would be: all parameters are used, or none.

0AB2: cleo_return 0 does it already. You can return all or nothing.

> Yes, it would need a change in the 0AB2 condition result behaviour. It was fixed only recently, so I don't know if anybody ever used it.

It was fixed for a scenario with multiple conditions. A single condition (see my example) has been used for years.

x87, Sep 16 '23 00:09

:test
   (...)
0AB2: cleo_return args 4 1 2 3 4

cleo_call @test args 0 // this is not possible now

Yep, legacy behaviour of 0AB2 is untouchable then.

MiranDMC, Sep 16 '23 00:09

UPDATE 11/13/2023

Make two separate opcodes to allow for true/false return without arguments.

  • CLEO_RETURN_FALSE - 0 params, exits current function, ignores all caller's variables, sets the cond result to false
  • CLEO_RETURN_WITH - 0 or more params, exits current function, must match all caller's variables, sets the cond result to true

retf // CLEO_RETURN_FALSE
retw true // CLEO_RETURN_WITH 1
retw 1 // CLEO_RETURN_WITH 1
retw 0 // CLEO_RETURN_WITH 0 /// this is a true condition!
retw 1 2 3 // CLEO_RETURN_WITH 1 2 3


My proposal is to add a new cleo_return. We can certainly use 8AB2 but it goes against the language design, so a new command could be better.

CLEO_RETURN_WITH - changes the condition result and returns values

Examples

CLEO_RETURN_WITH                    // when used with no arguments sets the condition result to false
CLEO_RETURN_WITH 1                  // with arguments the condition result is true, output variable is set to 1
CLEO_RETURN_WITH 1 2 3              // condition result is true, output variables are set to 1 2 3

These examples are based on the assumption that we could omit the nResults parameter and figure it out dynamically.

Before writing the result, CLEO_RETURN_WITH checks if the calling code has the variables.

scmFunc->Return(thread);
-if (nRetParams) SetScriptParams(thread, nRetParams);
+if (nRetParams && (*thread->GetBytePointer())) SetScriptParams(thread, nRetParams);
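The effect of this one-line change can be modelled as follows. This is a hypothetical Python sketch; `cleo_return_with` and `caller_outputs` are illustrative names. `caller_outputs` stands for the variables listed after the caller's call. An empty list corresponds to the caller's next byte being the end-of-arguments marker; that this is what `*thread->GetBytePointer()` tests here is an assumption.

```python
from typing import Dict, List, Sequence


def cleo_return_with(values: Sequence[object], caller_outputs: List[str],
                     memory: Dict[str, object]) -> bool:
    """Model of the proposed CLEO_RETURN_WITH with the patched write check.

    Returns the condition result: false with no arguments, true otherwise.
    """
    # patched check: only write results if the caller actually has variables
    if values and caller_outputs:
        if len(values) != len(caller_outputs):
            raise ValueError("returned values must match the caller's variables")
        for name, value in zip(caller_outputs, values):
            memory[name] = value
    return bool(values)
```

`cleo_return_with([1], [], memory)` is the pure boolean call `if x()`: the condition is true and nothing is written.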

It solves the case when CLEO_RETURN_WITH 1 is used as a pure boolean call:

if x()
then
(...)
end

:x
CLEO_RETURN_WITH TRUE
end

Then we can use the following syntax with this proposal:


return false // CLEO_RETURN_WITH
return true // CLEO_RETURN_WITH 1
return 1 // CLEO_RETURN_WITH 1
return 0 // CLEO_RETURN_WITH 0  /// this is a true condition!
return (1, 2, 3) // CLEO_RETURN_WITH 1 2 3
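The mapping above is mechanical. Below is a hypothetical Python sketch of it; `lower_return` is an illustrative name. It only handles the forms listed here. It splits on commas naively, so string literals containing commas are not supported.

```python
def lower_return(expr: str) -> str:
    """Translate the operand of a high-level `return` into a CLEO_RETURN_WITH line."""
    expr = expr.strip()
    if not expr:
        raise ValueError("blank return is not covered by this mapping")
    if expr == "false":
        return "CLEO_RETURN_WITH"        # no arguments: condition false
    if expr == "true":
        return "CLEO_RETURN_WITH 1"
    if expr.startswith("(") and expr.endswith(")"):
        expr = expr[1:-1]                # return (1, 2, 3)
    values = [v.strip() for v in expr.split(",")]
    return "CLEO_RETURN_WITH " + " ".join(values)
```

Note that `lower_return("0")` produces `CLEO_RETURN_WITH 0`, which under this proposal is still a true condition.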

x87, Sep 16 '23 01:09

My idea is that a function either returns something and the condition result is true, or returns nothing and the condition result is false. There is no case where you need to return something and set the condition to false.

x87, Sep 16 '23 01:09

Sounds reasonable. I have some functions that return both a false condition result and values, like obtaining an entity, where in the failed case the returned handle is -1. I guess with the new return it would be possible to assign an error fallback value before the function call, so it just won't be updated if the function fails.

MiranDMC, Sep 16 '23 01:09

Pure Functions

  • It should be impossible to use global variables in the function body if the function is located in a headless script (a CLEO script, a module). A function should rely only on its input arguments.
  • It should be impossible to use labels outside of function body (e.g. for jump or gosub). Functions may call other functions in the same script.
    • functions can define scoped labels visible within the function body and reference them; these labels are not visible to the outside code
function f()
  gosub @sub // OK
  return true

  :sub
  return
end

gosub @sub // ERROR
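The label-visibility rules in this comment reduce to a simple lookup. This is a hypothetical Python sketch; `is_jump_allowed`, `scoped_labels` and `global_labels` are illustrative names. It models jumps and gosubs only. It does not model calls to other functions, which the rules above allow.

```python
from typing import Dict, Optional, Set


def is_jump_allowed(target: str, current_function: Optional[str],
                    scoped_labels: Dict[str, Set[str]], global_labels: Set[str]) -> bool:
    """Can `jump`/`gosub @target` be used at this point? (hypothetical model)

    Inside a function only that function's scoped labels are reachable;
    outside any function only global labels are reachable.
    """
    if current_function is not None:
        return target in scoped_labels.get(current_function, set())
    return target in global_labels
```

With `scoped_labels = {"f": {"sub"}}`, `gosub @sub` is allowed inside `f` and rejected at the top level, as in the example above.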

Are these rules too strict?

x87, Sep 18 '23 14:09

Forbidding the use of global variables seems too strict. If you really want to force people to stop messing with global variables via .cs, maybe Sanny should require some macro in the script to enable "globals write mode".

External labels are useful for accessing hex blocks. Redirecting program flow to outside labels should perhaps be forbidden, as local variables declared by the function will be allocated but their declarations will not be accessible outside.

Calling other functions (0AB1) declared outside the function body should be possible.

Hiding local labels from outside code sounds like a great feature: it keeps the autocomplete list clean and prevents bugs when copy-pasting code.

MiranDMC, Sep 18 '23 22:09

I agree that a function should not depend on global variables, which might vary in custom mains. A good compromise would be to restrict global variables but allow aDMA, so the varspace is available for things like VarspaceSize = &3, and global opcodes can be used for evaluating and manipulating memory.

OrionSR, Sep 20 '23 04:09

Yep, I keep forgetting about the dynamic allocation of new globals just by using a new variable name. This is a problematic case. In all other scenarios (well-known global variables like $PLAYER_CHAR, or ones defined with Alloc), this would make functions inferior to a regular cleo_call.

MiranDMC, Sep 21 '23 19:09