c2compiler Defer statement

Includes all of defer. Also changes C code to always output { } even for simple if, while etc. This simplifies defer code generation. Tracking defer adds several parts to analysing the function and outputting the right thing. CGen does not need to involve itself in figuring out how to emit the right code. All of that is already done during the analysis.

Nov 15 '18 22:11 lerno

Any updates?

Nov 24 '18 10:11 lerno

only generate do..while if it's needed for break inside defer. Just use a single bit boolean to store whether a defer contains a break.
add more unit tests to test all corner cases: everything that is allowed and not allowed
defer should only run at function end, not at every scope end

Nov 27 '18 08:11 bvdberg

The only alternative to do { } while (0) is simple {}, since we need to protect the defer scope.

Consider:

defer { int a; } int a;

This does not work unless defer has its own scope. Originally I used simple { } to start a scope. That was much less readable than the do-while block. That’s why it’s everywhere. It was a deliberate decision.

Nov 27 '18 10:11 lerno

Why would anyone create variables in a defer? I think we can safely forbid that. Just allow function calls, some assignments and if statements and so on.

Nov 27 '18 10:11 bvdberg

Defer at function end would require closures. Also, it would be completely different to implement. If you want this feature. Call it ”func defer” or something, implement it later. Its use is completely orthogonal to normal defer.

Nov 27 '18 10:11 lerno

@bvdberg of course one would use variables in defer.

Nov 27 '18 10:11 lerno

You’re marking obvious requirements ”useless”. I can’t shake the feeling that you’re thinking of some completely different feature.

Nov 27 '18 10:11 lerno

I will rebase this one, then add more tests, but the feature I NEED is defer that works at the end of scope. At function end is NOT USEFUL and outright BAD. Also, there is no way to "tweak" this current feature to become an "end of function" style defer. Such a functionality is completely and utterly different in implementation from "end of scope"-style defers.

It is possible to write a "func defer" as well. But that one CANNOT use the very simple and straightforward code that scope defers use. Implement it yourself and you'll see that func-defer is a very high level feature, akin to implementing deferred automatic refcounting.

Again: the usecases for such a defer is different from scope-defer.

Nov 27 '18 20:11 lerno

I'd like to add that in most languages, it is indeed expected from defers to run at scope end rather than function end. The primary reason I can think of is that limiting defer to function kinda breaks defers in ifs, switches and nested blocks and definitely breaks defers in loops.

Variables in defers are useful if cleanup requires temporary variables, for example if you had to free a linked-list like structure iteratively.

Nov 27 '18 22:11 luciusmagn

One thing not handled:

switch (foo) {
  case 1:
    defer close(bar);
  case 2:
    printf("b");
}

Presumably defer should trigger on leaving the switch if foo == 1. Right?

Nov 27 '18 23:11 lerno

I'm adding test cases for switch and also make sure that it generates the correct C code.

Nov 27 '18 23:11 lerno

Yea, the scope here is the one particular case, so I would guess that yes. We wouldn't want defer running in different cases. Not to mention that the defer might use a variable declared in the case, which is entirely possible, although cases can't start with a declaration.

Nov 27 '18 23:11 luciusmagn

From what I've read now, switch should be treated as a goto with labels, so we can rewrite it (in terms of analysis) like this:

{
  if (foo == 1) goto label1:
  if (foo == 2) goto label2:
  goto exit:
label1:
  defer close(bar);
label2:
  printf("2");
}
exit:

Since this analysis already exists for goto, it's just a matter of changing the parsing and analysis a little bit.

Nov 27 '18 23:11 lerno

Defer is a nice add-on for C2 and offers developers an extra way of cleaning up stuff. It is not a replacement for C++ destructor behavior. There are 2 options for running: function scope and any scope end. The main criteria is that any option will have to be implemented without a runtime. C2c can generate stuff at compile-time, but no list administration must be needed at run-time.

In my mind, defer is a way to fire-and-forget. So claim a resource and defer the free-ing of it. Mostly the claim can also fail, so should be checked first.

i32 fd = open("file");
if (fd == -1) {
  // handle
  return;
}
defer close(fd);
// .. do stuff with file
// many lines possible here, unrelated of file

So in this case it's easy to forget to close, since there can be many lines between the open and the close.

The other example is any scope, for example in a loop:

while (x) {
    i32 fd = open("file");
    if (fd == -1) {
         continue;  // as an example
     }
     // read file
     close(fd);
}

If the loop is small (and good code should have loop bodies that easilyt fit half a screen), using defer is not really needed I think. So the criteria for using defer would be when there are multiple return paths possible (and you have to call close in each one). Otherwise, just using close is better, because it's more readable.

So that's why I prefer function scope. A second thing to remember is that using defer prevents the use of tail-call optimization.

Nov 28 '18 07:11 bvdberg

Function scope defer cannot be implemented without a runtime.

Nov 28 '18 07:11 lerno

Also, tail call optimization is only prevented for function-defer.

So function-defer:

needs runtime
prevents tail-call optimization

You end by ”that’s why I prefer function scope”. But I don’t ser what the actual argument was. – That it will prevent people from using defer in some cases where you think it should not be used?

The reason we compare it to RAII is that just like with destructors, the execution of the defer + generated code will be will be easily understood.

Note also that scope-defer can be written cleanly in C using C macros in GCC/Clang using ”cleanup” and nested function extensions (using no allocations).

Function-defer can’t.

Nov 28 '18 07:11 lerno

You can have function style scope withou runtime by excluding defers from jumps, while, do, for. At which point there is honestly very little left. (And you still break tail call optimization)

Whereas scope-defer can retain tail-call optimization when needed AND works with jumps, while, do, for.

Try to transform some longer non-trivial code using defer, then convert that to function style defer and scope-style respectively.

In particular, make sure that there are multiple defers and that their execution requires variables that do not exist at function top scope. E.g.

if (foo) { FILE f = open(x); defer close(f); ... many lines of code with early exit ... }

Nov 28 '18 08:11 lerno

I'm looking at the defer_example.c2 I sent earlier. Using any scope, this means you cant do:

if (x) defer close(x);

Since the defer would be called immediately, right? (we could generate a warning for this). But in this case:

i32 fd = open("file");
if (fd == -1) return;
defer close(file);

The defer would effectively be function scope, even if we implemented any-scope.

Okay, let's go ahead with the any-scope then. I think we still need to some sane restrictions to avoid a very complex implementation.

When I implemented the case-body this morning, I found that storing some things in the Scope is very handy (and almost no cost). That way you can just forget about the actual scope stmt (like case/if) and just store and continue. The analyser code then looks like: ( for case, pseudo-code)

analyseCase(CaseStmt* C) {
   enterScope();
   foreach (stmt) analyseStmt(s);
   if (scope.hasDecls()) C->setHasDecls(); // pick info up from Scope.
   leaveScope();
}

This approach only works during analysis, but it could maybe set something in outer bodies/Compound stmts.

Update: One other thing I thought of. There are many scopes, some not so obvious to developers. For example, a Switch stmt has a scope as does every Case stmt. If we do want 'sane' behavior for the example below, we could add a 'DeferScope' that not all scopes have. If that makes any sense...

switch (x) {
case 0:
   defer close(x);  // if makes no sense to also run defer here
   break;
case 1:
    // ..
    break;
}

Nov 28 '18 08:11 bvdberg

if (x) defer close(x) can be written as defer if (x) close(x)

Nov 28 '18 09:11 lerno

In terms of scope, I think it's unwise to have specialized scope rules for defer. It's just one more pitfall for people to fall into. I think defers in switches will be rare, but have their use in this case:

switch (x) {
  case 0:
    FILE* f = open(a);
    defer close(f);
    if (file_starts_with_foo(f)) break;
    if (file_starts_with_bar(f)) return 0;
    b += file_number_foobar(f);
    break;
}

Here there are multiple exit points and we want to close f at each one of them.

Nov 28 '18 10:11 lerno

yes better to keep it all the same

Nov 28 '18 15:11 bvdberg

I was thinking about the implementation. The current one seems to produce correct results, I couldn't find any issues (I did not test in combination with goto, as I don't think that has to work).

See the code below:

{
    i32 a = open('a');
    if (a == 0) return -1;
    defer close(a);

    i32 b = open('b');
    if (b == 0) return -1;  // defer: close a
    defer close(b);

    {
        i32 c = open('c');
        if (c == 0) return -1;  // defer: close b, a
        defer close(c);
        // defer: close c
    }

    {
        i32 d = open('d');
        if (d == 0) return -1;  // defer: close b, a
        defer close(d);
        return 0; // defer: close d, b, a
    }

    // defer: close b, a
    return 0;
}

First: If you look at the order of defers needed to be run, they form a directed, acyclic graph. So D -> B -> A or C -> B -> A. So I thought of letting defers point to each other (D-> B, etc).

Second: I could only think of 4 points in code, where defers need to be run (ie end of scope):

} (just the end of a scope)
return
break
continue So each of these (the } is just a CompoundStmt) just need to keep a pointer to a single DeferStmt (that in turn might point to more).

The graph and the linking to the 'DeferPoints' can all be done at analysis-time, so code generation should be relatively simple.

The advantage of this approach is that you don't need an array of Defers and you don't need the conditionalDefers list.

The condition for this to work is to specify that goto's don't mix with defers. I'm personally in favor of this, since goto should just do a jump and nothing else. The analyser could give errors/warnings for this maybe.

Nov 29 '18 10:11 bvdberg

@bvdberg I also think my solution can be improved upon, but I ran smack into the problem with switches where you decided on opening a new scope for every case. That also means that defer must run when going from one case to another. However, in C cases are the same as labels, and labels DO NOT create a new scope. So I just want to be sure that this is what you want.

In the new solution I'll find a more elegant way of handling defers BUT I think the problem with goto is a general problem that needs to be solved anyway (especially if case does not create a scope) and it will actually serve to make the problem and analysis BETTER for future changes.

I need to free up a few hours to work on it and it will look different, so we could actually close this one and I'll open a new one later.

Nov 29 '18 10:11 lerno

@lerno I need to use nasty looking braces in case statements so often in C, I really want them out. I have (almost) never used the C feature of common scope in cases, except some very weird corner cases (see Duff's Device)..

What I'll do is add a few more unit tests to finalize the wanted behavior. After we agree on those, we can continue with the code, ok?

Nov 29 '18 10:11 bvdberg

I feel a little bad for us breaking Duff's Device 😄

Nov 29 '18 10:11 lerno

BTW I can do the change in parsing of defer anyway. Right now it assumes we can actually make the part AFTER the defer a subscope to defer. This is not really necessary (anymore). It was required in the first draft only. So I'll rewrite that and it will make the implementation cleaner. I'll also see if I can use the normal scope instead of having a parallell one. It should be possible when I don't do the weird parsing of defer.

Nov 29 '18 10:11 lerno

How about this 'specification':

Design:
    - dont allow goto/return/continue/labels/defer(😄) in defer
    - only allow break if defer has body, otherwise not
    - goto/longjmp don't call defers. (give error on both if jumping from scope with defers)
    - run defer at any scope end (also case)
    - Q: dont allow Decls inside defer? (not needed for cleanup?)
Impl:
    - if (x) defer close(x) -> error, superfluous defer, probably unwanted

Nov 29 '18 10:11 bvdberg

I want defer to work as seamless as possible. The only thing we rule out in defer is non-local jumps in defer. That means that return/continue is out (as they indicate non-local jumps), goto / label is ok as long as they are local. (Continue will also obviously work if in a subscope where continue is allowed, like a for-loop)

Longjmp will not call defer. In all other situations it is called. I don't think we need to do any errors on

if (x) defer close(x) should be an error yes. But the rule is more generic: We give an error on any scope that ONLY contains a defer AND a warning when defer is the last statement in a scope.

Some examples

if (x) { defer close(foo); } // Error

{
  defer close(foo); // Error
}

{
  LOCK *lock = acquire_lock();
  defer close(lock); // warning
}

switch (foo) {
  case 1:
    LOCK *lock = acquire_lock();
    defer close(lock); // warning
  case 2:
    defer close(file); // error
  case 3:
    ...
}     

for (int i = 0; i < 10; i++) {
  FILE* f = open_file(file[i]);
  ...
  defer close(f); // warning
}

Nov 29 '18 10:11 lerno

Sure, is the analyser code do-able for that (goto/label analysis)?

You're right about the last statement as well. Since the only statement is also the last one then, we only have to check if it's the last one..

Nov 29 '18 11:11 bvdberg

goto/label analysis in defer is already done and works in this version already.

Nov 29 '18 11:11 lerno