Allocation made in match-ref branch do not get freed properly
When we create a value in the branch of a match-ref, delete is not called on that value.
Example with a string:
(deftype Sum One Two)
(defn main []
(let [state @"Ok" sumt &(Sum.One)]
(match-ref sumt
Sum.One (println* &@&state)
Sum.Two ())))
Emitted C code
int main(int argc, char** argv) {
carp_init_globals(argc, argv);
/* let */ {
static String _6 = "Ok";
String *_6_ref = &_6;
String _7 = String_copy(_6_ref);
String state = _7;
Sum _11 = Sum_One();
Sum* _12 = &_11; // ref
Sum* sumt = _12;
if(sumt->_tag == Sum_One_tag) {
Sum* _15_temp = sumt;
// Case expr:
String* _24 = &state; // ref
String _25 = String_copy(_24);
String* _26 = &_25; // ref
String _27 = String_str(_26);
String* _28 = &_27; // ref
IO_println(_28);
}
else if(sumt->_tag == Sum_Two_tag) {
Sum* _15_temp = sumt;
// Case expr:
/* () */
}
else {
// This will not be needed with static exhaustiveness checking in 'match' expressions:
fprintf(stderr, "Unhandled case in 'match' expression at line 7, column 6 in '/home/tim/dev/sandbox/matc/main.carp'\n");
exit(1);
}
String_delete(state);
Sum_delete(_11);
}
return 0;
}
Example with a lambda:
(deftype Sum One Two)
(defn main []
(let [state 0]
(match-ref &(Sum.One)
Sum.One (println* ((fn [] @&state)))
Sum.Two ())))
Emitted C code
int main(int argc, char** argv) {
carp_init_globals(argc, argv);
/* let */ {
int state = 0;
Sum _10 = Sum_One();
Sum* _11 = &_10; // ref
if(_11->_tag == Sum_One_tag) {
Sum* _11_temp = _11;
// Case expr:
// This lambda captures 1 variables: state
_Lambda_NAKED_LAMBDA_23_env *_23_env = CARP_MALLOC(sizeof(_Lambda_NAKED_LAMBDA_23_env));
_23_env->state = state;
Lambda _23 = {
.callback = (void*)_Lambda_NAKED_LAMBDA_23,
.env = _23_env,
.delete = (void*)_Lambda_NAKED_LAMBDA_23_env_delete,
.copy = (void*)_Lambda_NAKED_LAMBDA_23_env_copy
};
int _24 = _23.env ? ((int(*)(LambdaEnv))_23.callback)(_23.env) : ((int(*)())_23.callback)();
String _25 = Int_str(_24);
String* _26 = &_25; // ref
IO_println(_26);
}
else if(_11->_tag == Sum_Two_tag) {
Sum* _11_temp = _11;
// Case expr:
/* () */
}
else {
// This will not be needed with static exhaustiveness checking in 'match' expressions:
fprintf(stderr, "Unhandled case in 'match' expression at line 7, column 4 in '/home/tim/dev/sandbox/matc/main.carp'\n");
exit(1);
}
Sum_delete(_10);
}
return 0;
}
Might be caused by #876
Sorry! I think you are right about #876. I added these 2 lines to emitCaseEnd: https://github.com/leblowl/Carp/blob/59758c33f82c9a6c8cf81fece3caaa1adfc0d64b/src/Emit.hs#L374-L375 which causes it to only delete variables when we match by value. But I don't think that's the only time we want to delete variables in a case expression. I think we want to delete variables in a case expression when we match by value AND by ref. However, addressing #843, we do not want to delete a reference passed to the match. I hope that makes sense and my understanding is correct.
Looking at the code more, I am wondering why the reference (e.g. in #843) is added to the deleters in the first place. Maybe the issue for #843 is somewhere in here: https://github.com/carp-lang/Carp/blob/master/src/Memory.hs#L334
Yeah, you're right - we need to clean up in the end of the branch no matter if it's a match/match-ref. So we need some more clever fix to the original bug it seems.
And yeah, seems like #843 should be solved by not adding the deleter in the first place, I think you're definitely on the right track.
So I think I have a solution to this problem and #843 ... after removing the broken code from #876, I've added this: https://github.com/leblowl/Carp/commit/fa28bef6d1342243ebbdd26cd3b29b630c8ad35c ... basically, when managing memory, in a match case, if the symbol on the left hand side has a name that starts with "wildcard_" then we skip memory management for that symbol. This follows a similar pattern already in place for references:
visitCaseLhs (XObj Ref _ _) =
pure (Right [])
I don't really know my way around the codebase much yet or understand much of
the underlying design, so I am guessing a bit. But another solution might have
to do with isVarName, used in another existing visitCaseLhs on this line:
https://github.com/carp-lang/Carp/blob/bd553fb78e6bfe1def7976176cfff889306b6e20/src/Memory.hs#L437
Perhaps isVarName could return false when given a wildcard name? I don't know
if that makes sense semantically. The docstring for isVarName states "Is this
symbol name appropriate for a normal variable (i.e. NOT a type name or sumtype
tag)"
(https://github.com/carp-lang/Carp/blob/bd553fb78e6bfe1def7976176cfff889306b6e20/src/Obj.hs#L1077).
Is a wildcard name appropriate for a normal variable?
If isVarName returned false when given a wildcard name, then no new
visitCaseLhs would need to be added specifically to handle the wildcard, which
might be cleaner/simpler.
Also this is probably a silly question, but this is the first compiler I've worked on and I am wondering, why are we creating a wildcard symbol in the first place? What purpose does it serve?
Here is the xobj for the wildcard symbol for reference:
XObj {xobjObj = Sym wildcard_30 (LookupLocal NoCapture),
xobjInfo = Just (Info {infoLine = 8, infoColumn = 6, infoFile = "REPL",
infoDelete = Set {unSet = fromList []},
infoIdentifier = 30}),
xobjTy = Just Example}
Also, the way this issue manifests in the match expression is pretty
interesting, due to this piece of code in Emit.hs:
https://github.com/carp-lang/Carp/blob/bd553fb78e6bfe1def7976176cfff889306b6e20/src/Emit.hs#L432-L439
Given this code:
(deftype Example
One
(Two [String]))
(defn main []
(let-do [ex [(Example.Two @"OKOK")]]
(match-ref (Array.unsafe-nth &ex 0)
(Example.Two s) (println* s)
_ ())))
It emits the following for the wildcard case:
...
else if(true) {
Example* _19_temp = _19;
Example* wildcard_30 = _19_temp;
/* () */
Example_delete(wildcard_30);
}
So it assigns a reference of the match variable to the wildcard variable and then deletes the wildcard variable, thus deleting the match variable. I wonder what purpose these lines of code serve. But that might be off topic for this issue.
After thinking about it more, I think I've answered a few of my questions. I think the wildcard symbol is a valid symbol and can be used just like any other symbol, right? And it makes sense that we are assigning the match variable to the wildcard variable, since the wildcard represents the match variable, that's just how a match expression works.
Also it seems like there may be times when we want to delete the wildcard, if it represents a value, not a ref, right?
So maybe the solution would be to only add the deleter for the wildcard symbol if it's not a ref? And in that case, I am not totally sure how that would look. The xobj for the wildcard says it's a Sym not a Ref. So the xobj itself doesn't tell us that information. Should it? Or maybe we look at the Match variable when visiting the match itself and pass that to visitCaseLhs so we can tell whether the the match variable is a ref or not... if it's not, we delete the wildcard.
Looking at the full emitted code for the example in my previous comment:
9206 int main(int argc, char** argv) {
9207 carp_init_globals(argc, argv); 9208 /* let */ {
9209 Array _10 = { .len = 1, .capacity = 1, .data = CARP_MALLOC(sizeof(Example) * 1) };
9210 static String _7 = "OKOK";
9211 String *_7_ref = &_7;
9212 String _8 = String_copy(_7_ref);
9213 Example _9 = Example_Two(_8);
9214 ((Example*)_10.data)[0] = _9;
9215 Array__Example ex = _10;
9216 Array__Example* _17 = &ex; // ref
9217 Example* _19 = Array_unsafe_MINUS_nth__Example(_17, 0);
9218 if(_19->_tag == Example_Two_tag) {
9219 Example* _19_temp = _19;
9220 String* s = &_19_temp->Two.member0;
9221 // Case expr:
9222 String _27 = String_str(s);
9223 String* _28 = &_27; // ref
9224 IO_println(_28);
9225 String_delete(_27);
9226 }
9227 else if(true) {
9228 Example* _19_temp = _19;
9229 Example* wildcard_30 = _19_temp;
9230 /* () */
9231 Example_delete(wildcard_30);
9232 }
9233 else {
9234 // This will not be needed with static exhaustiveness checking in 'match' expressions:
9235 fprintf(stderr, "Unhandled case in 'match' expression at line 7, column 5 in '/mnt/c/Users/tym0/dev/sandbox/match-ref/main.carp'\n");
9236 exit(1);
9237 }
9238 Array_delete__Example(ex);
9239 }
9240 }
what should this look like? Of course we don't want the wildcard deletion there, but what about the other case. Should the string delete on line 9225 be there? Is that deleting a string member of _19? Or is the string copied? I am wondering if this is a general issue with case symbols and ref matches, or just an issue with wildcards and ref matches.
Sorry for the verbosity, maybe I should use the gitter chat? But thinking about this more, it seems like a more general issue than just with wildcards. Looking at the example in the language guide:
(match-ref &might-be-a-string
(Just s) (IO.println s)
Nothing (IO.println "Got nothing"))
We wouldn't want to delete s in this case either, right? So I think the solution would be: if we are matching on a ref then do not delete any left-hand-side symbols. Does this sound right?
What this solution might look like is this: https://github.com/leblowl/Carp/commit/979c8c546675a8f67106bfa8e5c5056a4b753ac4
Here's what it looks like with various test cases:
(deftype Example
One
(Two [String]))
(defn main []
(let-do [ex [(Example.Two @"OKOK")]]
(match-ref (Array.unsafe-nth &ex 0)
(Example.Two s) (println* s)
_ ())))
Emitted C code
int main(int argc, char** argv) {
carp_init_globals(argc, argv);
/* let */ {
Array _10 = { .len = 1, .capacity = 1, .data = CARP_MALLOC(sizeof(Example) * 1) };
static String _7 = "OKOK";
String *_7_ref = &_7;
String _8 = String_copy(_7_ref);
Example _9 = Example_Two(_8);
((Example*)_10.data)[0] = _9;
Array__Example ex = _10;
Array__Example* _17 = &ex; // ref
Example* _19 = Array_unsafe_MINUS_nth__Example(_17, 0);
if(_19->_tag == Example_Two_tag) {
Example* _19_temp = _19;
String* s = &_19_temp->u.Two.member0;
// Case expr:
String _27 = String_str(s);
String* _28 = &_27; // ref
IO_println(_28);
String_delete(_27);
}
else if(true) {
Example* _19_temp = _19;
Example* wildcard_30 = _19_temp;
/* () */
}
else UNHANDLED("REPL", 6);
Array_delete__Example(ex);
}
return 0;
}
(deftype Sum One Two)
(defn main []
(let [state @"Ok" sumt &(Sum.One)]
(match-ref sumt
Sum.One (println* &@&state)
Sum.Two ())))
Emitted C code
int main(int argc, char** argv) {
carp_init_globals(argc, argv);
/* let */ {
static String _6 = "Ok";
String *_6_ref = &_6;
String _7 = String_copy(_6_ref);
String state = _7;
Sum _11 = Sum_One();
Sum* _12 = &_11; // ref
Sum* sumt = _12;
if(sumt->_tag == Sum_One_tag) {
Sum* _15_temp = sumt;
// Case expr:
String* _25 = &state; // ref
String _26 = String_copy(_25);
String* _27 = &_26; // ref
String _28 = String_str(_27);
String* _29 = &_28; // ref
IO_println(_29);
String_delete(_26);
String_delete(_28);
}
else if(sumt->_tag == Sum_Two_tag) {
Sum* _15_temp = sumt;
// Case expr:
/* () */
}
else UNHANDLED("REPL", 12);
String_delete(state);
Sum_delete(_11);
}
return 0;
}
(deftype Sum One Two)
(defn main []
(let [state 0]
(match-ref &(Sum.One)
Sum.One (println* ((fn [] @&state)))
Sum.Two ())))
Emitted C code
int main(int argc, char** argv) {
carp_init_globals(argc, argv);
/* let */ {
int state = 0;
Sum _10 = Sum_One();
Sum* _11 = &_10; // ref
if(_11->_tag == Sum_One_tag) {
Sum* _11_temp = _11;
// Case expr:
// This lambda captures 1 variables: state
_Lambda_NAKED_LAMBDA_24_env_ty *_24_env = CARP_MALLOC(sizeof(_Lambda_NAKED_LAMBDA_24_env_ty));
_24_env->state = state;
Lambda _24 = {
.callback = (void*)_Lambda_NAKED_LAMBDA_24_env,
.env = _24_env,
.delete = (void*)_Lambda_NAKED_LAMBDA_24_env_ty_delete,
.copy = (void*)_Lambda_NAKED_LAMBDA_24_env_ty_copy
};
int _25 = _24.env ? ((int(*)(LambdaEnv))_24.callback)(_24.env) : ((int(*)())_24.callback)();
String _26 = Int_str(_25);
String* _27 = &_26; // ref
IO_println(_27);
Function_delete__int(_24);
String_delete(_26);
}
else if(_11->_tag == Sum_Two_tag) {
Sum* _11_temp = _11;
// Case expr:
/* () */
}
else UNHANDLED("REPL", 19);
Sum_delete(_10);
}
return 0;
}
(deftype Example
One
(Two [(Array String)]))
(defn main []
(match-ref &(Just (Example.Two [@"OKOK"]))
(Just s) ()
_ ())
)
Emitted C code
int main(int argc, char** argv) {
carp_init_globals(argc, argv);
Array _10 = { .len = 1, .capacity = 1, .data = CARP_MALLOC(sizeof(String) * 1) };
static String _8 = "OKOK";
String *_8_ref = &_8;
String _9 = String_copy(_8_ref);
((String*)_10.data)[0] = _9;
Example _11 = Example_Two(_10);
Maybe__Example _12 = Maybe_Just__Example(_11);
Maybe__Example* _13 = &_12; // ref
if(_13->_tag == Maybe__Example_Just_tag) {
Maybe__Example* _13_temp = _13;
Example* s = &_13_temp->u.Just.member0;
// Case expr:
/* () */
}
else if(true) {
Maybe__Example* _13_temp = _13;
Maybe__Example* wildcard_18 = _13_temp;
/* () */
}
else UNHANDLED("REPL", 27);
Maybe_delete__Example(_12);
return 0;
}
(deftype Example
One
(Two [(Array String)]))
(defn main []
(match (Just (Example.Two [@"OKOK"]))
(Just s) ()
_ ())
)
Emitted C code
int main(int argc, char** argv) {
carp_init_globals(argc, argv);
Array _9 = { .len = 1, .capacity = 1, .data = CARP_MALLOC(sizeof(String) * 1) };
static String _7 = "OKOK";
String *_7_ref = &_7;
String _8 = String_copy(_7_ref);
((String*)_9.data)[0] = _8;
Example _10 = Example_Two(_9);
Maybe__Example _11 = Maybe_Just__Example(_10);
if(_11._tag == Maybe__Example_Just_tag) {
Maybe__Example _11_temp = _11;
Example s = _11_temp.u.Just.member0;
// Case expr:
/* () */
Example_delete(s);
}
else if(true) {
Maybe__Example _11_temp = _11;
Maybe__Example wildcard_16 = _11_temp;
/* () */
Maybe_delete__Example(wildcard_16);
}
else UNHANDLED("REPL", 36);
return 0;
}
So I think the solution would be: if we are matching on a ref then do not delete any left-hand-side symbols. Does this sound right?
I think this is the right way to think about it, yes.
If you see the whole match as a function, you want it to consume its argument fully, and leave it completely untouched when it's a match-ref.
Since match will "split up" the value, it needs to make sure each such sub-element is consumed/deleted within its scope.
I do not have time to read through the C code in the final post right now, but will try to do that later today!
@leblowl I've read through the emitted code now and it seems correct. There is one situation that would be good to check with your branch, something like this:
(deftype Example (A [String String]))
(defn f []
(let-do [e (Example.A @"x" @"y")]
(match e
(Example.A a b) (Unsafe.leak a))
;; <*>
))
This should delete b but leave a since it has already been consumed by leak.
Also, using e at <*> should be an error (e.g. (println* &e))
Both of these things currently hold in master.
@eriksvedang thanks! I was wondering if there are still some test cases I haven't considered.
(deftype Example (A [String String]))
(defn f []
(let-do [e (Example.A @"x" @"y")]
(match e
(Example.A a b) (Unsafe.leak a))
;; <*>
))
Emitted C code
void f() {
/* let */ {
static String _7 = "x";
String *_7_ref = &_7;
String _8 = String_copy(_7_ref);
static String _10 = "y";
String *_10_ref = &_10;
String _11 = String_copy(_10_ref);
Example _12 = Example_A(_8, _11);
Example e = _12;
if(e._tag == Example_A_tag) {
Example _16_temp = e;
String a = _16_temp.u.A.member0;
String b = _16_temp.u.A.member1;
// Case expr:
Unsafe_leak__String(a);
String_delete(b);
}
else UNHANDLED("REPL", 46);
}
}
That looks good to me.
And the error still holds:
> (defn f []
( (let-do [e (Example.B [@"x"])]
(( (match e
((( (Example.A a b) (Unsafe.leak a))
(( (println* &e)
(( ;; <*>
(( ))
You’re referencing a given-away value `e` at line 65, column 16 in 'REPL'
XObj {xobjObj = Sym e (LookupLocal NoCapture), xobjInfo = Just (Info {infoLine = 65, infoColumn = 16, infoFile = "REPL", infoDelete = Set {unSet = fromList []}, infoIdentifier = 27}), xobjTy = Just Example}
You’ll have to copy the value using `@`. at REPL:61:2.
Traceback:
(defn f [] (let-do [e (Example.B [(copy "x")])] (match e (Ex...) at REPL:61:1.
Exploring a bit, I also tried:
(deftype Example
(A [String String])
(B [(Array String)]))
(defn f []
(let-do [e (Example.B [@"x"])]
(match e
(Example.A a b) (Unsafe.leak a))
;; <*>
))
Which surprisingly doesn't cause a compile error and appears to leak:
Emitted C code
void f() {
/* let */ {
Array _9 = { .len = 1, .capacity = 1, .data = CARP_MALLOC(sizeof(String) * 1) };
static String _7 = "x";
String *_7_ref = &_7;
String _8 = String_copy(_7_ref);
((String*)_9.data)[0] = _8;
Example _10 = Example_B(_9);
Example e = _10;
if(e._tag == Example_A_tag) {
Example _14_temp = e;
String a = _14_temp.u.A.member0;
String b = _14_temp.u.A.member1;
// Case expr:
Unsafe_leak__String(a);
String_delete(b);
}
else UNHANDLED("REPL", 56);
}
}
but this is might be considered a different bug.
And then I was wondering if there is anyway to destructure nested arrays, but I guess the match destructuring is only for sum types right now?
And then I was also wondering if there is any leaking in the sum type struct itself. I couldn't figure out how to view the emitted C code for a deftype expression, but it doesn't look like there is anything malloc'd in this code: https://github.com/carp-lang/Carp/blob/bd553fb78e6bfe1def7976176cfff889306b6e20/src/Emit.hs#L842
That example will crash with a failed match so that doesn't really qualify as a leak, right? Or am I missing something :)
Destructruring arrays (and other things) would be great but should probably be added later when this is 100% solid for sum types.
It shouldn't be necessary to free anything of the struct itself, it's just the members and the tag (char). It's a little weird that it just is implicitly destroyed though. In that sense you can use match as a deleter, kinda!
Oh, I didn't actually run the function:
> (f)
Unhandled case in 'match' expression at REPL:6
[RUNTIME ERROR] 'out/Untitled' exited with return value -6.
Would it still leak inside the REPL?
Would it still leak inside the REPL?
Since the leaking program is run in a sub process, the OS will clean it up. So for now, we're good! If we ever have in-process compilation (e.g. through LLVM) this would become a problem though.