mal icon indicating copy to clipboard operation
mal copied to clipboard

Self-host issues related to 'not' or '=' (fantom, haxe, io, nasm, powershell, r, rexx and vala)

Open kanaka opened this issue 4 years ago • 13 comments

@asarhaddon had a modification to how self-hosting works (captured here: https://github.com/kanaka/mal/pull/400/commits/809d74cba78643b1a452fe80e810cc3c0ce4f4e6) that triggered some issues in fantom, haxe, io, nasm, powershell, r, rexx and vala. Here is the Travis self-hosted test where the issue manifested: https://travis-ci.org/kanaka/mal/builds/558757626

kanaka avatar Jul 15 '19 18:07 kanaka

The MAL implementation has been changed in order to hide these issues. A proper fix has been found for:

  • [ ] fantom
  • [ ] haxe
  • [x] io
  • [x] nasm (unrelated with not and =)
  • [x] powershell (34de12452a979a69e0a6ba120922e84d3411862b)
  • [ ] r
  • [x] rexx (82bc78eb4342737a719c30b402b5ce24d1e010e8)
  • [x] vala (unrelated with not and =)

asarhaddon avatar Jul 15 '19 19:07 asarhaddon

Strangely, the vala issue appears to go away if you make this change:

diff --git a/mal/step4_if_fn_do.mal b/mal/step4_if_fn_do.mal
index 05297be..0580ec3 100644
--- a/mal/step4_if_fn_do.mal
+++ b/mal/step4_if_fn_do.mal
@@ -74,6 +74,7 @@
 
 ;; core.mal: defined directly using mal
 (map (fn* [data] (apply env-set repl-env data)) core_ns)
+(env-set repl-env '*ABC* [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19])
 
 ;; core.mal: defined using the new language itself
 (rep "(def! not (fn* [a] (if a false true)))")

If you remove any number from the list then the assertion errors re-appear and things don't work correctly. It appears to me that the issue appears when the number of mal memory objects allocated is below a certain threshold. @sgtatham any insight you can give on why this is happening and how to fix it? Here is what happens if you run without the addition above:

$ make DOCKERIZE=1 MAL_IMPL=vala repl^mal^step4
docker run -it --rm -u 1000 -v /data/joelm/personal/programming/mal-other/:/mal -w /mal/mal kanaka/mal-test-mal make step4_if_fn_do.mal
make: 'step4_if_fn_do.mal' is up to date.
REPL implementation mal, step file: mal/step4_if_fn_do.mal
Running: docker run -e STEP=step4_if_fn_do -e MAL_IMPL=vala -it --rm -u 1000 -v /data/joelm/personal/programming/mal-other/:/mal -w /mal/vala kanaka/mal-test-vala ../mal/run 

(process:1): GLib-GObject-CRITICAL **: 20:34:54.160: g_object_ref: assertion 'G_IS_OBJECT (object)' failed
mal-user> (list)
Uncaught exception: 'list' not found
mal-user> 

kanaka avatar Jul 15 '19 20:07 kanaka

I noted this in #400 but I'm adding it here since this ticket is tracking the issues now:

The nasm failure appears to be because the step9_try.mal file is exactly 4096 bytes. Adding or removing a character anywhere in the file fixes the issue (likewise, padding step1 to 4096 makes the problem happen there too).

@bendudson can you take a look at why nasm crashes when trying to do load-file of a file that is exactly 4096 bytes?

make repl^nasm
(load-file "../mal/stepA_mal.mal")
Mal [nasm-mal]
nil
mal-user> (+ 2 3)
5
user> (load-file "../mal/step1_read_print.mal") ;; padded to exactly 4096 bytes
Error: Run out of memory for Array objects. Increase heap_array_limit.

kanaka avatar Jul 15 '19 20:07 kanaka

The vala issue is reproducible with

(load-file "../mal/env.mal")
(load-file "../mal/core.mal")
(println "  This should not be empty: " core_ns)

Printing core_ns at the end of core.mal works as expected.

asarhaddon avatar Jul 15 '19 20:07 asarhaddon

I haven't had time to debug the Vala failure properly yet, but I've at least identified the cause of the GLib assertion-failure message. That happens in Mal.Vector, and the cause is that one of the references stored in the vector object turns out to be unexpectedly null, which I think must be because the garbage collector has either deliberately freed it, or accidentally dropped its reference to it on the floor.

In an attempt to narrow the problem down, I did a git bisect with the test input files fixed, changing only the Vala implementation itself. That pointed the finger at commit 26ced15b31c6ebfd77c7297a7f8d346ff08c3f9b, which moved a handful of standard functions out of the Vala source code into files in the lib subdirectory. In the process it changed the definitions slightly, but that doesn't seem to be the cause, because when I edited the source files in lib to contain the defintiions exactly as I had them in stepA_mal.vala, I still got the failure. So it seems to vary with where the definitions are, not what they are. Hmmm.

I think my best guess would be that somewhere in the chain of calls starting at load-file there's a missing GC.Root, and its absence is causing something vital to get GCed that shouldn't be. But I doubt I'll find it by staring at the code and guessing: it's going to be a matter of identifying a specific thing that shouldn't be freed, and then finding out which GC call is freeing it anyway and where in the code that's called from...

sgtatham avatar Jul 15 '19 21:07 sgtatham

@sgtatham valgrind might be able to help you find it. It might at least confirm what type of thing is being leaked and maybe even where it is happening.

kanaka avatar Jul 15 '19 23:07 kanaka

Unfortunately, vala is too smart to trigger valgrind errors! If the code had actually used the value after free, valgrind would have helpfully told me where it was freed. But actually the Vala / GLib runtime notices just in time, and doesn't.

As it turned out, the important point was not where the value was freed, but where it was allocated, because that was where I should have added the missing GC.Root. #421 should fix it.

sgtatham avatar Jul 16 '19 20:07 sgtatham

@sgtatham Good work. Thanks.

kanaka avatar Jul 16 '19 20:07 kanaka

I'm looking at io.

dubek avatar Jul 18 '19 08:07 dubek

For me make DOCKERIZE=1 MAL_IMPL=io test^mal^step8 passes OK on master (03a2a696791970a464a9a757ef968d89436f5ca8). Has something changed? Since the travis build ran?

dubek avatar Jul 18 '19 08:07 dubek

@dubek sorry, to be clear, the bugs are revealed by a prior modification to the mal implementation that never made it to master. A version was merged that worked around the problem. Here is the simple diff that should reproduce the issue:

--- a/mal/core.mal
+++ b/mal/core.mal
@@ -1,11 +1,11 @@
 (def! _map? (fn* [x]
   (if (map? x)
-    (not (contains? x :__MAL_MACRO__))
+    (not (= (keys x) '(:__MAL_MACRO__)))
     false)))
 
 (def! _macro? (fn* [x]
   (if (map? x)
-    (contains? x :__MAL_MACRO__)
+    (= (keys x) '(:__MAL_MACRO__))
     false)))
 
 (def! core_ns

Will trigger this behavior:

$ make DOCKERIZE=1 MAL_IMPL=io test^mal^step8
...
Testing trivial macros
TEST: '(defmacro! one (fn* () 1))' -> ['',] -> SUCCESS (result ignored)
TEST: '(one)' -> ['',1] -> FAIL (line 4):
    Expected : '\\(one\\)\r\n1'
    Got      : "(one)\r\nUncaught exception: MalMap does not respond to 'call'"
...

kanaka avatar Jul 18 '19 14:07 kanaka

The following extra step4 tests find a bug in the Io implementation that eventually causes this self-hosting bug:

diff --git a/tests/step4_if_fn_do.mal b/tests/step4_if_fn_do.mal
index 0237d227..2d37b57d 100644
--- a/tests/step4_if_fn_do.mal
+++ b/tests/step4_if_fn_do.mal
@@ -431,6 +431,8 @@ nil
 ;=>false
 (= :abc ":abc")
 ;=>false
+(= (list :abc) (list :abc))
+;=>true

 ;; Testing vector truthiness
 (if [] 7 8)
@@ -465,6 +467,8 @@ nil
 ;=>true
 (= [7 8] [7 8])
 ;=>true
+(= [:abc] [:abc])
+;=>true
 (= (list 1 2) [1 2])
 ;=>true
 (= (list 1) [])

Now I'll try to hunt it down.

dubek avatar Jul 20 '19 18:07 dubek

From my local tests #425 should solve the self-hosting issue in io .

dubek avatar Jul 21 '19 08:07 dubek