elvm
elvm copied to clipboard
EOF is not working as intended
While I was trying to investigate some issues with fgets
implementation, I've noticed that the source of the issue is EOF definition itself.
int c = getchar();
if (c < 0) {
printf("%x\n", c);
printf("%x\n", EOF);
}
After running the code snippet above, I have got the following result:
-1
ffffff
As you can see, EOF is 24-bit instead of 32-bit. Therefore, if (c == EOF)
is always false. This can be seen in the generated EIR code as well:
16777215 is simply 0xffffff. This causes some bugs. Even though interpreting the EIR output with
eli
works fine for some cases, it still fails if you try to convert EIR to Whitespace for instance.
Since I haven't been able to pinpoint issue yet, I have made temporary fix by changing the condition from c == EOF
to c < 0
. I will make a pull request soon.
By the way, printf
might have issues with hexadecimal representation as well since it literally printed back -1
instead of ffffffff
.
ELVM has a stupid ABI. All integer types are unsigned and have the same size (i.e., sizeof(char) == sizeof(int) == 1
). 1 "byte" is 24 bits in most backends (everything is originally designed to run brainfuck compiled from C fast). So in ELVM, EOF is 0xffffff, I agree it sounds very strange, though. There is even a test: https://github.com/shinh/elvm/blob/master/test/eof.c
Since there is no negative number, if (c < 0)
is not a right condition and it should be if (c == EOF)
. If it does not work in some backends, it's a bug of the backend translator.
#include <stdio.h>
int main() {
int c = getchar();
printf("%x\n", c);
printf("%x\n", EOF);
}
showed
ffffff
ffffff
by eli
, elc -c
, and elc -rb
. If -1
is somehow shown, the whitespace backend has an issue? For some reason, whitespace is not working in my environment right now, so haven't confirmed it yet.
I see. I have confirmed that the issue is related to whitespace backend. If I run the code with eli
, it prints ffffff
, but if I run it with the whitespace interpreter, it prints -1
instead.
By the way, is there a reason fgets
return type is int
instead of char*
? Also fgets
doesn't use the given fp
variable, it acts like gets
with a character limit. I am aware that most backends don't support file i/o operations. Still, can't we call fgetc(fp)
instead of getchar()
without breaking anything?
Wow, I haven't realized fgets
is returning int
! It's definitely a bug. It looks like there is no user of this function and that's why this issue wasn't found. I also agree fgets
should call fgetc
for the EOF hack for whitespace in fgetc
.
I think https://github.com/shinh/elvm/pull/97 fixes the both of the issues? Thanks for pointing this out!
You are so fast! Yes, this fixes both. Actually, it fixes another bug that I haven't mentioned yet :D. The previous version was copying EOF to buffer, just like \n
. Also, it seems we both fixed it almost the same way. My local patch was like this:
The only difference in our approaches, your version does nothing and return null pointer if size is 1. Mine puts a null byte at the beginning of the buffer and returns the buffer. I have referred to this implementation to decide that.
Ah, yours is better! Could you send a PR to re-correct the implementation? If you don't have time for it, I'll do it this night.
Sure! By the way, there is one more bug. All i/o functions stop when they encounter a null byte in ELVM's implementation. I don't know the reason, but it seems getc
, getchar
, fgetc
are all returning EOF instead of 0 for null byte. Since we use them to implement fgets
, it also cannot read an input such as "foo\x00bar"
. Maybe, the bug is related to 8cc
.
On a side note, using fgetc
in fgets
does not fix the issue with whitespace backend. So, there is still something wrong about it.
Yeah, it's unfortunate we cannot distinguish an EOF and a null byte and it's impossible to handle binary files without preprocessing.
If it's a bug rather than a limitation, it's due to the definition of ELVM's GETC instruction. When it reaches to EOF, ELVM implementation should fill 0 instead of -1: https://github.com/shinh/elvm/blob/c30d33fe2c3ffa182f3e502b365cbea9267ffed8/ir/eli.c#L169
I think I chose this definition because some backend esolangs do not distinguish EOF from 0 anyway. For example, many brainfuck implementation fills -1 or 0 when ,
reaches to EOF: https://esolangs.org/wiki/Brainfuck#EOF I think current BF backend supports both -1 and 0 (https://github.com/shinh/elvm/blob/c30d33fe2c3ffa182f3e502b365cbea9267ffed8/target/bf.c#L529), but this means programs compiled to BF cannot handle 0 and 255 in binary files.
Some backends have more limitations in their inputs. Please check https://github.com/shinh/elvm/blob/master/tools/runtex.sh for example. It converts /dev/stdin
to another format.
Oh I see. Maybe we can choose some defaults and provide other stuff as options while converting to backends. For example, we can set EOF as -1. However, the user should be able to provide a flag to change it if he wants such as:
out/elc -bf -eof=0 infile.eir > outfile.bf
Still, it might make things too complicated. I am not sure if that's worth it, just an idea.
I think fgets
was still wrong. I didn't expect fgets
has such a corner case :) We should have returned nullptr if nothing is read while size > 1
: https://github.com/shinh/elvm/pull/99
As for -eof=0
, it makes sense, but I'd rather revisit the definition of GETC
instruction if I have time.
This looks good to me.
Should I create another issue for the Whitespace backend? So, you said that all numbers should be 24-bit and unsigned. Doesn't this contradict with Whitespace language itself? From my understanding, the Whitespace language should be able to use negative numbers and there is no bit limit for the numbers. So, I should be able to work with a 64-bit number.
The EIR to WS translator should have no issue as it emits code which limits integers to 24bit range: https://github.com/shinh/elvm/blob/18b45bc85e27a862a9367087ccbab27631a0ee96/target/ws.c#L145 I agree it does not fully utilize capability of whitespace language, but the translation should be valid.
It's fine to define a 64bit or 32bit dialect of ELVM IR and support a mode like elc -ws -64bit
. As for whitespace, I guess removing (x+(1<<24))%(1<<24)
I mentioned and setting 1<<24
to SP to keep 24 bit address space would work? I think the initial value of SP is set here: https://github.com/shinh/elvm/blob/18b45bc85e27a862a9367087ccbab27631a0ee96/target/ws.c#L232
I have just tested it again with our latest patch, it already works as intended. Both whitespace interpreter and eli
shows that the buffer is filled with zeros instead of -1
s. I guess switching from getchar
to fgetc
does solved the issue since there is a whitespace special trick in fgetc
implementation.
Also, you are right that the translation is still valid even though numbers are 24-bit. I guess everything is working now? :D
Oh sorry, the bug is still there. I forgot that it works fine if you provide \n
in the input. I will try to find a solution. I just don't understand how we get -1
if all numbers are unsigned.
By the way, tests for Whitespace backend crashes. This might be related to the issue.
➜ make test-ws
Skip building kx due to lack of kinx
Skip building wasi due to lack of wasmtime
Skip building wasm due to lack of wat2wasm
Skip building el due to lack of emacs
Skip building cl due to lack of sbcl
Skip building scala due to lack of scalac
Skip building swift due to lack of swiftc
Skip building cr due to lack of crystal
Skip building cs due to lack of
Skip building i due to lack of ick
Skip building forth due to lack of gforth
Skip building fs due to lack of
Skip building lua due to lack of lua
Skip building ll due to lack of lli
Skip building scm_sr due to lack of gosh
Skip building hs due to lack of ghc
Skip building oct due to lack of octave
Skip building scratch3 due to lack of
Skip building j due to lack of jconsole
./runtest.sh out/elc.c.eir.ws.out tools/runws.sh out/elc.c.eir.ws
Buffer overflow!
make: *** [build.mk:5: out/elc.c.eir.ws.out] Error 1
As for the "Buffer overflow!", I believe I found the cause and fixed it by: https://github.com/shinh/elvm/commit/8d7508322ce71f9e95fa2ba84fbe983f214b9f07 You probably need to re-build whitespace interpreter by touch Whitespace/whitespace.c && make test-ws
As for -1, perhaps https://github.com/shinh/elvm/pull/100 may fix the issue?
Cool! Both are fixed now. There is no overflow and -1
s are clearly gone. Great job mate, thanks!