Bend
Bend copied to clipboard
IO/FS/read_line never ends when reading last line with bend run-c
Reproducing the behavior
It seems that the function have a problem reading the value from the last line, maybe because it cant find the "\n" it searches for.
Example of source file used, (there are 3 lines, the last line is not empty):
41
23
54
If I run this program using "bend runc-c":
def main:
with IO:
fd <- IO/FS/open("../datasets/sequence_value_issue.txt", "r")
bytes <- IO/FS/read_line(fd)
bytes <- IO/FS/read_line(fd)
# bytes <- IO/FS/read_line(fd)
txt = Bytes/decode_utf8(bytes)
return txt
I get:
Result: "23"
If I uncomment the last read_line, calling read 3 times, the program seems to never end.
It seems it have other behaviors for "bend run" for example.
System Settings
- HVM: 2.0.21
- Bend: 0.2.36
- OS: Windows 11 WSL 2 with Ubuntu 22.04.4 LTS
- CPU: Intel i5-1235U
- RAM: 32 GB -> 26GB to WSL
Additional context
I guess the change can be made in the builtin's. Could change the generic Bytes/split_once to a Bytes/split_next_line that uses a condition to check for '\n' and EOF, but I didn't quite catch how that would work here:
Bytes/split_once xs cond = (Bytes/split_once.go xs cond @x x)
Bytes/split_once.go List/Nil cond acc = (Result/Err (acc List/Nil))
Bytes/split_once.go (List/Cons x xs) cond acc =
if (== cond x) { # <====== Here could have something like: (+ ( == '\n' x) ( == EOF x))
(Result/Ok ((acc List/Nil), xs))
} else {
(Bytes/split_once.go xs cond @y (acc (List/Cons x y)))
}
def IO/FS/read_line(fd):
return IO/FS/read_line.read_chunks(fd, [])
def IO/FS/read_line.read_chunks(fd, chunks):
with IO:
# Read line in 1kB chunks
chunk <- IO/FS/read(fd, 1024)
match res = Bytes/split_once(chunk, '\n'):
case Result/Ok:
(line, rest) = res.val
(length, *) = List/length(rest)
* <- IO/FS/seek(fd, to_i24(length) * -1, IO/FS/SEEK_CUR)
chunks = List/Cons(line, chunks)
bytes = List/flatten(chunks)
return wrap(bytes)
case Result/Err:
line = res.val
chunks = List/Cons(line, chunks)
return IO/FS/read_line.read_chunks(fd, chunks)
I don't think this is a bug, in the unix standard every line should terminate with a newline character
Make sense, but in this case, using "bend run-c", it just runs forever consuming 100% of all CPU cores.
Maybe adding some checking and returning and error or some kind of ".../Nil" value could be more user friendly. But I can see how this checking for each line or character could significantly decrease performance for some use cases.
I don't think this is a bug, in the unix standard every line should terminate with a newline character
But if we reach an EOF it should also stop, no? Isn't that how other languages handle that?
But if we reach an EOF it should also stop, no? Isn't that how other languages handle that?
Yes, I think the problem here is that the Result/Err returned by split_once needs to be interpreted for both cases when we are reading a line with +1024 characters or when it reached EOF
So, I think the simplest solution is just to just check if SEEK_CUR exceeds total file size(which can be a single call to the underlying File System.