
IO/FS/read_line never ends when reading last line with bend run-c

Open ArthurMPassos opened this issue 1 year ago • 5 comments

Reproducing the behavior

It seems that the function has a problem reading the last line, maybe because it can't find the "\n" it searches for.

Example of the source file used (there are 3 lines; the last line is not empty):

41
23
54

If I run this program using "bend run-c":

def main:
  with IO:
    fd <- IO/FS/open("../datasets/sequence_value_issue.txt", "r")

    bytes <- IO/FS/read_line(fd)
    bytes <- IO/FS/read_line(fd)
    # bytes <- IO/FS/read_line(fd)

    txt = Bytes/decode_utf8(bytes)
    return txt

I get:

Result: "23"

If I uncomment the last read_line, so that read_line is called 3 times, the program seems to never end.

It seems to behave differently with other commands, "bend run" for example.

System Settings

  • HVM: 2.0.21
  • Bend: 0.2.36
  • OS: Windows 11 WSL 2 with Ubuntu 22.04.4 LTS
  • CPU: Intel i5-1235U
  • RAM: 32 GB -> 26GB to WSL

Additional context

I guess the change can be made in the builtins. The generic Bytes/split_once could be changed to a Bytes/split_next_line that uses a condition to check for both '\n' and EOF, but I didn't quite catch how that would work here:

Bytes/split_once xs cond = (Bytes/split_once.go xs cond @x x)
  Bytes/split_once.go  List/Nil        cond acc = (Result/Err (acc List/Nil))
  Bytes/split_once.go (List/Cons x xs) cond acc =
    if  (== cond x) { # <====== Here could have something like: (+ ( == '\n' x) ( == EOF x))
      (Result/Ok ((acc List/Nil), xs))
    } else {
      (Bytes/split_once.go xs cond @y (acc (List/Cons x y)))
    }

def IO/FS/read_line(fd):
  return IO/FS/read_line.read_chunks(fd, [])

def IO/FS/read_line.read_chunks(fd, chunks):
  with IO:
    # Read line in 1kB chunks
    chunk <- IO/FS/read(fd, 1024)
    match res = Bytes/split_once(chunk, '\n'):
      case Result/Ok:
        (line, rest) = res.val
        (length, *) = List/length(rest)
        * <- IO/FS/seek(fd, to_i24(length) * -1, IO/FS/SEEK_CUR)
        chunks = List/Cons(line, chunks)
        bytes = List/flatten(chunks)
        return wrap(bytes)
      case Result/Err:
        line = res.val
        chunks = List/Cons(line, chunks)
        return IO/FS/read_line.read_chunks(fd, chunks)
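
To illustrate why the third call spins, here is a minimal Python model of the read_chunks logic above (Python rather than Bend, purely as a sketch). `split_once` and `read_line_model` are hypothetical stand-ins for the builtins, and the `max_reads` budget is an addition so the demo stops instead of looping forever. It assumes that a read at EOF returns an empty chunk.

```python
import io

def split_once(chunk: bytes, sep: bytes):
    """Hypothetical model of Bytes/split_once.

    Returns ("ok", (line, rest)) if sep is found, else ("err", chunk).
    """
    head, found, rest = chunk.partition(sep)
    return ("ok", (head, rest)) if found else ("err", chunk)

def read_line_model(f, max_reads=5):
    """Mirror of read_chunks, with a read budget so the demo terminates."""
    chunks = []  # kept in reading order in this model
    for _ in range(max_reads):
        chunk = f.read(1024)
        tag, val = split_once(chunk, b"\n")
        if tag == "ok":
            line, rest = val
            # Rewind so the next call starts right after the newline
            f.seek(-len(rest), io.SEEK_CUR)
            chunks.append(line)
            return b"".join(chunks)
        # "err": no newline in this chunk, so assume the line continues.
        # At EOF the chunk is empty, there is still no newline, and the
        # original code would recurse again without ever stopping.
        chunks.append(val)
    raise RuntimeError("read budget exhausted: EOF was treated as 'line continues'")
```

With the 3-line file from the report, the first two calls return normally and the third exhausts the budget, which matches the hang seen with "bend run-c".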

ArthurMPassos • Jul 19 '24 23:07

I don't think this is a bug; in the Unix standard, every line should terminate with a newline character.

imaqtkatt • Jul 23 '24 13:07

Makes sense, but in this case, using "bend run-c", it just runs forever, consuming 100% of all CPU cores.

Maybe adding a check and returning an error or some kind of ".../Nil" value would be more user friendly. But I can see how doing this check for each line or character could significantly decrease performance for some use cases.

ArthurMPassos • Jul 23 '24 14:07

> I don't think this is a bug; in the Unix standard, every line should terminate with a newline character.

But if we reach an EOF it should also stop, no? Isn't that how other languages handle that?

developedby • Jul 23 '24 14:07

> But if we reach an EOF it should also stop, no? Isn't that how other languages handle that?

Yes, I think the problem here is that the Result/Err returned by split_once covers two different cases that need to be told apart: the line is longer than 1024 characters and continues in the next chunk, or the read reached EOF.
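
A rough sketch of that distinction, written in Python as a model and not as the Bend implementation: `read_line` here is a hypothetical function, and it assumes that an empty chunk is how EOF shows up. A non-empty chunk without a newline means the line continues; an empty chunk means there is nothing more to read.

```python
import io

def read_line(f, chunk_size=1024):
    """Return the next line without its newline; b"" once EOF is reached."""
    chunks = []
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            # Empty read: EOF. Return whatever was collected so far,
            # which is the unterminated last line (or b"" if none).
            return b"".join(chunks)
        line, found, rest = chunk.partition(b"\n")
        chunks.append(line)
        if found:
            # Rewind so the next call starts right after the newline
            f.seek(-len(rest), io.SEEK_CUR)
            return b"".join(chunks)
        # No newline in a non-empty chunk: the line is longer than
        # chunk_size, so keep reading.
```

One trade-off of this shape is that an empty line and EOF both come back as b"", so a caller that needs to tell them apart would want an explicit error or end-of-file value instead.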

imaqtkatt • Jul 23 '24 14:07

So I think the simplest solution is just to check whether the current position (the offset relative to SEEK_CUR) has reached or exceeded the total file size, which can be a single call to the underlying file system.
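
As a hedged illustration of that check at the OS level (Python again, not Bend; `at_eof` is a hypothetical helper, and whether Bend's IO layer exposes the file size is not established in this thread), the current offset can be compared with the size reported by the file system:

```python
import os

def at_eof(fd: int) -> bool:
    """True if the descriptor's offset is at or past the end of the file.

    Assumes a regular, seekable file; pipes and terminals do not support
    lseek, so this check would not work for them.
    """
    pos = os.lseek(fd, 0, os.SEEK_CUR)  # current offset, without moving it
    size = os.fstat(fd).st_size         # total size from the file system
    return pos >= size
```

A read_line loop could call such a check when a chunk contains no newline, and stop instead of reading again.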

coder3112 • Aug 09 '24 20:08