eio icon indicating copy to clipboard operation
eio copied to clipboard

Expose Path.size and fix Path.load

Open haesbaert opened this issue 3 years ago • 2 comments

Path.load was doubling the buffer and trying to fit enough until it saw an EOF. This would cause many unecessary large allocations and my poor 16GB ram laptop couldn't even load a 5GB file without getting OOMed.

This diff makes load create an initial buffer of the size of the file and load exactly that amount of bytes in it.

Also, using take_all here isn't a good idea since it will resize even if initial_size (capacity) is adequate, that's because it does: ensure t.len(0) (0+1), which ends up allocating another bigarray.

haesbaert avatar Oct 05 '22 15:10 haesbaert

Let me know what you think, I've added two versions of Buf_read.load, the problem with using read_exact is that if the file shrinks mid-read we just throw an EOF, the second issue is that due to the integer overflow in luv, read_exact would end up always giving an EOF. So I've added both versions, one that copes with getting an early EOF, and one that doesn't. I've fixed the Uring fstat to use the Large version as well. If you don't want to expose size as you mentioned before, we have to decide what to put and how to export stat, since we would need to see what Luv offers in windows and so on.

haesbaert avatar Oct 05 '22 18:10 haesbaert

This is ready for review, let me know what you think.

haesbaert avatar Oct 12 '22 22:10 haesbaert

This is a bit in WIP, I think I've addressed all points but I'm not too sure about File.stat

haesbaert avatar Oct 18 '22 17:10 haesbaert

Had to step back a bit on this diff, I've rebased and force-pushed, from the new commit message: I apologize for wasting everyone's time and radically changing it, but the whole luv bits were too insane for my taste, see details below. I might be doing the whole interface/OO thing wrong and would appreciate some pointers here.

    Expose `stat` and improve Path.load

    `stat` can be called on files and flows in order to discover what is the backing
    descriptor. Some notes:
      - Use Unix.Largestat instead of Unix.stat
      - Add a portable stat for the user
      - Moved Path.Unix_perm into File.Unix_perm to avoid cyclical ref

    A previous attempt of this diff tried to make proper use of Luv.File.Stat but
    there were multiple issues: `mode` was not properly exported so I had to add a
    discovery.ml; it only works for a subset of file types and we had to manually
    convert timestamps to float; all doable but an insufferable amount of code to end
    up with less functionality than just calling into Unix.Largestat.stat.

    There are still some kinks:
      - Luv will always follow a link, so we can't really stat on a symbolic link,
        this is because we rely on realpath(3) which resolves the link.
      - We can't open named pipes in with luv atm, so we also can't stat it (#357).
      - I did test all file-modes, but didn't add all to the test since they are
        impractical (opening /dev/tty for Character_special...)

haesbaert avatar Oct 27 '22 10:10 haesbaert

I apologize for wasting everyone's time and radically changing it, but the whole luv bits were too insane for my taste, see details below.

Sounds good. Do we really need to add it to the socket types, though? It doesn't seem useful for anything.

Frankly, can't see much uses, and can only remember one time I relied on it, I've added it more for the sake of correctness/completeness, and to use it as an excuse to learn some ocaml OO. I can gut it if it's the case.

haesbaert avatar Oct 27 '22 11:10 haesbaert

Frankly, can't see much uses, and can only remember one time I relied on it, I've added it more for the sake of correctness/completeness, and to use it as an excuse to learn some ocaml OO. I can gut it if it's the case.

Yes please. It will be a pain for e.g. Mirage having to support fstat.

talex5 avatar Oct 27 '22 11:10 talex5

Frankly, can't see much uses, and can only remember one time I relied on it, I've added it more for the sake of correctness/completeness, and to use it as an excuse to learn some ocaml OO. I can gut it if it's the case.

Yes please. It will be a pain for e.g. Mirage having to support fstat.

Ack, leme know if this is ok and then I'll squash and merge.

haesbaert avatar Oct 27 '22 15:10 haesbaert