Efficiently read part of a file by seek offsets
It would be nice to read a slice of a file, based on the start & end char indices.
ChatGPT attempt, seems to work:
Yes, there is a more efficient way to handle large files by reading only the necessary parts of the file instead of slurping the entire content. You can accomplish this using Java's java.nio.file API. Here is a way to do this using interop:
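(defn lazy-substring-of-file [filename start end]
  ;; start and end are byte offsets; they equal char indices only for single-byte text
  (let [path (java.nio.file.Paths/get filename (into-array String []))
        options (into-array java.nio.file.OpenOption [java.nio.file.StandardOpenOption/READ])
        bb (java.nio.ByteBuffer/allocate (- end start))]
    ;; newByteChannel is a static method; with-open closes the channel even if a read throws
    (with-open [fc (java.nio.file.Files/newByteChannel path options)]
      (.position fc start)
      ;; a single .read may return fewer bytes than requested, so loop until full or EOF
      (while (and (.hasRemaining bb) (not (neg? (.read fc bb)))))
      ;; decode only the bytes that were actually read
      (String. (.array bb) 0 (.position bb) "UTF-8"))))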
In this function:
- java.nio.file.Paths/get is used to get a java.nio.file.Path object from the filename.
- java.nio.file.Files/newByteChannel is used to create a new java.nio.channels.SeekableByteChannel to the file.
- java.nio.ByteBuffer/allocate is used to create a ByteBuffer of the right size.
- .position is used to set the read position of the byte channel.
- .read is used to read the right amount of bytes from the file into the ByteBuffer.
- String. is used to create a new UTF-8 string from the contents of the ByteBuffer.

This function avoids reading the whole file into memory by only reading the necessary bytes. It works best when the requested range (end minus start) is small compared to the size of the file. Note that start and end are byte offsets, which equal character indices only for single-byte text.
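For comparison, the same idea can be sketched with java.io.RandomAccessFile, which offers seek and readFully directly. This is only a minimal sketch: read-file-slice is a hypothetical name, the offsets are bytes rather than characters, and the end offset is clamped to the file length as an assumption about the desired behaviour.

```clojure
(import '(java.io RandomAccessFile))

(defn read-file-slice
  "Hypothetical helper: returns the bytes in [start, end) of the file
  as a byte array, clamping end to the file length."
  [filename start end]
  (with-open [raf (RandomAccessFile. filename "r")]
    (let [available (max 0 (- (min end (.length raf)) start))
          buf (byte-array available)]
      ;; jump straight to the start offset without reading what precedes it
      (.seek raf start)
      ;; readFully blocks until buf is completely filled
      (.readFully raf buf)
      buf)))
```

Returning raw bytes leaves decoding to the caller, for example (String. (read-file-slice "data.txt" 6 11) "UTF-8"), where "data.txt" is a placeholder path. That avoids silently cutting a multi-byte character in half inside the helper.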
Can you tell more about the use case (rather than the implementation) aside from "would be nice"?
Sure, I'm imagining a program that has to read many large files at predictable locations. For instance, media & archive headers. AFAIK, using slurp or fs/read-all-bytes could incur a high performance cost per operation, due to loading each entire file into memory. For a developer, it would be nice to have a readymade cross-platform function, and not have to delve into the host API.
That said, I've only done this sort of thing in C, so apologies if it's inaccurate or out of scope.
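To make the header use case concrete, here is a rough sketch that checks whether a file starts with the 8-byte PNG signature while reading only those 8 bytes. The names read-bytes-at and png? are hypothetical, not part of any existing library, and the sketch assumes java.nio.channels.FileChannel with a positional read is acceptable on the target platform.

```clojure
(import '(java.nio ByteBuffer)
        '(java.nio.channels FileChannel)
        '(java.nio.file OpenOption Paths StandardOpenOption))

;; The fixed 8-byte signature at the start of every PNG file.
(def png-signature [0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A])

(defn read-bytes-at
  "Hypothetical helper: reads up to n bytes starting at byte offset
  `offset` and returns them as a vector of unsigned values (0-255)."
  [filename offset n]
  (let [path (Paths/get filename (into-array String []))
        opts (into-array OpenOption [StandardOpenOption/READ])
        bb   (ByteBuffer/allocate n)]
    (with-open [ch (FileChannel/open path opts)]
      ;; positional reads leave the channel's own position untouched;
      ;; loop because one read may return fewer bytes than requested
      (loop [pos offset]
        (when (.hasRemaining bb)
          (let [k (.read ch bb pos)]
            (when (pos? k)
              (recur (+ pos k)))))))
    ;; keep only the bytes actually read and convert signed bytes to 0-255
    (mapv #(bit-and % 0xFF) (take (.position bb) (.array bb)))))

(defn png?
  "True when the file begins with the PNG signature."
  [filename]
  (= png-signature (read-bytes-at filename 0 8)))
```

Only the requested bytes are pulled into memory, so the cost per file should not grow with file size the way slurp or fs/read-all-bytes would.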
I'll keep this issue open to see if more people are interested. "lazily" might not be an accurate description: you want to read some specific segment from a file, without reading all of the file into memory, right?