Stdlib: user-friendly filesystem API
I'll take it.
Out of curiosity, what kinds of methods would you guys like this lib to have? I imagine
read: (path) -> Result<String>
write: (path, data) -> Result<String>
and maybe a few other directory ones are a must, but is there anything else for a first iteration?
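For concreteness, those two could look something like the following in Grain, keeping in mind that Result needs an explicit error type. Everything here is a placeholder sketch (the FsError type, the stub bodies), not a proposal for the final shape:

// Placeholder error type; a real module would define something richer.
export enum FsError {
  FsFailure(String),
}

// read: path -> Result<String, FsError>
export let read = (path: String) => {
  // stub body purely for illustration
  Err(FsFailure("not implemented: " ++ path))
}

// write: (path, data) -> Result<String, FsError>
export let write = (path: String, data: String) => {
  // stub body purely for illustration: echo back what was written
  ignore(path)
  Ok(data)
}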
In the brainstorming process for what this API might look like, I encountered a few contention points where design decisions should be made:
- Should this API be completely independent of sys/file from the user's perspective? For example, if a user wanted to get the stats of a file, should they get back a Filestats enum from sys/file, or should a separate type be created in the new module? (I am personally leaning towards the latter as it allows for better abstraction and independence, but it has a few disadvantages also.)
- How should the API handle working with file streams/channels (e.g. opening a file and doing a sequence of read/write operations on it)? Possible options:
  - Have the open function return a file stream; this would be simpler to work with, but the user would then have to close the stream manually lest they be scolded by their operating system.
  - Have the user pass a callback function to open (see the sketch after this list); this approach is safer, but may be a bit more annoying to work with in certain situations.
  - Implement some sort of automatic resource management mechanism (like "with" in Python, "using" in C#, etc.); this would offer the best of both worlds, but I am not sure that such a feature fits in with the rest of the language.
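To make the callback option concrete, here is a minimal sketch of what it could look like. Every name in it (FileStream, openWith) is a hypothetical placeholder, and the comments only mark where the real sys/file calls would go:

// Hypothetical stand-in for whatever stream/handle type the module would expose.
record FileStream {
  fd: Number
}

// Callback-based open: the stream only exists inside fn, and openWith is
// responsible for closing it once fn returns, so the caller never has to
// remember to close anything.
let openWith = (path: String, fn) => {
  // a real implementation would open path via sys/file here
  let stream = { fd: 3 }
  let result = fn(stream)
  // ...and close the underlying descriptor here, even if fn reported a failure
  Ok(result)
}

// Usage: the file is closed for us as soon as the callback finishes.
let answer = openWith("notes.txt", (stream: FileStream) => stream.fd)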
I would argue for using a new enum that could be more user-friendly, possibly. I also personally prefer the open approach for streams; though automatic resource management would be cool, it feels like it would push having a user-friendly API, which is much needed, far down the line. Just my opinions though.
Independent.
Why is there a streams/channels API at all when wasm can't do async things yet? I like the idea of a "file handle" type API like Node.js did with their fs.promises.open, etc.
Why would async be required for this? Maybe I'm not using the right term, but I essentially mean a file handle.
Fair enough. I only think about streams in the context of async data processing (from Node), but I suppose OCaml has blocking streams & channels 😛 I'd definitely like to stay away from that overloaded terminology if possible.
I've made a pull request for this. Happy to have discussion about the api 😝
I'm working on an implementation in av8ta/fs based on feedback that looks something like in this comment: https://github.com/grain-lang/grain/pull/1596#issuecomment-1380389109
export enum Error<e> {
  EISDIR(e),
  ENOENT(e),
  // ... 75 more errors (just these two currently implemented, to gather feedback)
  UNKNOWN(e),
}
export record PathOpenOptions {
  lookupFlag: List<File.LookupFlag>,
  openFlags: List<File.OpenFlag>,
  rights: List<File.Rights>,
  inheriting: List<File.Rights>,
  rwFlags: List<File.FdFlag>,
}
I've implemented open and readFile to start with, using FileDescriptors. That will change when I implement Fs.FileHandle: the user will interact directly with handles instead of FileDescriptors. Error(e) values are returned instead of exceptions.
// -> Result<File.FileDescriptor, Error<String>>
export let open = (path: Path.Path, options: PathOpenOptions) => {}
readFile recursively reads the file in 1024-byte chunks, opening and closing the file descriptor itself.
// -> Result<String, Error<String>>
export let readFile = (path: Path.Path) => {}
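To give a sense of the calling convention, here's a rough usage sketch against those two signatures, assuming it sits alongside PathOpenOptions in the same module. The particular sys/file flag variants and the use of Path.fromString are illustrative assumptions, not part of the proposal:

import Path from "path"
import File from "sys/file"

// Flag choices are illustrative only (assuming sys/file exposes SymlinkFollow
// and FdRead); a read-only open needs little more than read rights.
let options = {
  lookupFlag: [File.SymlinkFollow],
  openFlags: [],
  rights: [File.FdRead],
  inheriting: [],
  rwFlags: [],
}

match (open(Path.fromString("data.csv"), options)) {
  Ok(_) => print("opened"), // a real caller would hold onto the descriptor here
  Err(_) => print("open failed"),
}

match (readFile(Path.fromString("data.csv"))) {
  Ok(contents) => print(contents),
  Err(EISDIR(msg)) => print("expected a file, got a directory: " ++ msg),
  Err(_) => print("readFile failed"),
}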
I'll begin working on Fs.FileHandle while implementing the pseudo-code below. Any objections, comments, or feedback are greatly appreciated :)
export let readFileChunk = (handle: Fs.FileHandle, maxBytesToRead: Number) => {
  // store handle if not seen previously
  // read one chunk
  // successive calls to readFileChunk advance through the file
  (chunk, numBytesRead)
}

export let readFileChunks = (handle: Fs.FileHandle, maxBytesToRead: Number, callback) => {
  // store handle if not seen previously
  // read each chunk & call back on each
  // until end of file
  callback((chunk, numBytesRead))
}

export let readFileSlice = (handle: Fs.FileHandle, range: Range.Range) => {
  // store handle if not seen previously
  // read chunk in range
  (slice, numBytesRead)
}

export let readFileSlices = (handle: Fs.FileHandle, range: Range.Range, maxBytesToRead: Number, callback) => {
  // store handle if not seen previously
  // read chunk in range up to maxBytesToRead
  // callback on each chunk
  callback((chunk, numBytesRead))
}
@av8ta Why are you having readFileChunks and readFileSlices take a callback?
@phated My thinking was that devs might want to read files in chunks and work on them as they stream in. Perhaps they've got an infinite-length file like a logfile that's constantly being appended to, or a file that's too large to fit in memory.
Rather than load the whole file into memory and then work on it, devs can get work done on the data immediately. Think: a logfile streaming in on stdin, processing the chunks, and then writing the transformation to stdout. There are loads of other use cases, of course, because streams are a great way to work on data.
Of course, if they would rather do it in a loop, then there are the singular versions readFileChunk and readFileSlice to use.
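For the loop style, a drain-the-file helper over readFileChunk might look roughly like this. It assumes the (chunk, numBytesRead) return shape from the pseudo-code above and treats a zero-byte read as end-of-file; both of those are assumptions rather than settled API:

// Sketch only: process a file chunk-by-chunk without a callback.
let processAllChunks = (handle, work) => {
  let mut done = false
  while (!done) {
    let (chunk, numBytesRead) = readFileChunk(handle, 1024)
    if (numBytesRead == 0) {
      done = true
    } else {
      work(chunk)
      void
    }
  }
}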
I also think wasm on edge compute is one of the many great use cases for WebAssembly, and Grain is already great for CLI tools. Those environments can be very resource-constrained.
I tried this and got a greatly improved output file :)
let writeStdout = message => File.fdWrite(File.stdout, message)
let improveCsv = string => String.replaceAll(",", "🌾\n\n", string)
let callback = string => writeStdout(improveCsv(string))
readFileChunks(Fs.stdin, 1024, ((string, num)) => callback(string))
For this first iteration, let's keep the API really small and then we can expand it from there. We'll hold off on chunks/streaming for now, since that'll play a lot into Grain's async story, but we're a ways off from that.
So for the first version, I propose this small API:
enum FileError {
  ...
}

readFile: Path -> Result<Bytes, FileError>
writeFile: (Path, Bytes) -> Result<Number, FileError> // number of bytes written
appendFile: (Path, Bytes) -> Result<Number, FileError>

module Utf8 {
  readFile: Path -> Result<String, FileError>
  writeFile: (Path, String) -> Result<Number, FileError>
  appendFile: (Path, String) -> Result<Number, FileError>
}
There are some natural extensions to this, like reading/writing a range, but those can come later. This gives us a really good starting point that we can continue to build from.
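As a usage sketch of that surface, written against Grain 0.6's module syntax since that's the target release (the "sys/fs" import path, the Fs module name, and Path.fromString are assumptions, not decisions):

module Example

from "path" include Path
// Hypothetical import path and module name for the new API.
from "sys/fs" include Fs

let notes = Path.fromString("notes.txt")

// Utf8 convenience layer: read the whole file as a string, then append to it.
match (Fs.Utf8.readFile(notes)) {
  Ok(contents) => print(contents),
  Err(err) => print(err),
}

match (Fs.Utf8.appendFile(notes, "one more line\n")) {
  Ok(bytesWritten) => print(bytesWritten),
  Err(err) => print(err),
}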
This will need a long-overdue update to sys/file to operate on bytes, but I'll get that change in swiftly.
Here is said change: https://github.com/grain-lang/grain/pull/1655
Looks good to me @ospencer, thanks for the feedback
@av8ta Since you made a lot of progress towards this API on your initial PR, were you still planning on writing a new iteration based on the recent feedback or would you be ok with someone else bringing it over the finish line?
@alex-snezhko I'm keen to resume with the new design. I found myself afk for a few weeks while shifting country and getting settled in, and can get back into it now.
@av8ta Sounds good! I made some progress for the new proposed API that you could use as a base if you'd like https://github.com/alex-snezhko/grain/blob/ergo-fs/stdlib/sys/fs.gr
Thanks @alex-snezhko that's really helpful
Hey @av8ta and @alex-snezhko - I'd love for this PR to be up soon so we can get as much landed for 0.6 as possible. We are probably only a couple of weeks away! 🙏
Hey @phated, I've had busy weekends lately, but I've got a chunk of this one free so I'll take a look at it. Hopefully that won't be too late for 0.6 🙏
@av8ta I've had some time to pick this back up as I wasn't sure if you had the time to get to this since your last comment. However, I see that you pushed up a commit on Oct 3rd, so I was curious if you were still looking to close this out or if you'd be fine with me making this PR? I implemented several additional functions and think I have it at a near-complete state now.
@alex-snezhko yeah, go for it. I had planned to, but I'm having trouble finding the time for it.