grain icon indicating copy to clipboard operation
grain copied to clipboard

Stdlib: user-friendly filesystem API

Open ospencer opened this issue 3 years ago • 22 comments

ospencer avatar Jul 12 '20 20:07 ospencer

I'll take it.

c0d3d avatar Jul 12 '20 20:07 c0d3d

Out of curiosity what type of methods would you guys like for this lib to have I imagine

read: (path) -> Result<string>
write: (path, data) -> Result<String>

and maybe a few other directory ones are a must but anything else for a first iteration.

spotandjake avatar Nov 30 '21 22:11 spotandjake

In the brainstorming process for what this API might look like, I encountered a few contention points where design decisions should be made:

  1. Should this API be completely independent of sys/file from the user's perspective? For example if a user wanted to get the stats of a file should they get back a Filestats enum from sys/file or should a separate type be created in the new module? (I am personally leaning towards the latter as it allows for better abstraction and independence, but it has a few disadvantages also).
  2. How should the API handle working with file streams/channels (e.g. opening a file and doing a sequence of read/write operations on it)? Possible options:
    1. Have the open function return a file stream; this would be simpler to work with, but the user would then have to close the stream manually lest they be scolded by their operating system.
    2. Have the user pass a callback function to open; this approach is safer, but may be a bit more annoying to work with in certain situations.
    3. Implement some sort of automatic resource management mechanism (like with in Python, using in C#, etc); this would offer the best of both worlds but I am not sure that such a feature fits in with the rest of the language.

alex-snezhko avatar Oct 24 '22 02:10 alex-snezhko

I would argue for using a new enum that could be more user friendly possibly, I also personally more prefer the open approach for streams. though automatic resource management would be cool it feels like it would push having a user friendly api which is much needed far down the line. just my opinions though.

spotandjake avatar Oct 24 '22 02:10 spotandjake

  1. Independent
  2. Why is there a streams/channels API at all when wasm can't do async things yet? I like the idea of a "file handle" type API like nodejs did with their fs.promises.open, etc

phated avatar Oct 24 '22 21:10 phated

Why would async be required for this? Maybe I'm not using the right term but I essentially mean a file handle

alex-snezhko avatar Oct 25 '22 00:10 alex-snezhko

Fair enough. I only think about streams in the context of async data processing (from node) but I suppose OCaml has blocking streams & channels 😛 I'd definitely like to stay away from that overloaded terminology if possible.

phated avatar Oct 25 '22 01:10 phated

I've made a pull request for this. Happy to have discussion about the api 😝

av8ta avatar Jan 11 '23 10:01 av8ta

I'm working on an implementation in av8ta/fs based on feedback that looks something like in this comment: https://github.com/grain-lang/grain/pull/1596#issuecomment-1380389109

export enum Error<e> {
  EISDIR(e),
  ENOENT(e),
  .... 75 more errors (just these two currently implemented to gather feedback)
  UNKNOWN(e),
}

export record PathOpenOptions {
  lookupFlag: List<File.LookupFlag>,
  openFlags: List<File.OpenFlag>,
  rights: List<File.Rights>,
  inheriting: List<File.Rights>,
  rwFlags: List<File.FdFlag>,
}

I've implemented open and readfile to start with using FileDescriptors. That will change when I implement Fs.FileHandle. The user will interact directly with handles instead of FileDescriptors. Error(e) are returned instead of exceptions.

// -> Result<File.FileDescriptor, Error<String>>
export let open = (path: Path.Path, options: PathOpenOptions) => {}

Recursively reads file in 1024 byte chunks. Opens & closes file descriptor.

// -> Result<String, Error<String>>
export let readFile = (path: Path.Path) => {}

Will begin working on Fs.FileHandle while implementing the below pseudo-code. Any objections, comments feedback are greatly appreciated :)

export let readFileChunk = (handle: Fs.FileHandle, maxBytesToRead: Number) => {
  // store handle if not seen previously
  // read one chunk
  // successive calls to readFileChunk advance

  (chunk, numBytesRead)
}

export let readFileChunks = (handle: Fs.FileHandle, maxBytesToRead: Number, callback) => {
  // store handle if not seen previously
  // read each chunk & call back on each 
  // until end of file

  callback((chunk, numBytesRead))
}

export let readFileSlice = (handle: Fs.FileHandle, range:Range.Range) => {
  // store handle if not seen previously
  // read chunk in range

  (slice, numBytesRead)
}

export let readFileSlices = (handle: Fs.FileHandle, range:Range.Range, maxBytesToRead: Number, callback) => {
  // store handle if not seen previously
  // read chunk in range up to maxBytesToRead
  // callback on each chunk

  callback((chunk, numBytesRead))
}

av8ta avatar Jan 15 '23 04:01 av8ta

@av8ta Why are you having readFileChunks and readFileSlices taking a callback?

phated avatar Jan 16 '23 20:01 phated

@phated My thinking was devs might want to read files in chunks and work on them as they're streaming in. Perhaps they've got an infinite length file like a logfile that's constantly being appended to, or a file that's too large to fit in memory.

Rather than load the whole file into memory and then work on it, devs can get work done on the data immediately. Think; logfile streaming in on stdin, processing the chunks, and then writing the transformation to stdout. Loads of other usecases of course because streams are a great way to work on data.

Of course if they would rather do it in a loop then there are the singular versions readFileChunk and readFileSlice to use.

I also think wasm on edge compute is one of the many great usecases for webassembly and grain is already great for cli tools. Those environments can be very resource constrained.

I tried this and got a greatly improved output file :)

let writeStdout = message => File.fdWrite(File.stdout, message)
let improveCsv = string => String.replaceAll(",", "🌾\n\n", string)
let callback = string => writeStdout(improveCsv(string))

readFileChunks(Fs.stdin, 1024, ((string, num)) => callback(string))

av8ta avatar Jan 17 '23 13:01 av8ta

For this first iteration, let's keep the API really small and then we can expand it from there. We'll hold off on chunks/streaming for now, since that'll play a lot into Grain's async story, but we're a ways off from that.

So for the first version, I propose this small API:

enum FileError {
  ...
}
readFile: Path -> Result<Bytes, FileError>
writeFile: (Path, Bytes) -> Result<Number, FileError> // number of bytes written
appendFile: (Path, Bytes) -> Result<Number, FileError>
module Utf8 {
  readFile: Path -> Result<String, FileError>
  writeFile: (Path, String) -> Result<Number, FileError>
  appendFile: (Path, String) -> Result<Number, FileError>
}

There are some natural extensions to this, like reading/writing a range, but those can come later. This gives us a really good starting point that we can continue to build from.

This will need a long overdue update to sys/file to operate on bytes, but I'll get that change in swiftly.

ospencer avatar Feb 04 '23 22:02 ospencer

Here is said change: https://github.com/grain-lang/grain/pull/1655

ospencer avatar Feb 04 '23 22:02 ospencer

Looks good to me @ospencer, thanks for the feedback

av8ta avatar Feb 05 '23 00:02 av8ta

@av8ta Since you made a lot of progress towards this API on your initial PR, were you still planning on writing a new iteration based on the recent feedback or would you be ok with someone else bringing it over the finish line?

alex-snezhko avatar Mar 29 '23 20:03 alex-snezhko

@alex-snezhko I'm keen to resume with the new design. I found myself afk for a few weeks while shifting country and getting settled in, and can get back into it now.

av8ta avatar Apr 10 '23 08:04 av8ta

@av8ta Sounds good! I made some progress for the new proposed API that you could use as a base if you'd like https://github.com/alex-snezhko/grain/blob/ergo-fs/stdlib/sys/fs.gr

alex-snezhko avatar Apr 10 '23 18:04 alex-snezhko

Thanks @alex-snezhko that's really helpful

av8ta avatar Apr 11 '23 08:04 av8ta

Hey @av8ta and @alex-snezhko - I'd love for this PR to be up soon so we can get as much landed for 0.6 as possible. We are probably only a couple of weeks away! 🙏

phated avatar Apr 24 '23 03:04 phated

Hey @phated I've had busy weekends lately, but I've got a chunk of this one free so I'll take a look at it. Hopefully that won't be too late for 0.6 :pray:

av8ta avatar May 11 '23 07:05 av8ta

@av8ta I've had some time to pick this back up as I wasn't sure if you had the time to get to this since your last comment. However, I see that you pushed up a commit on Oct 3rd, so I was curious if you were still looking to close this out or if you'd be fine with me making this PR? I implemented several additional functions and think I have it at a near-complete state now.

alex-snezhko avatar Dec 04 '23 01:12 alex-snezhko

@alex-snezhko yeah go for it, I had planned to but having trouble creating the time for it.

av8ta avatar Dec 19 '23 10:12 av8ta