mojo icon indicating copy to clipboard operation
mojo copied to clipboard

[Feature Request] Add path.getsize or file.getsize methods

Open tairov opened this issue 1 year ago β€’ 8 comments

Review Mojo's priorities

What is your request?

Mojo v0.4.0 now supports file module, but it seems that we still need one essential function -- getsize or filesize With native Mojo file module we have to load whole file into memory just to know its size.

    var f_ = open(file_name, "r")
    let data = f_.read()
    let cp_size = data._buffer.size

What is your motivation for this change?

So far in llama2.πŸ”₯ we have to use workaround via Python.

    let _os = Python.import_module("os")
    let ff_size = _os.path.getsize(file_name)
    let cp_size = string.atol(ff_size.to_string())

It's turned out newcomers to Mojo who are trying to run llama2 locally, got multiple issues because of misconfiguration with MOJO_PYTHON_LIBRARY, etc.

It would be great to add getsize or filesize to file or path modules in the next release.

Any other details?

In Python it's implemented in path module

    let _os = Python.import_module("os")
    let ff_size = _os.path.getsize(file_name)

I think it's worthwhile to implement the more general os.stat , since fs_stat must provide more details about file, but I'm not sure what are the concerns regarding platforms compatiblity.

# python alternatives
os.path.getsize()
os.stat()

tairov avatar Oct 21 '23 11:10 tairov

I like this idea, but how about divide words via "_"? In my opinion get_size looks much better than getsize

gryznar avatar Oct 21 '23 11:10 gryznar

Proper naming is really tricky for the language that supposed to be a "drop-in replacement for Python" .

Let's see how it feels in other languages :

Cpp

int main(int argc, char *argv[]) {
  std::filesystem::path p{argv[1]};
  std::cout << "The size of " << p.u8string() << " is " <<
      std::filesystem::file_size(p) << " bytes.\n";
}

Zig

var file = try std.fs.cwd().openFile("file.txt", .{ open = true }));
const file_size = (try file.stat()).size;

Julia

  filesize(path...)
  Equivalent to stat(file).size.

Python

os.path.getsize(file_name)

Swift

let fileUrl: URL
print("file size = \(fileUrl.fileSize), \(fileUrl.fileSizeString)")

I think the Python version is most expected variation (for Python switchers at least)

tairov avatar Oct 21 '23 20:10 tairov

Thanks @tairov for requesting this (in the the live stream of all places :) ) . We'll make sure this is prioritized.

Keep up the great work

abduld avatar Oct 21 '23 21:10 abduld

If you use tools which will give you code completion, you will obtain both get_size and getsize, so I don't think that is the case here. Readability counts and it is greater with seperated words than mixed up (see simdbitwidth, simdwithof and so on

gryznar avatar Oct 21 '23 21:10 gryznar

Also, most functions in Mojo have divided words, so every case which breaks from that looks strange

gryznar avatar Oct 21 '23 21:10 gryznar

Also, most functions in Mojo have divided words, so every case which breaks from that looks strange

I hope till stable release the Mojo team will stabilize API & naming conventions.. In other case it could end-up "PHP" scenario. The names of the functions laid down in the first releases have already turned out to be extremely difficult to isolate from the language, and as a result, the entire standard library of functions has become quite heterogeneous. This problem was solved only with the introduction of high-level frameworks (like Laravel & Symfony) that abstracted methods from the standard PHP library and made framework API more clean.

I'm honestly really not sure about proper approach here.. I would suggest to make the API more closer to Python, since it might help with Python -> Mojo switch & learning curve.

As you mentioned, if we use code-completion tools it doesn't make much difference I believe. Especially taking into account the long-term goals of creating a coherent language and further adaptation to it by the community.. Probably by that time you won’t have to read the code, everything will be explained to you by the LLM/AI agent πŸ˜€

tairov avatar Oct 21 '23 21:10 tairov

Thanks @tairov for requesting this (in the the live stream of all places :) ) . We'll make sure this is prioritized.

Keep up the great work

Thanks @abduld for your feedback.. BTW, I decided to vectorize "feature requests", so a small cart there, a couple more requests #1134 & #1135 πŸ˜…

PS. It was a great honor to share the livestream stage with your team!

tairov avatar Oct 21 '23 21:10 tairov

Yeah, stable and standardized naming convention would be a great benefit over Python which Mojo tends to be a better version (Python++). This also mean, that Mojo could fix these Python problems related to naming. Mojo should take the best from Python and fix what may be improved.

I will vote for being consistent with own rules and not to break them when trying to replicate Python. As far as I know this strict replication is not possible, so sticking with exactly same names as Python has, is not the best idea IMHO.

Please to be consistent first of all, even if this will require to take other name than Python has.

Edit: one of the big problem related to keeping the same names as Python has is the need of synchronisation with its changes. Let's suppose, that Mojo will take all the names which Python has in some library. If Python will decide to deprecate some function and propose another one instead, Mojo, following changes, should also deprecate this method in its stdlib. Keeping everything in sync would led to maintenance hell. Edit2: the second reason is the need to have the same signature which may be not optimal (Mojo would like for example to have parametrization). The same problem with keeping that in sync applies

gryznar avatar Oct 21 '23 21:10 gryznar