mojo
mojo copied to clipboard
[Feature Request] Add path.getsize or file.getsize methods
Review Mojo's priorities
- [X] I have read the roadmap and priorities and I believe this request falls within the priorities.
What is your request?
Mojo v0.4.0 now supports file
module, but it seems that we still need one essential function -- getsize
or filesize
With native Mojo file
module we have to load whole file into memory just to know its size.
var f_ = open(file_name, "r")
let data = f_.read()
let cp_size = data._buffer.size
What is your motivation for this change?
So far in llama2.π₯
we have to use workaround via Python.
let _os = Python.import_module("os")
let ff_size = _os.path.getsize(file_name)
let cp_size = string.atol(ff_size.to_string())
It's turned out newcomers to Mojo who are trying to run llama2 locally, got multiple issues because of misconfiguration with MOJO_PYTHON_LIBRARY
, etc.
It would be great to add getsize
or filesize
to file
or path
modules in the next release.
Any other details?
In Python it's implemented in path
module
let _os = Python.import_module("os")
let ff_size = _os.path.getsize(file_name)
I think it's worthwhile to implement the more general os.stat
, since fs_stat
must provide more details about file, but I'm not sure what are the concerns regarding platforms compatiblity.
# python alternatives
os.path.getsize()
os.stat()
I like this idea, but how about divide words via "_"? In my opinion get_size looks much better than getsize
Proper naming is really tricky for the language that supposed to be a "drop-in replacement for Python" .
Let's see how it feels in other languages :
Cpp
int main(int argc, char *argv[]) {
std::filesystem::path p{argv[1]};
std::cout << "The size of " << p.u8string() << " is " <<
std::filesystem::file_size(p) << " bytes.\n";
}
Zig
var file = try std.fs.cwd().openFile("file.txt", .{ open = true }));
const file_size = (try file.stat()).size;
Julia
filesize(path...)
Equivalent to stat(file).size.
Python
os.path.getsize(file_name)
Swift
let fileUrl: URL
print("file size = \(fileUrl.fileSize), \(fileUrl.fileSizeString)")
I think the Python version is most expected variation (for Python switchers at least)
Thanks @tairov for requesting this (in the the live stream of all places :) ) . We'll make sure this is prioritized.
Keep up the great work
If you use tools which will give you code completion, you will obtain both get_size and getsize, so I don't think that is the case here. Readability counts and it is greater with seperated words than mixed up (see simdbitwidth, simdwithof and so on
Also, most functions in Mojo have divided words, so every case which breaks from that looks strange
Also, most functions in Mojo have divided words, so every case which breaks from that looks strange
I hope till stable release the Mojo team will stabilize API & naming conventions.. In other case it could end-up "PHP" scenario. The names of the functions laid down in the first releases have already turned out to be extremely difficult to isolate from the language, and as a result, the entire standard library of functions has become quite heterogeneous. This problem was solved only with the introduction of high-level frameworks (like Laravel & Symfony) that abstracted methods from the standard PHP library and made framework API more clean.
I'm honestly really not sure about proper approach here.. I would suggest to make the API more closer to Python, since it might help with Python -> Mojo switch & learning curve.
As you mentioned, if we use code-completion tools it doesn't make much difference I believe. Especially taking into account the long-term goals of creating a coherent language and further adaptation to it by the community.. Probably by that time you wonβt have to read the code, everything will be explained to you by the LLM/AI agent π
Thanks @tairov for requesting this (in the the live stream of all places :) ) . We'll make sure this is prioritized.
Keep up the great work
Thanks @abduld for your feedback.. BTW, I decided to vectorize "feature requests", so a small cart there, a couple more requests #1134 & #1135 π
PS. It was a great honor to share the livestream stage with your team!
Yeah, stable and standardized naming convention would be a great benefit over Python which Mojo tends to be a better version (Python++). This also mean, that Mojo could fix these Python problems related to naming. Mojo should take the best from Python and fix what may be improved.
I will vote for being consistent with own rules and not to break them when trying to replicate Python. As far as I know this strict replication is not possible, so sticking with exactly same names as Python has, is not the best idea IMHO.
Please to be consistent first of all, even if this will require to take other name than Python has.
Edit: one of the big problem related to keeping the same names as Python has is the need of synchronisation with its changes. Let's suppose, that Mojo will take all the names which Python has in some library. If Python will decide to deprecate some function and propose another one instead, Mojo, following changes, should also deprecate this method in its stdlib. Keeping everything in sync would led to maintenance hell. Edit2: the second reason is the need to have the same signature which may be not optimal (Mojo would like for example to have parametrization). The same problem with keeping that in sync applies