mojo
[Feature Request] Add mmap module
Review Mojo's priorities
- [X] I have read the roadmap and priorities and I believe this request falls within the priorities.
What is your request?
Implement an `mmap` module natively in Mojo.
Link: https://man7.org/linux/man-pages/man2/mmap.2.html
What is your motivation for this change?
While implementing several projects in Mojo, we reached a point where huge files had to be loaded into memory (for example, open-source LLM weights).
For this purpose, it would be great to have an `mmap` module implemented natively in Mojo.
Any other details?
No response
@abduld fyi
Relevant blog post: https://justine.lol/mmap/
Epic story of how they got it working in C++. I didn't know such a simple and obvious thing could be relatively complicated to implement with C++/STL. In C, for example, mmap is trivial: https://github.com/karpathy/llama2.c/pull/50/files. In Mojo it should be straightforward too, I hope, since we have full control over pointers. Currently I'm doing a "pseudo-mmap": the weights are loaded into memory once (with file IO), and then I carefully set up pointers into that buffer.
Calling `mmap` through a function pointer (I can't use `external_call` yet, AFAICT) seems to crash: https://gist.github.com/ihnorton/a49c8165f86489def9031583f3ed1b20 (macOS arm64, Mojo 0.5)
Thanks for the links and discussion here, everyone. I should have a chance to look into this at the end of this week.
What a pity that calling mmap through a function pointer crashes. I was betting heavily on this.
I was able to get mmap to work based on @ihnorton's excellent post above. In fact, I don't fully understand why @ihnorton is seeing a crash and I am not. [I am running mojo 0.5.0 (6e50a738) on Ubuntu amd64]
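```mojo
from sys import ffi
from memory import unsafe

# Values for the `flags` argument of mmap
struct MapOpt:
    alias MAP_SHARED = 0x01
    alias MAP_PRIVATE = 0x02

# Protection bits for the `prot` argument of mmap
struct Prot:
    alias PROT_NONE = 0x0
    alias PROT_READ = 0x1
    alias PROT_WRITE = 0x2
    alias PROT_EXEC = 0x4

alias c_void = UInt8

# Mirrors C's mmap(void *addr, size_t len, int prot, int flags, int fd, off_t offset)
alias mmap_type = fn(addr: Pointer[c_void],
                     len: Int64,
                     prot: Int32,
                     flags: Int32,
                     fildes: Int32,
                     offset: Int64) -> Pointer[c_void]

def main():
    # On Linux, DLHandle("") exposes the symbols already loaded into the process
    let handle = ffi.DLHandle("")
    let c_mmap = handle.get_function[mmap_type]("mmap")

    let fnm = StringRef("data")
    # open(path, O_RDONLY)
    let fd = external_call["open", Int, Pointer[Int8], Int](fnm.data._as_scalar_pointer(), 0x0)
    if (fd == -1):
        raise "Failed to open file"

    # Map exactly as many bytes as are read below (the file must be at least this long)
    let length = 26
    let NULL = unsafe.bitcast[c_void](0x0)
    let p = c_mmap(
        NULL, length, Prot.PROT_READ, MapOpt.MAP_SHARED, fd, 0
    )
    # NOTE: a robust version would also compare p against MAP_FAILED ((void *)-1)
    for i in range(length):
        print(p[i])
```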
Here's a version that works on macOS. To get the function pointer, you have to call `DLHandle` on a dylib that is already linked into the process.
Right now I don't see a way to write the equivalent of `ctypes.CDLL(None)` on macOS, whereas `DLHandle("")` does that on Linux (matching `dlopen` semantics).
```mojo
from sys import ffi
from memory import unsafe

# Values for the `flags` argument of mmap
struct MapOpt:
    alias MAP_SHARED = 0x01
    alias MAP_PRIVATE = 0x02

# Protection bits for the `prot` argument of mmap
struct Prot:
    alias PROT_NONE = 0x0
    alias PROT_READ = 0x1
    alias PROT_WRITE = 0x2
    alias PROT_EXEC = 0x4

alias c_void = UInt8

# Mirrors C's mmap(void *addr, size_t len, int prot, int flags, int fd, off_t offset)
alias mmap_type = fn(addr: Pointer[c_void],
                     len: Int64,
                     prot: Int32,
                     flags: Int32,
                     fildes: Int32,
                     offset: Int64) -> Pointer[c_void]

def main():
    let handle: ffi.DLHandle
    if ffi.os_is_linux():
        # On Linux, DLHandle("") exposes the symbols already loaded into the process
        handle = ffi.DLHandle("")
    #elif ffi.os_is_windows():
    #    bug: if this section is uncommented, then `handle`
    #    is considered uninitialized below
    #    raise "Not yet supported on Windows"
    else:
        # On macOS we just need _a_ dylib that is already in the image
        handle = ffi.DLHandle("libate.dylib")
    let c_mmap = handle.get_function[mmap_type]("mmap")

    let fnm = StringRef("data")
    # open(path, O_RDONLY)
    let fd = external_call["open", Int, Pointer[Int8], Int](fnm.data._as_scalar_pointer(), 0x0)
    if (fd == -1):
        raise "Failed to open file"

    # Map exactly as many bytes as are read below (the file must be at least this long)
    let length = 26
    let NULL = unsafe.bitcast[c_void](0x0)
    let p = c_mmap(
        NULL, length, Prot.PROT_READ, MapOpt.MAP_SHARED, fd, 0
    )
    # Print the mapped bytes as characters
    for i in range(length):
        let v = p[i].cast[DType.int64]()
        print_no_newline(chr(v.to_int()))
```
I'd welcome a new `mmap` module. We should take a look at https://docs.rs/memmap/latest/memmap/ and come up with a proposal for getting started on that. Is anyone interested in driving this?
Hey @JoeLoser, I have a working implementation of `mmap` in Mojo.
I wrote up some of my thoughts and provided a working example in the proposal here: #3218