[Feature Request] Add mmap module

Open tairov opened this issue 1 year ago • 9 comments

Review Mojo's priorities

What is your request?

Implement an mmap module natively in Mojo. Reference: https://man7.org/linux/man-pages/man2/mmap.2.html

What is your motivation for this change?

While implementing several projects in Mojo, we reached a point where huge files need to be loaded into memory (for example, open-source LLM model weights).

For these purposes, it would be really great to have the mmap module implemented natively in Mojo.

Any other details?

No response

tairov avatar Oct 21 '23 21:10 tairov

@abduld fyi

Relevant blog post: https://justine.lol/mmap/

jackos avatar Nov 14 '23 23:11 jackos

> @abduld fyi
>
> Relevant blog post: https://justine.lol/mmap/

Epic story of how they got it working in C++. I didn't know that such a simple and obvious thing could be relatively complicated to implement with C++/STL. In C, for example, mmap is trivial: https://github.com/karpathy/llama2.c/pull/50/files

In Mojo it should be straightforward, I hope, since we have full control over pointers. Currently I'm doing a "pseudo-mmap": the weights are loaded into memory once (with file I/O), and then I carefully set up pointers into that buffer.
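
The difference between the "pseudo-mmap" described above (read the whole file once, then index into the buffer) and a real memory mapping can be sketched in Python, whose standard mmap module wraps the same system call. This is only an illustration under assumptions: the file layout (a flat array of little-endian float32 values) and the helper names read_weights_copy and read_weights_mmap are hypothetical and are not taken from llama2.c or from any Mojo code.

```python
import mmap
import struct


def read_weights_copy(path, count):
    """'Pseudo-mmap': copy the whole file into memory, then decode from the buffer."""
    with open(path, "rb") as f:
        buf = f.read()  # the entire file is read up front
    return struct.unpack_from(f"<{count}f", buf, 0)


def read_weights_mmap(path, count):
    """Real mapping: the kernel pages file data in only when it is touched."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return struct.unpack_from(f"<{count}f", mm, 0)
```

Both helpers return the same values. The practical difference is that the mapped version does not need a second full copy of the file in process memory, which is what matters for multi-gigabyte model files.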

tairov avatar Nov 14 '23 23:11 tairov

Calling mmap through a function pointer (can't use external_call yet, AFAICT) seems to crash: https://gist.github.com/ihnorton/a49c8165f86489def9031583f3ed1b20

(macOS arm64, mojo 0.5)

ihnorton avatar Nov 15 '23 03:11 ihnorton

Thanks for the links and discussion here, everyone. I should have a chance to look into this at the end of this week.

JoeLoser avatar Nov 15 '23 16:11 JoeLoser

> Calling mmap through a function pointer (can't use external_call yet, AFAICT) seems to crash: https://gist.github.com/ihnorton/a49c8165f86489def9031583f3ed1b20

What a pity.. I was betting high on this

tairov avatar Nov 15 '23 17:11 tairov

I was able to get mmap to work based on @ihnorton's excellent post above. In fact, I don't fully understand why @ihnorton is seeing a crash and I am not. [I am running Mojo 0.5.0 (6e50a738) on Ubuntu amd64.]

from sys import ffi
from memory import unsafe

struct MapOpt:
    alias MAP_SHARED = 0x01
    alias MAP_PRIVATE = 0x02

struct Prot:
    alias PROT_NONE = 0x0
    alias PROT_READ = 0x1
    alias PROT_WRITE = 0x2
    alias PROT_EXEC = 0x4

alias c_void = UInt8


alias mmap_type = fn(addr: Pointer[c_void],
    len: Int64,
    prot: Int32,
    flags: Int32,
    fildes: Int32,
    offset: Int64) -> Pointer[c_void]


def main():
    let handle = ffi.DLHandle("")
    let c_mmap = handle.get_function[mmap_type]("mmap")
    let fnm = StringRef("data")
    let fd = external_call["open", Int, Pointer[Int8], Int](fnm.data._as_scalar_pointer(), 0x0)
    if (fd == -1):
        raise "Failed to open file"
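    let NULL = unsafe.bitcast[c_void](0x0)
    # Map the first 26 bytes of the file, matching the loop below.
    # A robust version would also compare the result against MAP_FAILED, i.e. (void*)-1.
    let p = c_mmap(
        NULL, 26, Prot.PROT_READ, MapOpt.MAP_SHARED, fd, 0
    )
    for i in range(26):
        print(p[i])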

deepankarsharma avatar Nov 23 '23 15:11 deepankarsharma

Here's a version that works on macOS. To get the function pointer, it is necessary to use DLHandle on a dylib that is linked into the process.

I don't see a way to write the equivalent of ctypes.CDLL(None) right now on macOS, whereas DLHandle("") does that on Linux (matching dlopen semantics).

from sys import ffi
from memory import unsafe

struct MapOpt:
    alias MAP_SHARED = 0x01
    alias MAP_PRIVATE = 0x02

struct Prot:
    alias PROT_NONE = 0x0
    alias PROT_READ = 0x1
    alias PROT_WRITE = 0x2
    alias PROT_EXEC = 0x4

alias c_void = UInt8


alias mmap_type = fn(addr: Pointer[c_void],
    len: Int64,
    prot: Int32,
    flags: Int32,
    fildes: Int32,
    offset: Int64) -> Pointer[c_void]


def main():
    let handle: ffi.DLHandle
    if ffi.os_is_linux():
        handle = ffi.DLHandle("")
    #elif ffi.os_is_windows():
    # bug: if this section is un-commented, then `handle`
    #      is considered uninitialized below
    #    raise "Not yet supported on Windows"
    else:
        # we just need _a_ dylib in the image
        handle = ffi.DLHandle("libate.dylib")

    let c_mmap = handle.get_function[mmap_type]("mmap")

    let fnm = StringRef("data")
    let fd = external_call["open", Int, Pointer[Int8], Int](fnm.data._as_scalar_pointer(), 0x0)
    if (fd == -1):
        raise "Failed to open file"
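    let NULL = unsafe.bitcast[c_void](0x0)
    # Map the first 26 bytes of the file, matching the loop below.
    # A robust version would also compare the result against MAP_FAILED, i.e. (void*)-1.
    let p = c_mmap(
        NULL, 26, Prot.PROT_READ, MapOpt.MAP_SHARED, fd, 0
    )
    for i in range(26):
        let v = p[i].cast[DType.int64]()
        #print(chr(Int64(v)))
        print_no_newline(chr(v.to_int()))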
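
For comparison, the ctypes.CDLL(None) approach mentioned above can be sketched in Python as follows. It looks up mmap among the symbols already loaded into the process, declares the C signature, and checks for MAP_FAILED, which the Mojo snippets skip. Assumptions: a 64-bit Linux or macOS system, where off_t is 64 bits wide and PROT_READ and MAP_SHARED have the values used in the Mojo code; the helper name read_via_mmap is hypothetical.

```python
import ctypes
import os

# Same values as in the Mojo snippet; they hold on Linux and macOS.
PROT_READ = 0x1
MAP_SHARED = 0x01
# mmap reports failure by returning (void*)-1, not NULL.
MAP_FAILED = ctypes.c_void_p(-1).value

# Equivalent of dlopen(NULL): a handle to the symbols already in this process.
libc = ctypes.CDLL(None, use_errno=True)

c_mmap = libc.mmap
c_mmap.restype = ctypes.c_void_p
c_mmap.argtypes = [
    ctypes.c_void_p,    # addr
    ctypes.c_size_t,    # len
    ctypes.c_int,       # prot
    ctypes.c_int,       # flags
    ctypes.c_int,       # fildes
    ctypes.c_longlong,  # offset (assumes a 64-bit off_t)
]

c_munmap = libc.munmap
c_munmap.restype = ctypes.c_int
c_munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]


def read_via_mmap(path, length):
    """Map the first `length` bytes of `path` read-only and return a copy of them."""
    fd = os.open(path, os.O_RDONLY)
    try:
        addr = c_mmap(None, length, PROT_READ, MAP_SHARED, fd, 0)
        if addr == MAP_FAILED:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed
    try:
        return ctypes.string_at(addr, length)
    finally:
        c_munmap(addr, length)
```

Calling read_via_mmap("data", 26) on a file that holds at least 26 bytes returns those bytes. Declaring restype as c_void_p matters here: without it ctypes assumes a C int return value and can truncate the 64-bit address.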

ihnorton avatar Nov 25 '23 04:11 ihnorton

I'd welcome a new mmap module. We should take a look at https://docs.rs/memmap/latest/memmap/ and come up with a proposal for getting started on that. Is anyone interested in driving this?

JoeLoser avatar Jun 05 '24 23:06 JoeLoser

Hey @JoeLoser - I got a working implementation up for mmap in Mojo.

I wrapped up some of my thoughts, and provided a working example in the proposal here: #3218

KCaverly avatar Jul 11 '24 13:07 KCaverly