Python module?

Open ethindp opened this issue 1 year ago • 16 comments

It'd be really cool if this had a Python module. I'd develop one but I'm not certain how to do the template argument expansions and such. But pybind11 would make this easier and it'd be really neat. I might start on this to see how far I get since I'd love to use this library in a Python-based game but it'd be cool if we had an official module if that's on the table.

ethindp avatar Nov 25 '24 22:11 ethindp

Is this about Godot Sandbox or perhaps using libriscv directly as a scripting solution in a custom engine?

fwsGonzo avatar Nov 26 '24 06:11 fwsGonzo

@fwsGonzo It's a custom engine (Evennia). Like I said, I'd wrap it myself if I could figure out how to do so getting around all the template stuff. (I guess I could just instantiate all the possible template arguments but... Ugh?)

ethindp avatar Nov 26 '24 07:11 ethindp

The templates in libriscv are used for choosing the architecture, such as 32- or 64-bit. 64-bit is usually the most interesting.

Is the game engine written in Python, and you need bindings for libriscv?

fwsGonzo avatar Nov 26 '24 07:11 fwsGonzo

Yup, so I can just bind riscv64, doubt I'll need riscv32 (though if we make an official module it might be worth binding that as well.)

ethindp avatar Nov 26 '24 16:11 ethindp

One question I have involves syscalls though. If I wanted to make a Python module, I would need to figure out how to allow Python to get arguments of syscalls. When unpacking arguments via the syscall function, is it okay to do something like auto [x0, x1, x2, x3, ...] = machine; (I forget the precise syntax, sorry) or must I specifically tailor the unpacking to the specific number of arguments/types? I'm probably going to go a bit "out on a limb" so to speak, so I may not actually set up the emulator to run Linux applications in the usual manner.

ethindp avatar Nov 26 '24 19:11 ethindp

Most (if not all) system calls have known arguments, and we do know a few things:

  1. All arguments are integral - so no floats, just integers and pointers in the guest
  2. There are at most 7 arguments (A0 to A6), followed by register A7, which contains the system call number (but once you're in the system call function you already know the number)
  3. You usually, if not always, return a simple integral value in A0

So, if you want to cover a large area, you can do:

auto [a0, a1, a2, a3, a4, a5, a6] = machine.sysargs<gaddr_t, gaddr_t, gaddr_t, gaddr_t, gaddr_t, gaddr_t, gaddr_t> ();

However, since they're all lined up internally anyway, why not get the whole thing:

auto& array = machine.cpu.registers().get();
auto a0 = array.at(10); // First argument is register 10 (A0), the last is register 16 (A6)
...
auto a6 = array.at(16); 

Usually you install one system call handler at a time, which handles a specific system call that you know the arguments for. Then using .sysargs<...> is very handy because it lets you get or view strings, string_views, arrays and such very easily.

But yeah, if you want to, you can just get the registers directly. The arguments to a system call would start from register 10, also known as A0. If you have total control over the guest program you can also invent your own ABI and use whatever registers you like. But, usually we embed C, C++, Rust and such, and they all follow the same ABI. Even function calls use the same ABI, which simplifies a lot of interactions.
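As a rough, self-contained illustration of the register layout described above, the sketch below copies A0 to A6 and the system call number in A7 out of a plain array. RegisterFile, SyscallArgs and read_syscall_args are hypothetical names made up for this example; they are not part of libriscv.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the guest's integer registers x0..x31.
// Not libriscv's type; it only mirrors the layout described above.
using RegisterFile = std::array<std::uint64_t, 32>;

constexpr std::size_t REG_A0 = 10; // first argument register
constexpr std::size_t REG_A7 = 17; // system call number

struct SyscallArgs {
    std::uint64_t number = 0;            // value of A7
    std::array<std::uint64_t, 7> args{}; // values of A0..A6

    // Checked accessor for argument i (0 = A0, 6 = A6).
    std::uint64_t arg(std::size_t i) const {
        assert(i < args.size());
        return args[i];
    }
};

// Copy the system call number and all seven argument registers in one go.
inline SyscallArgs read_syscall_args(const RegisterFile& regs) {
    SyscallArgs out;
    out.number = regs.at(REG_A7);
    for (std::size_t i = 0; i < out.args.size(); ++i)
        out.args[i] = regs.at(REG_A0 + i);
    return out;
}
```

A plain value type like SyscallArgs is easy to hand to a scripting layer, because it carries no references into the emulator.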

fwsGonzo avatar Nov 26 '24 20:11 fwsGonzo

@fwsGonzo That makes sense, just not sure precisely how I would wrap that from the Python side is all.

ethindp avatar Nov 26 '24 21:11 ethindp

That's a good question. I don't have a clue either, knowing nothing about pybind11. But, if the goal is to be able to handle system calls, then it really shouldn't be too hard.

struct SyscallHelper {
    using gaddr_t = riscv::address_type<riscv::RISCV64>;
    machine_t& machine;
    const int syscall_no;

    int get_intarg(int index) { return machine.cpu.reg(10 + index); }

    std::string get_stringarg(int index) {
       gaddr_t address = machine.cpu.reg(10 + index);
       return machine.memory.memstring(address);
    }

    template <typename Type>
    std::vector<Type> get_vectorarg(int ptr_index, int size_index) {
      gaddr_t address = machine.cpu.reg(10 + ptr_index);
      unsigned elements = machine.cpu.reg(10 + size_index);
      Type* array = machine.memory.memarray<Type> (address, elements);
      return std::vector<Type> (array, array + elements);
    }
};

Does a helper struct work? A system call is just a function call where the arguments are in registers, and depending on what the system call is supposed to be doing, you have to access the registers in different ways.

Are you expecting something dynamic? Like a system call that works like a JavaScript function in that it can be anything? That's a completely different way of working, and requires something like a God Object, like Godot's Variant type.

fwsGonzo avatar Nov 27 '24 09:11 fwsGonzo

I'm not sure, it's worth a try. Though I am a bit confused about the array constructor, how would I use that? Like can you provide a real-world example? I feel like if I use it naively I'll get some out-of-bounds reads or something

ethindp avatar Nov 27 '24 18:11 ethindp

All the memory helper functions are safe to use; they will never access guest memory in invalid ways. memarray returns a pointer of the desired type with the given number of elements following it, or it will throw an exception. It can't return an invalid value. The entire API is designed this way.
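The "valid result or exception" contract can be shown with a small stand-in. This is only a sketch of the idea over a flat byte buffer: GuestMemory and copy_array are hypothetical names, and libriscv's actual memory handling is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical flat guest memory, used only to demonstrate the contract.
struct GuestMemory {
    std::vector<std::uint8_t> bytes;

    // Copy `count` elements of T starting at guest address `addr`.
    // Throws std::out_of_range if any part of the range is outside memory,
    // so the caller never receives a partially valid result.
    template <typename T>
    std::vector<T> copy_array(std::uint64_t addr, std::size_t count) const {
        static_assert(std::is_trivially_copyable<T>::value,
                      "only plain data can be copied out of guest memory");
        const std::uint64_t size = bytes.size();
        // Written as a division so that count * sizeof(T) cannot overflow.
        if (addr > size || count > (size - addr) / sizeof(T))
            throw std::out_of_range("guest array outside memory");
        std::vector<T> out(count);
        if (count > 0)
            std::memcpy(out.data(), bytes.data() + addr, count * sizeof(T));
        return out;
    }
};
```

Because the check happens before any byte is read, a naive caller passing a guest-controlled pointer and length gets an exception instead of an out-of-bounds read.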

It would help a lot more if I could understand what you're trying to do, though. Just showing how to handle a system call makes little sense when I don't know what the guest program is doing/looks like or what pybind11 looks like. There are examples around showing system call handlers. Here's a bunch of them in rvscript: https://github.com/libriscv/rvscript/blob/master/engine/src/script/script_syscalls.cpp

What they all have in common is that they handle one specific system call, and so we know the arguments already, the return values and what's supposed to happen.

fwsGonzo avatar Nov 27 '24 18:11 fwsGonzo

Sorry for being vague, completely unintended.

The binaries will be ones players of the game write; I'm hoping to make it possible to give players a feeling of control/power over the evolution of the game and certain things within it, like computers and such. So I may end up mimicking Linux RISC-V syscalls but rewriting them to be "virtual" in the sense that they don't actually do anything on the real file system, only in the game. I'm hoping to do the syscalls in Python because it would make it easier to extend them without needing to rebuild the C++ side of things very often, except for updates. The Python API will, if I do it right, mirror that of the C++ API as closely as possible except that it'll be in Python so you'll have access to all of Python's features and stuff.

My idea is that players can write, e.g., C/C++ code, and then the game passes that code on to clang/clang++ for example, or zig cc/zig c++, and then you get an in-game object representing the compiled program that you can use on a computer in the game and it'll run it in the RISC-V virtual machine. (I considered x86-64 as a possible target as well, but I could at best only find CPU emulators and not ones that do what libriscv does where the emulation is that of memory and the processor and is extremely fast, and doesn't require virtualization extensions to be present.)

So I chose libriscv because (1) I really like it and think it could work in all kinds of contexts, (2) it fits my use case because I can almost, if not entirely, isolate the machine from the underlying operating system and hardware environment, and (3) people can write code using familiar languages.

Pybind11 is a C++ binding over the Python API; you can pretty much export raw C++ classes and Pybind11 will figure out all the complexities for you. There's boost.python but I couldn't really figure out that API and pybind11 looked more ergonomic to me. I can provide some examples of binding using it if you like, though the website provides lots of them too. I hope this helps.

ethindp avatar Nov 27 '24 22:11 ethindp

With libriscv you have to have a RISC-V toolchain, unless you are comfortable writing freestanding C/C++ and building your own ecosystem. You could get by with just clang, for example. Zig is everything-in-one afaik, as it produces RISC-V without dependencies.

Anyway, it's probably easier to help you if you join our Discord.

fwsGonzo avatar Nov 28 '24 11:11 fwsGonzo

I just realized that you are just trying to create bindings in general for libriscv, to be able to use it from Python?

If so, I think that should be quite doable.

fwsGonzo avatar Dec 01 '24 18:12 fwsGonzo

Yep, the compiler is something I could figure out on my own I think. That one isn't super hard to solve.

ethindp avatar Dec 01 '24 19:12 ethindp

Maybe how I have implemented the C API can be an inspiration?

See: https://github.com/libriscv/libriscv/blob/master/c/libriscv.cpp

There is a user-field in Machine that you can use to wrap Machine in your own structure.

Handling system calls is just about providing enough features in a wrapper to be able to do all the things you might want to do. So, there I think my system call struct example above is a good start. Just need to add more stuff, and of course, not expose raw arrays.

If you want to handle all system calls in Python you could set every single one (0 to RISCV_SYSCALLS_MAX-1) to a single callback that then calls a function in Python with the system call number as argument (register A7).
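The single-callback idea, combined with the user field, could look roughly like the sketch below. It is self-contained and uses stand-ins only: MiniMachine, ScriptHost, SYSCALLS_MAX (with an assumed size of 512), forward_to_host and do_ecall are hypothetical names, and the std::function plays the role that a Python callable would have in a real binding.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical stand-ins: not libriscv's Machine or its handler table.
constexpr std::size_t SYSCALLS_MAX = 512; // assumed table size

struct MiniMachine {
    std::array<std::uint64_t, 32> regs{}; // x0..x31
    void* userdata = nullptr;             // plays the role of the user field
};

using Handler = void (*)(MiniMachine&);

// Stands in for the object that would own the Python-side callable.
struct ScriptHost {
    // Receives the system call number and the machine, returns the value for A0.
    std::function<std::uint64_t(std::uint64_t, MiniMachine&)> on_syscall;
};

// The one callback installed in every slot: A7 in, A0 out.
inline void forward_to_host(MiniMachine& m) {
    auto* host = static_cast<ScriptHost*>(m.userdata);
    assert(host != nullptr && host->on_syscall);
    m.regs[10] = host->on_syscall(m.regs[17], m);
}

// Fill the whole table (0 to SYSCALLS_MAX-1) with the same handler.
inline std::array<Handler, SYSCALLS_MAX> make_handler_table() {
    std::array<Handler, SYSCALLS_MAX> table{};
    table.fill(&forward_to_host);
    return table;
}

// What an emulator would do when the guest issues a system call.
inline void do_ecall(MiniMachine& m,
                     const std::array<Handler, SYSCALLS_MAX>& table) {
    const std::uint64_t number = m.regs[17];
    if (number < table.size())
        table[number](m); // out-of-range numbers are ignored in this sketch
}
```

The design choice here is that the C++ side stays fixed while all per-syscall behavior lives behind on_syscall, which matches the goal of extending system calls without rebuilding the native module.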

fwsGonzo avatar Dec 01 '24 19:12 fwsGonzo

It may be possible to call the C API from Python using ctypes.

imbev avatar Mar 16 '25 01:03 imbev