
Consider adding Linux targets that don't depend on libc

Open japaric opened this issue 7 years ago • 61 comments

e.g. x86_64-linux (or x86_64-unknown-linux-rust).

These would be the spiritual successor of steed: a standard library, free of C dependencies, for Linux systems. Steed implemented (or planned to) the Rust standard library using raw Linux system calls instead of libc's API. These libc-less targets would also implement std using raw system calls.
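To show what "raw system calls instead of libc's API" means in practice, here is a minimal sketch (not steed's actual code) of write(2) issued directly with inline assembly on x86_64 and aarch64 Linux. The helper names `syscall3` and `sys_write` are hypothetical. The register assignments follow the kernel's syscall calling convention, and the kernel reports failure as a negative errno value rather than through a global `errno`.

```rust
use std::arch::asm;

// write(2) syscall number; the numbering differs per architecture.
#[cfg(target_arch = "x86_64")]
const SYS_WRITE: usize = 1;
#[cfg(target_arch = "aarch64")]
const SYS_WRITE: usize = 64;

/// Issue a three-argument Linux syscall and return the raw kernel result:
/// a non-negative value on success, or `-errno` on failure.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
unsafe fn syscall3(nr: usize, a1: usize, a2: usize, a3: usize) -> isize {
    let ret: isize;
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") nr as isize => ret,
            in("rdi") a1,
            in("rsi") a2,
            in("rdx") a3,
            // The `syscall` instruction clobbers rcx and r11.
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack),
        );
    }
    ret
}

#[cfg(all(target_os = "linux", target_arch = "aarch64"))]
unsafe fn syscall3(nr: usize, a1: usize, a2: usize, a3: usize) -> isize {
    let ret: isize;
    unsafe {
        asm!(
            "svc 0",
            in("x8") nr, // the syscall number goes in x8
            inlateout("x0") a1 as isize => ret,
            in("x1") a2,
            in("x2") a3,
            options(nostack),
        );
    }
    ret
}

/// Safe wrapper: the slice guarantees a valid pointer/length pair,
/// and the kernel's negative return is mapped to `Err(errno)`.
pub fn sys_write(fd: i32, buf: &[u8]) -> Result<usize, i32> {
    let ret = unsafe { syscall3(SYS_WRITE, fd as usize, buf.as_ptr() as usize, buf.len()) };
    if ret < 0 {
        Err((-ret) as i32)
    } else {
        Ok(ret as usize)
    }
}
```

The only `unsafe` is the single `asm!` call; taking a slice is what makes the wrapper safe to expose.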

Even though steed has been inactive for over a year, people continue to star it on GitHub (it currently has over 500 stars), so it seems there's still interest in something like it.

What we learned during development is that maintaining a standard library out of tree is a lot of work, because things quickly get out of sync. So if there's still interest in something like steed, I would recommend writing an RFC to add support for libc-less targets (e.g. x86_64-linux) to rust-lang/rust; this would be equivalent to importing steed into rust-lang/rust and developing it there.

An RFC is also a good way to (re-)evaluate the ROI of making a libc-less standard library. One of the goals of steed was hassle-free cross compilation, but today that's solved by the Linux/MUSL targets + rust-lld (works on stable). The other goal was better optimizations (plain LTO doesn't optimize across FFI), but cross-language LTO is now a thing (-Z cross-lang-lto). There may be less costly ways to achieve the same results.

The rest of this post describes the status of steed as of 2017-10-20 (date of last commit).

What we had working:

  1. standard I/O (std::io::{stdin,stderr,stdout})
  2. filesystem operations (std::fs)
  3. collections (std::collections) (but see (a) below)
  4. std::sync::Mutex (courtesy of parking_lot)
  5. std::env
  6. UDP and TCP sockets (std::net) (but see (d) below)
  7. minimal support for threads and TLS
  8. #[test] support (but see (c) below)

You can check the examples directory to get an idea of what you could write with it.

What was missing:

  a. a proper allocator. steed used a bump pointer allocator that never freed memory
  b. all the math stuff (e.g. f32::sin). These days one can use libm.
  c. unwinding; all the steed targets used -C panic=abort
  d. hostname lookup
  e. errno. It was unclear whether we should implement it or not. It seems to only be required by std::io::Error::last_os_error. Linux system calls have a Result-like API, so errno is not required to propagate errors from syscalls to the std API.

and much more stuff; you can check the issue tracker for the full list.
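As an illustration of item (a), a bump-pointer allocator fits in a few dozen lines. This is a hypothetical reconstruction, not steed's allocator: it carves allocations out of a fixed 64 KiB arena (an arbitrary size chosen for this sketch) and, like the allocator described above, never frees anything.

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::cell::UnsafeCell;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Size of the fixed arena (arbitrary for this sketch).
pub const ARENA_SIZE: usize = 64 * 1024;

/// Hands out successive slices of a fixed arena and never reuses memory.
pub struct BumpAlloc {
    arena: UnsafeCell<[u8; ARENA_SIZE]>,
    /// Offset of the first unused byte in `arena`.
    next: AtomicUsize,
}

// Safety: `next` is only advanced with compare-and-swap, so concurrent
// callers never receive overlapping regions.
unsafe impl Sync for BumpAlloc {}

impl BumpAlloc {
    pub const fn new() -> Self {
        BumpAlloc { arena: UnsafeCell::new([0; ARENA_SIZE]), next: AtomicUsize::new(0) }
    }
}

unsafe impl GlobalAlloc for BumpAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let base = self.arena.get() as usize;
        let align = layout.align(); // always a power of two
        let mut cur = self.next.load(Ordering::Relaxed);
        loop {
            // Round the absolute address (not just the offset) up to `align`.
            let start = (base + cur + align - 1) & !(align - 1);
            let end = match start.checked_add(layout.size()) {
                Some(end) if end <= base + ARENA_SIZE => end,
                _ => return ptr::null_mut(), // arena exhausted
            };
            match self.next.compare_exchange_weak(cur, end - base, Ordering::Relaxed, Ordering::Relaxed) {
                // Derive the result from the arena pointer to keep provenance.
                Ok(_) => return unsafe { (self.arena.get() as *mut u8).add(start - base) },
                Err(actual) => cur = actual, // another thread moved `next`; retry
            }
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // Intentionally a no-op: freed memory is never reclaimed.
    }
}
```

Installing it process-wide would take `#[global_allocator] static GLOBAL: BumpAlloc = BumpAlloc::new();`. Because `dealloc` does nothing, every `Vec` growth or `String` reallocation permanently consumes arena space, which is why a proper allocator is on the missing list.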


cc @tbu- @briansmith

japaric, Dec 09 '18 14:12

This sounds really cool! However, I think you may be underestimating how much work this is. There would also need to be collaboration with kernel developers about system calls and stuff.

mark-i-m, Dec 09 '18 20:12

@mark-i-m I agree that it is quite a bit of work, but there is no need for collaborating with kernel devs. The syscall interface is considered stable and they treat any change that'll break userspace as a bug. Go has successfully implemented this strategy on all OSs they support. So I'd argue it is pretty doable and in my opinion desirable.

lorenz, Jan 01 '19 17:01

This sounds really cool! However, I think you may be underestimating how much work this is. There would also need to be collaboration with kernel developers about system calls and stuff.

Most of the stuff was already in place, as you can see in @japaric's comment. :)

tbu-, Jan 01 '19 20:01

@lorenz Yes, you are right, but what I mean is that if we want to influence the syscall interfaces to make them more Rust-friendly we would need to interact more with the kernel devs...

mark-i-m, Jan 02 '19 03:01

@mark-i-m What do you have in mind that you want changed? The syscall interface is totally usable from Rust and is in various ways a better fit than libc itself.

lorenz, Jan 04 '19 18:01

Admittedly, I haven't thought about this extensively yet, but my main concern is that syscalls often don't even attempt to be safe, so adding good Rust wrappers around them is either painful or inefficient.

For example, it is hard to write a good safe wrapper for mmap because it can do so many things; encoding all of them into efficient zero-cost abstractions is hard. There are also a bunch of syscalls that depend on C-defined structs/types (and their layouts), e.g. clock_get*, sched_setaffinity. (EDIT: actually, I'm not sure if these structs/types are just parts of libc... so I might be wrong there...)

mark-i-m, Jan 04 '19 18:01

But this is not about implementing all possible safe uses of (for example) mmap, but just about reimplementing the standard library on top of syscalls which limits the scope to these specific operations. Also structs with C layout are no problem in Rust (we have #[repr(C)]) and are a sensible choice for a kernel since they are interoperable with everything. The kernel people will never maintain a separate syscall interface just for a specific programming language.
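To illustrate both points (a narrow scope, and C-layout structs being unproblematic), here is a sketch of a deliberately limited safe wrapper: one syscall (clock_gettime(2)), one clock, and a `#[repr(C)]` struct matching the kernel's `struct timespec` on 64-bit Linux. The names `syscall2`, `Timespec` and `monotonic_now` are hypothetical; the syscall numbers are the x86_64 and aarch64 values.

```rust
use std::arch::asm;

/// Kernel `struct timespec` on 64-bit Linux: two C `long`s, in this order.
/// `#[repr(C)]` pins the field order and layout to the C definition.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Timespec {
    pub tv_sec: i64,
    pub tv_nsec: i64,
}

const CLOCK_MONOTONIC: usize = 1;

#[cfg(target_arch = "x86_64")]
const SYS_CLOCK_GETTIME: usize = 228;
#[cfg(target_arch = "aarch64")]
const SYS_CLOCK_GETTIME: usize = 113;

#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
unsafe fn syscall2(nr: usize, a1: usize, a2: usize) -> isize {
    let ret: isize;
    unsafe {
        asm!("syscall", inlateout("rax") nr as isize => ret, in("rdi") a1, in("rsi") a2,
             lateout("rcx") _, lateout("r11") _, options(nostack));
    }
    ret
}

#[cfg(all(target_os = "linux", target_arch = "aarch64"))]
unsafe fn syscall2(nr: usize, a1: usize, a2: usize) -> isize {
    let ret: isize;
    unsafe {
        asm!("svc 0", in("x8") nr, inlateout("x0") a1 as isize => ret, in("x1") a2,
             options(nostack));
    }
    ret
}

/// Safe because the kernel writes exactly one `Timespec` into a local we own.
/// The derived `Ord` compares seconds first, then nanoseconds.
pub fn monotonic_now() -> Result<Timespec, i32> {
    let mut ts = Timespec::default();
    let ret = unsafe {
        syscall2(SYS_CLOCK_GETTIME, CLOCK_MONOTONIC, &mut ts as *mut Timespec as usize)
    };
    if ret < 0 { Err((-ret) as i32) } else { Ok(ts) }
}
```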

lorenz, Jan 04 '19 20:01

The kernel people will never maintain a separate syscall interface just for a specific programming language.

No, but they do talk with e.g. libc maintainers to discuss interfaces, libc support, etc.

But this is not about implementing all possible safe uses of (for example) mmap, but just about reimplementing the standard library on top of syscalls which limits the scope to these specific operations.

That's true, but this is also an opportunity to expose the kernel ABI in a safer way, although that doesn't seem to be the intent of this issue...

mark-i-m, Jan 04 '19 21:01

Any newly-proposed syscall will at best be present in the next kernel version, but it’ll be a number of years before you can usefully ship a program that just assumes it is present.

SimonSapin, Jan 05 '19 00:01

Any newly-proposed syscall will at best be present in the next kernel version, but it’ll be a number of years before you can usefully ship a program that just assumes it is present.

Yes, but Rust can take advantage of them right away by falling back to something else when it receives ENOSYS (or whatever), without having to rely on convincing glibc/musl maintainers to do so.
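The fallback pattern described here needs no `errno` at all, since Linux returns `-errno` directly from the syscall. A minimal sketch with hypothetical helper names `cvt` and `with_fallback`; a real caller might, for example, try getrandom(2) and fall back to reading /dev/urandom on kernels that predate it.

```rust
use std::io;

/// Linux errno value for "function not implemented".
const ENOSYS: i32 = 38;

/// Convert a raw syscall return into `io::Result`. Linux signals failure by
/// returning a value in -4095..=-1, so no thread-local `errno` is involved.
pub fn cvt(ret: isize) -> io::Result<usize> {
    if (-4095..0).contains(&ret) {
        Err(io::Error::from_raw_os_error((-ret) as i32))
    } else {
        Ok(ret as usize)
    }
}

/// Try a newer syscall first; if the running kernel reports ENOSYS,
/// fall back to an older mechanism. Other errors are passed through.
pub fn with_fallback<T>(
    newer: impl FnOnce() -> io::Result<T>,
    older: impl FnOnce() -> io::Result<T>,
) -> io::Result<T> {
    match newer() {
        Err(e) if e.raw_os_error() == Some(ENOSYS) => older(),
        other => other,
    }
}
```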

briansmith, Jan 05 '19 00:01

Sure but if a fallback is needed anyway there is no point in doing all this only "to make them more Rust-friendly", is there? (As opposed to, say, a performance optimization.)

SimonSapin, Jan 05 '19 09:01

@mark-i-m It sounds like you're interested in something different than this is proposing. @japaric is proposing a standard library that's equivalent to the current std but without linking to the platform libc, by calling existing syscalls directly. That's a useful goal that many people want, and it doesn't involve changes to syscalls in any way.

If someone wanted to add new syscalls to Linux, for whatever reason, that would be outside the scope of this proposal.

joshtriplett, Feb 15 '19 18:02

As a separate thought, for a target like this, we could theoretically have a libc-compatible C library implemented in Rust, and then link C programs to that library. We don't need that, but it would make for an interesting future addition to this target.

joshtriplett, Mar 15 '19 20:03

libc-compatible C library implemented in Rust

https://gitlab.redox-os.org/redox-os/relibc#relibc https://github.com/redox-os/relibc

"relibc is a portable POSIX C standard library written in Rust."

It is under heavy development, and currently supports Redox and Linux.

The motivation for this project is twofold: Reduce issues the redox crew was having with newlib, and create a safer alternative to a C standard library written in C. It is mainly designed to be used under redox, as an alternative to newlib, but it also supports linux syscalls via the sc crate.

Supported OSes:
  • Redox OS
  • Linux

Supported architectures:
  • x86_64
  • Aarch64

Edit: Just found this: This tries to add an x86_64-unknown-linux-relibc target: https://github.com/mati865/rust/issues/1#issuecomment-487393630 https://github.com/mati865/rust/compare/master...relibc https://github.com/rust-lang/libc/compare/master...mati865:relibc

Darkspirit, Mar 15 '19 21:03

Looking forward to playing with this libc-free target! I wonder if it will be possible to later remove x86_64-unknown-linux's reliance on libc after experimenting on a separate target?

newpavlov, May 18 '19 12:05

a proper allocator. steed used a bump pointer allocator that never freed memory

IIUC elfmalloc can be used here.

newpavlov, May 30 '19 15:05

There is news: a libc in LLVM.

zmlzm, Jun 26 '19 01:06

Does adding new targets require an RFC? Or were large changes required to the codebase?

dvc94ch, Aug 23 '19 10:08

Hi, I've started working on exactly this but with a bit of a different mindset.

My goal is to push Rust's safety guarantees as far as possible. Currently they stop at the entrance to libc, so I'm writing a Rust libc implementation.

I've already seen parts of libc that have UB potential if you misuse them, something that Rust can easily solve.

But I want this to be a usable target, so I propose making a new target that still links to libc but gradually moves away from using it. Currently I see 3 areas where libc is actually needed:

  1. There are users who call libc and then use std::io::Error::last_os_error() to get errno, so we need to keep this working (potentially until a new edition / Rust 2.X where we could deprecate this?)
  2. libunwind.
  3. pthreads.

For the first one, I personally think it's fine, because then LTO should remove libc for users who don't use that part of libstd.

For the second one, I think the right path forward is a rewrite in Rust, but I understand that this can take a while.

For the third one, this should be a public discussion. Do we want to just copy the pthreads implementation, or do we want to invent our own threads that make more sense for Rust? If so, how will that affect FFI to C?

Until these questions are answered I still think it's a big win to have a target that for the short run tries to minimize the calls to libc but for the long run plans to remove it completely.

I would love people's feedback on this. I've already made a list of all the calls to libc in Rust's libstd, and a start on how I think the API can/should look: https://github.com/elichai/syscalls-rs

To emphasize: I don't want just a musl rewrite in Rust with big blobs of unsafe. My whole point is to extend the safety, because:

  A. libstd devs can also make mistakes, so let's give them a safer API, like what they're doing for users in all other aspects.
  B. A lot of users in the wild are calling libc; we can make this safer.
  C. I hope that one day we'll get memory safety through the whole stack, and I think this is the next stage for Rust.

elichai, Oct 18 '19 08:10

I'm writing a rust libc implementation.

To me “libc implementation” means something specific. It provides C APIs as defined in the C standard and POSIX, and tries to be compatible with other implementations in the same way that musl and glibc are compatible with each other. relibc is a libc implementation written in Rust.

It seems that you don’t mean that, but rather something that “spiritually” fills a role similar to that of libc in communicating with the kernel through syscalls and providing higher-level abstractions?

A lot of users in the wild are calling libc, we can make this safer.

Or do you mean something used not instead of libc, but an abstraction on top of it?

how will that affect FFI to C?

As far as I can tell, steed and this issue are about programs that do not use C at all. As soon as you have C code that code needs a libc implementation (as defined above), so there isn’t much point in avoiding it in libstd, in my opinion.

SimonSapin, Oct 18 '19 10:10

I'm talking about a Rust replacement for libc. I'm not planning on exposing FFI functions to C. The point is to have something like "libr": all the libc functionality, but for Rust, because we're not C. So currently I'm trying to provide the same functions, but with safer Rust function signatures.
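The difference between "the same functions" and "safer Rust function signatures" shows up even on something as small as `strlen`. In this sketch (both function names are hypothetical), the first function has the kind of C-compatible signature a libc replacement such as relibc must export, and it is undefined behavior on a pointer without a NUL terminator. The second encodes that requirement in the type `&CStr`, so it can be safe. Exporting the first under the C symbol name would additionally need `#[no_mangle]` (written `#[unsafe(no_mangle)]` in edition 2024).

```rust
use std::ffi::{c_char, CStr};

/// C-style signature: nothing in the type says the string is terminated.
///
/// # Safety
/// `s` must point to a valid, NUL-terminated string.
pub unsafe extern "C" fn c_strlen(s: *const c_char) -> usize {
    let mut n = 0;
    // Walk until the terminator; reads past the allocation if there is none.
    while unsafe { *s.add(n) } != 0 {
        n += 1;
    }
    n
}

/// Rust-style signature: `&CStr` guarantees the terminator exists,
/// so no caller obligation remains and the function is safe.
pub fn rust_strlen(s: &CStr) -> usize {
    s.to_bytes().len()
}
```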

I hope that in the long run it will prove itself safer and better overall. Then we'd want to use it in libstd by default, even if you still link to C code that uses libc.

elichai, Oct 18 '19 10:10

If you’re changing signatures anyway, it doesn’t have to stay close to “the same functions” at all. Do you need any well-defined abstraction between libstd’s public API and syscalls? Why shouldn’t std::fs::File be the safer signature for open(2)?

SimonSapin, Oct 18 '19 10:10

It could, although open is also used directly in flock::Lock. But that's just one syscall; there are syscalls scattered all over libstd, and this could be a simple drop-in replacement for most of them (maybe with a bit more gymnastics for flags, etc.).

And the point is levels of abstraction: you can have a function that directly calls 4 syscalls and handles all their errors, and then call that function the safer API. But for better abstractions, it's better to split each syscall, and the handling of its data/errors, into its own file. That way it's also much easier to review carefully.
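A sketch of the layering described here, assuming x86_64 or aarch64 Linux: a module that owns exactly one syscall (openat(2)) and its error handling, and a thin std-style function on top of it that returns a `std::fs::File`, in the spirit of `File` being the safer signature for open(2). The module and function names are hypothetical. Returning `OwnedFd` from the lower layer means the descriptor is closed automatically if a caller drops it.

```rust
use std::ffi::CStr;
use std::fs::File;
use std::io;

/// Layer 1: one module per syscall. The only `unsafe` lives here.
mod sys_openat {
    use std::arch::asm;
    use std::ffi::CStr;
    use std::io;
    use std::os::fd::{FromRawFd, OwnedFd};

    const AT_FDCWD: i32 = -100; // resolve relative paths against the cwd
    pub const O_RDONLY: usize = 0;
    pub const O_CLOEXEC: usize = 0o2000000;

    #[cfg(target_arch = "x86_64")]
    const SYS_OPENAT: usize = 257;
    #[cfg(target_arch = "aarch64")]
    const SYS_OPENAT: usize = 56;

    #[cfg(all(target_os = "linux", target_arch = "x86_64"))]
    unsafe fn syscall4(nr: usize, a1: usize, a2: usize, a3: usize, a4: usize) -> isize {
        let ret: isize;
        unsafe {
            asm!("syscall", inlateout("rax") nr as isize => ret,
                 in("rdi") a1, in("rsi") a2, in("rdx") a3, in("r10") a4,
                 lateout("rcx") _, lateout("r11") _, options(nostack));
        }
        ret
    }

    #[cfg(all(target_os = "linux", target_arch = "aarch64"))]
    unsafe fn syscall4(nr: usize, a1: usize, a2: usize, a3: usize, a4: usize) -> isize {
        let ret: isize;
        unsafe {
            asm!("svc 0", in("x8") nr, inlateout("x0") a1 as isize => ret,
                 in("x1") a2, in("x2") a3, in("x3") a4, options(nostack));
        }
        ret
    }

    /// `&CStr` guarantees NUL termination; the result owns the new descriptor.
    pub fn open(path: &CStr, flags: usize) -> io::Result<OwnedFd> {
        let ret = unsafe {
            syscall4(SYS_OPENAT, AT_FDCWD as usize, path.as_ptr() as usize, flags, 0)
        };
        if ret < 0 {
            return Err(io::Error::from_raw_os_error((-ret) as i32));
        }
        // Safety: on success the kernel returned a fresh descriptor nobody else owns.
        Ok(unsafe { OwnedFd::from_raw_fd(ret as i32) })
    }
}

/// Layer 2: a std-style API composed from the per-syscall module.
pub fn open_read_only(path: &CStr) -> io::Result<File> {
    sys_openat::open(path, sys_openat::O_RDONLY | sys_openat::O_CLOEXEC).map(File::from)
}
```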

elichai, Oct 18 '19 10:10

If you’re changing signatures anyway, it doesn’t have to stay close to “the same functions” at all. Do you need any well-defined abstraction between libstd’s public API and syscalls? Why shouldn’t std::fs::File be the safer signature for open(2)?

There is also a lot of stuff you can't do with Rust's stdlib at the moment; some things I've needed recently are:

  • acquiring file locks
  • setting ECN on UDP packets
  • dropping capabilities
  • signal handling
  • epoll

While there are libraries doing these things, they use the libc crate and are usually plastered with unsafe code.
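As an example of how small the `unsafe` surface can be for one of these, here is a sketch of non-blocking exclusive file locking via flock(2), issued as a raw syscall and exposed through `BorrowedFd` so the caller can only pass a descriptor that is open for the duration of the call. The helper names are hypothetical; the constants are the Linux values for x86_64 and aarch64.

```rust
use std::arch::asm;
use std::io;
use std::os::fd::{AsRawFd, BorrowedFd};

// flock(2) operations (Linux values).
const LOCK_EX: usize = 2;
const LOCK_NB: usize = 4;
const LOCK_UN: usize = 8;

#[cfg(target_arch = "x86_64")]
const SYS_FLOCK: usize = 73;
#[cfg(target_arch = "aarch64")]
const SYS_FLOCK: usize = 32;

#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
unsafe fn syscall2(nr: usize, a1: usize, a2: usize) -> isize {
    let ret: isize;
    unsafe {
        asm!("syscall", inlateout("rax") nr as isize => ret, in("rdi") a1, in("rsi") a2,
             lateout("rcx") _, lateout("r11") _, options(nostack));
    }
    ret
}

#[cfg(all(target_os = "linux", target_arch = "aarch64"))]
unsafe fn syscall2(nr: usize, a1: usize, a2: usize) -> isize {
    let ret: isize;
    unsafe {
        asm!("svc 0", in("x8") nr, inlateout("x0") a1 as isize => ret, in("x1") a2,
             options(nostack));
    }
    ret
}

/// Private helper: the one place that touches the raw syscall.
fn flock(fd: BorrowedFd<'_>, op: usize) -> io::Result<()> {
    let ret = unsafe { syscall2(SYS_FLOCK, fd.as_raw_fd() as usize, op) };
    if ret < 0 {
        Err(io::Error::from_raw_os_error((-ret) as i32))
    } else {
        Ok(())
    }
}

/// Take an exclusive lock, failing immediately (EWOULDBLOCK) if it is held.
pub fn try_lock_exclusive(fd: BorrowedFd<'_>) -> io::Result<()> {
    flock(fd, LOCK_EX | LOCK_NB)
}

/// Release a lock held through the same open file description.
pub fn unlock(fd: BorrowedFd<'_>) -> io::Result<()> {
    flock(fd, LOCK_UN)
}
```

flock(2) locks belong to the open file description, so a second `File::open` of the same path, even in the same process, is refused while the first one holds the lock.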

dvc94ch, Oct 18 '19 12:10

The point is to have something like "libr" meaning all the libc functionality but for rust. because we're not C. So currently I'm trying to give the same functions but have rust safer function signatures to them.

I think it makes a lot of sense... although it would be nicer if that hypothetical libr were linkable from non-Rust binaries too. The problem is that a lot of time will pass until we have a stable ABI, and until then this greatly constrains the kinds of interfaces we could expose in that libr.

As a side note, the name libr was taken 5 years ago, for a binding to the language R. Even more sadly, that library seems abandoned. There's another (very small) library librs that seems to go in that direction, but probably abandoning the possibility of being linkable from non-rust code.

castarco, Oct 27 '20 08:10

In summary the options are:

  1. Create an entirely new layer for libstd to use. This layer can be written as if its sole consumer is libstd, so it needs to provide "just enough" / exactly what libstd needs. The layer can reside outside of libstd (no_std) or be built into libstd. libstd can be ported to this.

  2. Create a drop-in libc replacement implemented entirely in no_std. libstd will then use this, and other applications can use it as well. In addition, it can provide a C-linkable ABI, so it could also serve as a drop-in replacement for libc in C programs (stretch goal).

  3. I'm adding a sort of hybrid of the above two here: implement the subset of libc that libstd uses in pure no_std rust. The API does not have to match C's exactly (but still drop-in), it can just meet what libstd is expecting and using. The list posted by @elichai would serve as a starting point: https://github.com/elichai/syscalls-rs

"Step 0" for any of the above paths is: The machinery for syscalls (discovering what is available, which numbers they are, how to make them) for all supported platforms (CPU + OS). glibc does all sorts of things at build time to generate these. This should really be in a whole separate no_std library. Does something like this exist already?

Once this no_std_syscall library exists option 1/3 can be brought up pretty quickly and evolved into 2 as time passes.

pmmccorm, Dec 01 '20 17:12

I don't think we should copy the libc API: not copying it would allow us to remove some of its peculiarities, e.g. errno. Plus, it would reduce the scope of the work significantly. So the first option is the best one in my opinion.

The machinery for syscalls (discovering what is available, which numbers they are, how to make them) for all supported platforms (CPU + OS). glibc does all sorts of things at build time to generate these.

Aren't syscall numbers fixed per target arch? And since Rust has recently raised its minimum supported kernel version, we don't need much fallback code (if any). As for the OS, IIRC only Linux has a stable syscall API.
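On Linux, syscall numbers are indeed fixed per architecture and are part of the stable kernel ABI, so a `cfg`-selected table is one plausible way to handle them without any build-time generation. A minimal sketch covering three calls (the module name `nr` is hypothetical). aarch64 and riscv64 share the kernel's generic syscall table, which is why they get the same numbers, and why neither has the legacy `open`, only `openat`.

```rust
/// x86_64 has its own historical syscall table.
#[cfg(target_arch = "x86_64")]
pub mod nr {
    pub const WRITE: usize = 1;
    pub const GETPID: usize = 39;
    pub const OPENAT: usize = 257;
}

/// Newer architectures use the kernel's generic table.
#[cfg(any(target_arch = "aarch64", target_arch = "riscv64"))]
pub mod nr {
    pub const WRITE: usize = 64;
    pub const GETPID: usize = 172;
    pub const OPENAT: usize = 56;
}
```

Which syscalls a given kernel *implements* can still vary by version; that is the runtime ENOSYS question rather than a numbering one.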

newpavlov, Dec 01 '20 22:12

  1. Create an entirely new layer for libstd to use. This layer can be written as if its sole consumer is libstd, so it needs to provide "just enough" / exactly what libstd needs. The layer can reside outside of libstd (no_std) or be built into libstd. libstd can be ported to this.

My interest in this is to greatly reduce use of unsafe in the libstd codebase, so the "entirely new layer for libstd to use" approach is the one I prefer.

The machinery for syscalls (discovering what is available, which numbers they are, how to make them) for all supported platforms (CPU + OS). glibc does all sorts of things at build time to generate these. This should really be in a whole separate no_std library. Does something like this exist already?

It depends on the operating system. Not all operating systems provide a stable ABI at the syscall layer. I'm thinking about Windows in particular.

Note also this issue is scoped to Linux currently: "Consider adding Linux targets that don't depend on libc." I don't think we should expand the scope to other operating systems.

briansmith, Dec 02 '20 01:12

My interest in this is to greatly reduce use of unsafe in the libstd codebase, so the "entirely new layer for libstd to use" approach is the one I prefer.

Yep, that's also my interest: reducing unsafe code and getting rid of the safety minefields that libc has (especially around varargs handling). This is why I started https://github.com/elichai/syscalls-rs, which gives a Rust-like API directly to syscalls, but sadly I haven't had a lot of time lately. I hope to come back to this (also, reimplementing pthreads will not be an easy task).
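One way a Rust-native layer can remove a varargs minefield: C's `open(path, flags, ...)` reads the `mode` argument only when `O_CREAT` is set, and if the caller forgets to pass it, arbitrary bytes are used as the file mode. In this sketch (the type name `OpenKind` is hypothetical; the flag values are the Linux ones for x86_64 and aarch64), the mode lives in the only variant that needs it, so forgetting it is a compile error rather than undefined behavior.

```rust
// Linux open(2) flag values (identical on x86_64 and aarch64).
const O_RDONLY: u32 = 0o0;
const O_WRONLY: u32 = 0o1;
const O_CREAT: u32 = 0o100;
const O_TRUNC: u32 = 0o1000;
const O_CLOEXEC: u32 = 0o2000000;

/// Typed replacement for the variadic `open(path, flags, ...)`.
#[derive(Debug, Clone, Copy)]
pub enum OpenKind {
    ReadOnly,
    /// Creating a file *requires* a mode, so it is part of the variant.
    CreateTruncate { mode: u32 },
}

impl OpenKind {
    /// Lower to the (flags, mode) pair that would be passed to openat(2).
    pub fn to_raw(self) -> (u32, u32) {
        match self {
            OpenKind::ReadOnly => (O_RDONLY | O_CLOEXEC, 0),
            OpenKind::CreateTruncate { mode } => {
                // Keep only permission and special bits.
                (O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, mode & 0o7777)
            }
        }
    }
}
```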

elichai, Dec 02 '20 08:12

I'm working on a project which opens up another option: rsix.

It's similar to syscalls-rs mentioned above in that it can use native-linux syscalls, but it's structured to have configurable backends, with raw linux syscalls and libc being the two current options. And, while it has inline asm that it uses on nightly Rust, it also has out-of-line asm for stable Rust as well. So for the functions it supports, crates can use it today as a drop-in replacement for libc, because it supports all the same platforms libc does, using native syscalls when it can, and libc itself when it can't.

And, it's built around some novel features, notably OwnedFd/BorrowedFd in place of RawFd (see the I/O safety RFC for background).

These properties make it useful in some real-world use cases today, which ideally will help it grow in a sustainable way.

sunfishcode, Jul 14 '21 22:07