Simplify RID Model

Open richlander opened this issue 3 years ago • 14 comments
This plan is intended to:

  • Freeze the RID graph.
  • Adopt stable RIDs (no version numbers).
  • Limit runtimes.json to being used by NuGet only (by default).
  • Hosts adopt a simple non-extensible model.
  • Contractualize what linux-x64 means (same for Arm64 and x86).
  • Define linux-x64 vs source-build.

More generally, we waste a significant amount of time talking about RIDs every release. This proposal is intended to simplify the most problematic aspect of the RID topic. Once that is resolved, we can think about taking on other RID challenges.

Related to:

  • https://github.com/dotnet/runtime/pull/62942
  • https://github.com/dotnet/runtime/issues/59803
  • https://github.com/dotnet/runtime/issues/65152

Rendered view

richlander avatar Apr 03 '22 18:04 richlander

I think the main impact of the design is that package maintainers must change from using distro-rids for separating native libraries to grouping them under a portable rid (e.g. linux-x64) and using NativeLibrary to load the appropriate native library.

(In practice) we'll also lose the ability to trim for a distro-specific rid. This came for free because the native assets were organized by distro-rid. If the package maintainer needs to maintain a portable rid (linux-x64) that includes native assets for all platforms, there isn't much in it for them to add these assets a second time under the distro rid, bloating the package, solely for the purpose of trimming. And trimming for the distro-specific rid will only work on an SDK that knows the rid.

tmds avatar Sep 06 '22 12:09 tmds

> I think the main impact of the design is that package maintainers must change from using distro-rids for separating native libraries to grouping them under a portable rid (e.g. linux-x64) and using NativeLibrary to load the appropriate native library.

The problem I see with saying package maintainers have to use this approach is that it is overlooking how publish output changes if you specify a rid or not.

A portable publish will copy all of the rid folders into the output, but publishing with -r linux-x64 will copy just the content of the package's linux-x64 folder into the main publish output folder.

That makes writing code that can use NativeLibrary to search and find the right binary much more complicated because you can't always assume there is a linux-x64 folder in the project output to be searching in.

bording avatar Sep 06 '22 17:09 bording

My comment was to highlight the complexity that gets pushed towards the package maintainers.

Note that if your library has different native libraries for glibc Linux distros, in order to make -r linux-x64 portable (that is: work across that range of distros), you already need to use NativeLibrary to select the appropriate native library.

With this proposal, in the case the user doesn't specify a rid, the library now also needs to use NativeLibrary because host will no longer do the distro rid-graph based selection.

tmds avatar Sep 06 '22 20:09 tmds

> My comment was to highlight the complexity that gets pushed towards the package maintainers.

:+1: And that is what I've been trying to push back on through my comments here, or at least to get acknowledgement that this is understood and that how to author packages in this more complicated way will be documented appropriately.

> Note that if your library has different native libraries for glibc Linux distros, in order to make -r linux-x64 portable (that is: work across that range of distros), you already need to use NativeLibrary to select the appropriate native library.
>
> With this proposal, in the case the user doesn't specify a rid, the library now also needs to use NativeLibrary because host will no longer do the distro rid-graph based selection.

And my previous comment was trying to point out that writing this NativeLibrary code is much harder with this proposal because of needing to handle two different cases, which I don't think I've seen acknowledged/understood on this PR yet.

bording avatar Sep 06 '22 20:09 bording

> And my previous comment was trying to point out that writing this NativeLibrary code is much harder with this proposal because of needing to handle two different cases, which I don't think I've seen acknowledged/understood on this PR yet.

It depends.

If you know the names of the libraries, you can pass them to NativeLibrary and it will probe the paths that have the native libraries. Then the code is the same.

If you want to iterate yourself the directories being searched to discover at runtime what native libraries are available (e.g. based on a naming pattern), then there is no way to get these directories. They could be exposed through a property on NativeLibrary for example.

tmds avatar Sep 07 '22 04:09 tmds

> It depends.
>
> If you know the names of the libraries, you can pass them to NativeLibrary and it will probe the paths that have the native libraries. Then the code is the same.
>
> If you want to iterate yourself the directories being searched to discover at runtime what native libraries are available (e.g. based on a naming pattern), then there is no way to get these directories. They could be exposed through a property on NativeLibrary for example.

To make sure everyone is talking about the same thing, I want to be clear and describe the scenario. We are talking about a NuGet package that has a distro-specific native dependency.

Currently, the way to do that is to ship a package with as many distro-specific RIDs as needed. For example, the runtimes folder might look something like:

+---alpine-x64
|   \---native
|           libgit2-106a5f2.so
|
+---alpine.3.9-x64
|   \---native
|           libgit2-106a5f2.so
|
+---debian-arm64
|   \---native
|           libgit2-106a5f2.so
|
+---debian.9-x64
|   \---native
|           libgit2-106a5f2.so
|
+---fedora-x64
|   \---native
|           libgit2-106a5f2.so
|
+---linux-x64
|   \---native
|           libgit2-106a5f2.so
|
+---osx
|   \---native
|           libgit2-106a5f2.dylib
|
+---rhel-x64
|   \---native
|           libgit2-106a5f2.so
|
+---ubuntu.16.04-arm64
|   \---native
|           libgit2-106a5f2.so
|
+---ubuntu.18.04-x64
|   \---native
|           libgit2-106a5f2.so
|
+---win-x64
|   \---native
|           git2-106a5f2.dll
|           git2-106a5f2.pdb
|
\---win-x86
    \---native
            git2-106a5f2.dll
            git2-106a5f2.pdb

This is tricky to get right and does require ongoing work to ensure the binaries shipped are comprehensive. It also requires that the RID graph is maintained and accurate.


The proposed alternative here seems to be something like the following instead:

+---linux-x64
|   \---native
|       +---alpine-x64
|       |       libgit2-106a5f2.so
|       |
|       +---debian-x64
|       |       libgit2-106a5f2.so
|       |
|       +---fedora-x64
|       |       libgit2-106a5f2.so
|       |
|       +---rhel-x64
|       |       libgit2-106a5f2.so
|       |
|       \---ubuntu-x64
|               libgit2-106a5f2.so
|
+---osx
|   \---native
|           libgit2-106a5f2.dylib
|
+---win-x64
|   \---native
|           git2-106a5f2.dll
|           git2-106a5f2.pdb
|
\---win-x86
    \---native
            git2-106a5f2.dll
            git2-106a5f2.pdb

With this sort of package layout, it is now on the author to write code that can search inside the runtimes\linux-x64 folder and try to load each binary with NativeLibrary to find one that works. This sort of code would vaguely look something like this.

As things currently stand, I see some problems with this proposal, and it actually isn't possible to author a package like this with the way NuGet works right now.

Problems:

  1. Differences between portable and RID-specific publish output

When you build a project or publish it without a RID specified (dotnet publish), the publish output gets a copy of the package's entire runtimes folder. Code written to use NativeLibrary will need to expect this folder structure when searching for the correct Linux binary to load, for example subdirectories of runtimes\linux-x64\native.

However, as soon as you publish with a RID specified, the folder structure is collapsed and the contents of the native folder are now top-level in the publish output. Using the proposed folder structure above, with something like dotnet publish -r win-x64, that results in just the git2-106a5f2.dll file being in the publish output.

What happens with dotnet publish -r linux-x64? You might assume that it would copy the contents of the native folder and maintain the folder structure defined in the package.

If this were true, how are you supposed to author the NativeLibrary loading code that in one case needs to look for binaries in runtimes\linux-x64\native and in another needs to look for files and folders in the root directory? Is there some way at runtime to know if you're running from RID-specific published content or not? The only thing I can think of at the moment would be to check whether a runtimes folder exists, but that also carries an ongoing maintenance burden: you have to maintain the list of distro folders that might be in the application root to search.
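A minimal sketch of the "check whether a runtimes folder exists" idea, assuming the portable layout keeps runtimes\<rid>\native and the RID-specific layout flattens native assets into the application root. `NativeSearchPaths` and `Candidates` are hypothetical names used only for illustration; nothing like this ships in the runtime.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class NativeSearchPaths
{
    // Returns the directories to probe, most specific first.
    // baseDir is normally AppContext.BaseDirectory; portableRid is e.g. "linux-x64".
    public static IReadOnlyList<string> Candidates(string baseDir, string portableRid)
    {
        var dirs = new List<string>();

        // Portable publish or plain build: the package's runtimes folder is copied as-is.
        string ridNative = Path.Combine(baseDir, "runtimes", portableRid, "native");
        if (Directory.Exists(ridNative))
            dirs.Add(ridNative);

        // RID-specific publish: native assets are expected in the application root.
        dirs.Add(baseDir);
        return dirs;
    }
}
```

A caller would pass `AppContext.BaseDirectory` and then look for distro subfolders inside each returned directory. This only hides the difference between the two layouts; it does not remove the need to know which distro folder names to look for.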

However, NuGet does not actually seem to work this way currently.

  2. NuGet won't honor the proposed folder structure

When you publish with -r linux-x64, instead of just copying the contents of runtimes\linux-x64\native into the publish output, you get an error:

error NETSDK1152: Found multiple publish output files with the same relative path

It is complaining about seeing more than one copy of the binary as if it's trying to put all of them into the publish folder instead of maintaining the defined folder structure.

If you instead try to put the distro subfolders directly under runtimes\linux-x64 and not inside the native folder, that makes things worse. NuGet now completely ignores the linux-x64 folder and it doesn't show up inside publish output at all.

You could potentially work around the NETSDK1152 error by naming every single binary differently, but that's not always under your control. It also doesn't solve the NativeLibrary problem, and in some ways seems to make it worse.


Ultimately, my goal with all of this is to ensure that whatever change ends up happening is done with an awareness that work needs to be done to continue to support packages that need to have distro-specific native binaries. The burden can't be shifted entirely to package maintainers, and in fact the way things currently work makes that nearly impossible.

bording avatar Sep 07 '22 23:09 bording

> The proposed alternative here seems to be something like the following instead:

The proposed alternative here is the following structure instead:

+---linux-x64
|   \---native
|       +---linux-musl-x64
|       |       libgit2-106a5f2.so
|       |
|       +---libgit2-106a5f2.so
|
+---osx
|   \---native
|           libgit2-106a5f2.dylib
|
+---win-x64
|   \---native
|           git2-106a5f2.dll
|           git2-106a5f2.pdb
|
\---win-x86
    \---native
            git2-106a5f2.dll
            git2-106a5f2.pdb

where libgit2-106a5f2.so is built in a prescribed environment with specific glibc and musl versions so that the binary runs on all Linux versions supported by the .NET runtime that the NuGet package is targeting. It is a direct equivalent of the approach used by Python to solve this problem: https://peps.python.org/pep-0600/ .
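If the package itself had to choose between the glibc and musl builds in a layout like this, the selection could be sketched roughly as below. This assumes the runtime identifier marks musl systems with `musl` or an `alpine` prefix, which depends on how the runtime was built; `LibcFlavor` and `RelativeDirectory` are hypothetical names.

```csharp
using System;

public static class LibcFlavor
{
    // Maps a runtime identifier to the folder, relative to the native folder,
    // that holds the matching build. An empty string means the native folder itself.
    public static string RelativeDirectory(string runtimeIdentifier)
    {
        // Assumption: musl systems report a RID containing "musl" or starting with "alpine".
        bool isMusl = runtimeIdentifier.Contains("musl", StringComparison.OrdinalIgnoreCase)
                   || runtimeIdentifier.StartsWith("alpine", StringComparison.OrdinalIgnoreCase);

        // The architecture is taken to be the last dash-separated segment, e.g. "x64".
        string arch = runtimeIdentifier.Substring(runtimeIdentifier.LastIndexOf('-') + 1);

        return isMusl ? $"linux-musl-{arch}" : string.Empty;
    }
}
```

At runtime the input would typically be `System.Runtime.InteropServices.RuntimeInformation.RuntimeIdentifier`, for example `LibcFlavor.RelativeDirectory(RuntimeInformation.RuntimeIdentifier)`.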

jkotas avatar Sep 07 '22 23:09 jkotas

@jkotas I don't see how that is meaningfully different from what I had in my post. And it doesn't solve any of the problems I described.

This isn't just about glibc vs musl. This is about any sort of native dependency that is different per distro. For example, OpenSSL.

bording avatar Sep 07 '22 23:09 bording

The assumption in this proposal is that it is just about glibc vs. musl for the vast majority of native libraries bundled into packages. The Python solution is based on the same assumption.

For the OpenSSL 2 vs. 3 problem, you can build one version of the .so that dynamically binds to OpenSSL2 or 3. It is what the .NET runtime itself is doing. libgit2 mirrored that solution as well (https://github.com/libgit2/libgit2/blob/main/src/libgit2/streams/openssl_dynamic.c). Or you can build two versions of the .so, like libgit2_openssl2.so and libgit2_openssl3.so, and implement a loader that loads the right one at runtime.

If you have a complex library with many different dependencies, you can write a custom loader that picks the right flavor for a given distro. Yes, it is complex. The assumption is that only a small fraction of packages would need to do something like this.
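As a rough illustration of such a loader, the sketch below hooks P/Invoke resolution with `NativeLibrary.SetDllImportResolver` and tries each flavor in turn. The P/Invoke library name `"git2"` and the type name `Git2Loader` are assumptions made for this example; the two file names are the ones mentioned above.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

public static class Git2Loader
{
    // Flavors to try, newest OpenSSL first.
    private static readonly string[] Flavors = { "libgit2_openssl3.so", "libgit2_openssl2.so" };

    // Call once at startup, before the first P/Invoke into "git2".
    // A resolver can only be set once per assembly.
    public static void Register(Assembly assembly) =>
        NativeLibrary.SetDllImportResolver(assembly, Resolve);

    public static IntPtr Resolve(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
    {
        if (libraryName != "git2")
            return IntPtr.Zero; // not ours: let the default probing handle it

        foreach (string flavor in Flavors)
        {
            // TryLoad returns false when the file or one of its own dependencies is missing.
            if (NativeLibrary.TryLoad(flavor, assembly, searchPath, out IntPtr handle))
                return handle;
        }

        return IntPtr.Zero; // nothing loaded: fall back to the default probing
    }
}
```

Returning `IntPtr.Zero` from the resolver hands control back to the runtime's default load logic, so a package can layer this on top of its existing layout without breaking platforms it does not special-case.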

jkotas avatar Sep 07 '22 23:09 jkotas

This code works both when publishing with or without a rid.

IntPtr handle = IntPtr.Zero;
// Try each flavor in order; TryLoad returns false if the library cannot be loaded.
foreach (var nativeLibrary in new[] { "mylib-opensslv3.so", "mylib-opensslv2.so", "mylib-opensslv1.so" })
{
    if (NativeLibrary.TryLoad(nativeLibrary, typeof(SomeType).Assembly, DllImportSearchPath.ApplicationDirectory, out handle))
        break;
}

> This sort of code would vaguely look something like this.

If there are use-cases like this, which require looking at the included native libraries, the directories could be made available through an API.

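IntPtr handle = IntPtr.Zero;
// NativeLibrary.NativeLibraryDirectories is a proposed property; it does not exist today.
foreach (var nativeLibraryDir in NativeLibrary.NativeLibraryDirectories)
{
    foreach (var nativeLibrary in Directory.GetFiles(nativeLibraryDir, "mylib-*.so"))
    {
        if (NativeLibrary.TryLoad(nativeLibrary, typeof(SomeType).Assembly, DllImportSearchPath.ApplicationDirectory, out handle))
            break;
    }
    if (handle != IntPtr.Zero)
        break; // stop probing further directories once a library has loaded
}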

The downsides to this design are:

  • NativeLibrary handling is now needed when there are multiple native libraries for a single portable rid.
  • Trimming on a non-portable rid is no longer possible.

The upside of the NativeLibrary handling is that libraries will work beyond the known rids of the runtime graph.

> The assumption is that only a small fraction of packages would need to do something like this.

Are there some numbers that tell us how many packages on nuget.org have native libraries for non-portable rids?

tmds avatar Sep 08 '22 07:09 tmds

> Are there some numbers that tell us how many packages on nuget.org have native libraries for non-portable rids?

Yes, that would be useful data to have, together with the usage numbers.

I believe that people tend to stay away from packages with native dependencies on Linux today since they are broken too often. For example, we had to drop the dependency on LibGit2Sharp in source link (https://github.com/dotnet/sourcelink/pull/288) and reimplement it in C#. One of the reasons was that the distro-specific libgit2 builds caused too many problems.

jkotas avatar Sep 08 '22 16:09 jkotas

This issue seems to be focused on relieving the complexity of having to manage various Linux distros and their versions.

What about other OSes that are completely different from Linux in ways that are not only related to native library ABI, but also managed system library code?

https://github.com/dotnet/runtime/pull/90695, which seems to be related to this issue, is currently breaking managed library builds for Haiku, since the official .NET SDK is not aware of Haiku and the build is now configured to ignore the runtimes.json in the repository.

Unlike different Linux distros like Ubuntu and Alpine, which can use the same IL code as any linux-x64 in most cases, Haiku is as different from Linux as FreeBSD is when it comes to managed libraries, so wouldn't it make sense to still allow new RIDs for new OS support, but not for new Linux distros?

trungnt2910 avatar Aug 20 '23 03:08 trungnt2910

The intent of this proposal is to move away from OS flavor- and version-specific targeting. This is most notable for Linux because there are so many versions and flavors, but it's equally true for Windows and Mac, where you will no longer be able to target versions individually.

However, different OSes are still classified as different portable RIDs. A portable RID is essentially <baseOS>[-optional-libc]-arch. Since Haiku is a different base OS, it would be eligible for its own. It would just take a community member or members willing to port the runtime to Haiku and maintain compatibility. Since it is not one of the officially supported OSes, Microsoft would not provide support directly. This is similar to the status of FreeBSD.
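The `<baseOS>[-optional-libc]-arch` shape can be made concrete with a small parser. This is only a sketch of the naming pattern described here, not how the SDK or host parses RIDs; `PortableRid` is a hypothetical type.

```csharp
using System;

public sealed record PortableRid(string BaseOs, string? Libc, string Arch)
{
    // Splits a RID of the form <baseOS>[-libc]-arch into its parts.
    public static PortableRid Parse(string rid)
    {
        string[] parts = rid.Split('-');
        return parts.Length switch
        {
            2 => new PortableRid(parts[0], null, parts[1]),      // e.g. "haiku-x64"
            3 => new PortableRid(parts[0], parts[1], parts[2]),  // e.g. "linux-musl-x64"
            _ => throw new FormatException($"Not of the form <baseOS>[-libc]-arch: '{rid}'"),
        };
    }
}
```

For example, `PortableRid.Parse("linux-musl-x64")` yields base OS `linux`, libc `musl`, and architecture `x64`. The parser checks only the shape, so a versioned RID such as `ubuntu.18.04-x64` would still parse; rejecting those would need a list of known base OS names.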

agocke avatar Aug 20 '23 06:08 agocke

> It would just take a community member or members willing to port the runtime to Haiku and maintain compatibility.

https://github.com/dotnet/runtime/pull/86391#discussion_r1247869065

It is being done here, but other members of dotnet are questioning whether a RID for Haiku should be added.

> Since Haiku is a different base OS, it would be eligible for its own.

So, unlike what https://github.com/dotnet/runtime/pull/90695 states, the list of RIDs should not be frozen for good; it should just be less volatile, since version numbers and distro flavors are no longer being added?

trungnt2910 avatar Aug 20 '23 06:08 trungnt2910