feat: Implement support for Github-based index, bypassing the registry

Open Geod24 opened this issue 7 months ago • 5 comments

This implements a package 'index' similar to those used by Homebrew, Nix, Cargo, etc. It allows us to remove a SPOF in our critical infrastructure, since a GitHub outage would make the registry unusable anyway.

There are multiple steps to having a useful index:
- For transition purposes, we add a hidden command to Dub that exports an `index.yaml`;
- In the future, users should register their packages by adding an entry to `index.yaml`, the index definition file of the registry. This file is used as the source of all packages;
- `dub` now has a hidden `index-build` command that builds the index from an index definition file (`index.yaml`). It queries the various APIs to generate JSON index files, which are stored under a pre-defined hierarchy;
- Finally, a `PackageSupplier` is added to make use of this new feature.

In the future, the registration process needs to be moved from the registry to Github to make this migration complete. This *can* be done by exposing a user-friendly interface on `code.dlang.org`, if making an MR to the index is deemed too complicated.

This is still a WIP, albeit quite complete now. Things that still need to be done:

  1. ~Description is not handled properly (needs to be extracted from the recipe file);~
  2. Consider ways to limit / reduce impact on user's disk over a long period (currently uses 32 MB of data);
  3. We need to have the index in production for a while before enabling it by default for users;
  4. ~Need to switch configy to a real JSON backend as the YAML one doesn't handle strings well.~
  5. Consider the scenario where the workspace is empty (e.g. in CI): do we always download a full cache?
  6. Fetching packages from GitLab and Bitbucket is not yet implemented;
  7. Support for non-global instances of GitLab and Github could be trivially implemented;

FYI @s-ludwig

Geod24 avatar Apr 28 '25 04:04 Geod24

✅ PR OK, no changes in deprecations or warnings

Total deprecations: 0

Total warnings: 0

Build statistics:

 statistics (-before, +after)
-executable size=5055872 bin/dub
-rough build time=61s
+executable size=5511744 bin/dub
+rough build time=65s
Full build output
DUB version 1.39.0, built on Mar 20 2025
LDC - the LLVM D compiler (1.40.1):
  based on DMD v2.110.0 and LLVM 19.1.7
  built with LDC - the LLVM D compiler (1.40.1)
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver3
  http://dlang.org - http://wiki.dlang.org/LDC


  Registered Targets:
    aarch64     - AArch64 (little endian)
    aarch64_32  - AArch64 (little endian ILP32)
    aarch64_be  - AArch64 (big endian)
    amdgcn      - AMD GCN GPUs
    arm         - ARM
    arm64       - ARM64 (little endian)
    arm64_32    - ARM64 (little endian ILP32)
    armeb       - ARM (big endian)
    avr         - Atmel AVR Microcontroller
    bpf         - BPF (host endian)
    bpfeb       - BPF (big endian)
    bpfel       - BPF (little endian)
    hexagon     - Hexagon
    lanai       - Lanai
    loongarch32 - 32-bit LoongArch
    loongarch64 - 64-bit LoongArch
    mips        - MIPS (32-bit big endian)
    mips64      - MIPS (64-bit big endian)
    mips64el    - MIPS (64-bit little endian)
    mipsel      - MIPS (32-bit little endian)
    msp430      - MSP430 [experimental]
    nvptx       - NVIDIA PTX 32-bit
    nvptx64     - NVIDIA PTX 64-bit
    ppc32       - PowerPC 32
    ppc32le     - PowerPC 32 LE
    ppc64       - PowerPC 64
    ppc64le     - PowerPC 64 LE
    r600        - AMD GPUs HD2XXX-HD6XXX
    riscv32     - 32-bit RISC-V
    riscv64     - 64-bit RISC-V
    sparc       - Sparc
    sparcel     - Sparc LE
    sparcv9     - Sparc V9
    spirv       - SPIR-V Logical
    spirv32     - SPIR-V 32-bit
    spirv64     - SPIR-V 64-bit
    systemz     - SystemZ
    thumb       - Thumb
    thumbeb     - Thumb (big endian)
    ve          - VE
    wasm32      - WebAssembly 32-bit
    wasm64      - WebAssembly 64-bit
    x86         - 32-bit X86: Pentium-Pro and above
    x86-64      - 64-bit X86: EM64T and AMD64
    xcore       - XCore
    xtensa      - Xtensa 32
   Upgrading project in /home/runner/work/dub/dub/
    Starting Performing "release" build using /opt/hostedtoolcache/dc/ldc2-1.40.1/x64/ldc2-1.40.1-linux-x86_64/bin/ldc2 for x86_64.
    Building dub 1.39.0-rc.1+commit.54.gcf379ca5: building configuration [application]
     Linking dub
STAT:statistics (-before, +after)
STAT:executable size=5511744 bin/dub
STAT:rough build time=65s

github-actions[bot] avatar Apr 28 '25 04:04 github-actions[bot]

This is what the output looks like for Configy:

 % cat index-build-result/co/yg/configy
{"version":0,"name":"configy","description":"An automatic YAML to struct configuration parser for dlang","source":{"kind":"github","owner":"dlang-community","project":"configy"},"versions":[{"version":"2.1.0","subs":[{"configurations":[{"dependencies":{"dyaml":">=0.8.4"},"name":""},{"name":"library"},{"name":"debug"},{"name":"unittest"}],"path":"dub.json","cache":{"etag":"W/\"f414b8246bf2fe14ae009a0952945d8d25c824d8\"","last_modified":"Thu, 10 Apr 2025 00:06:18 GMT"},"name":""}],"commit":"f161db12e7f6462959b9f42edd4301a252f13dfe"},{"version":"2.0.0","subs":[{"configurations":[{"dependencies":{"dyaml":">=0.8.4"},"name":""},{"name":"library"},{"name":"debug"},{"name":"unittest"}],"path":"dub.json","cache":{"etag":"W/\"f414b8246bf2fe14ae009a0952945d8d25c824d8\"","last_modified":"Thu, 10 Apr 2025 00:06:18 GMT"},"name":""}],"commit":"c66665417289da4e8f8ede16a96e8158efd499b5"},{"version":"1.0.0","subs":[{"configurations":[{"dependencies":{"dyaml":">=0.8.4"},"name":""},{"name":"library"},{"name":"debug"},{"name":"unittest"}],"path":"dub.json","cache":{"etag":"W/\"f414b8246bf2fe14ae009a0952945d8d25c824d8\"","last_modified":"Thu, 10 Apr 2025 00:06:18 GMT"},"name":""}],"commit":"110cc0600324f091773d979284d2948a9ddbb975"}],"cache":{"etag":"W/\"3d57862ee06488642331352dfd274351c5417a254c7c7f0523fab18fee8d9d36\"","last_modified":"Tue, 15 Apr 2025 15:51:57 GMT"}}

Or, pretty-printed:

{
  "version": 0,
  "name": "configy",
  "description": "An automatic YAML to struct configuration parser for dlang",
  "source": {
    "kind": "github",
    "owner": "dlang-community",
    "project": "configy"
  },
  "versions": [
    {
      "version": "2.1.0",
      "subs": [
        {
          "configurations": [
            {
              "dependencies": {
                "dyaml": ">=0.8.4"
              },
              "name": ""
            },
            {
              "name": "library"
            },
            {
              "name": "debug"
            },
            {
              "name": "unittest"
            }
          ],
          "path": "dub.json",
          "cache": {
            "etag": "W/\"f414b8246bf2fe14ae009a0952945d8d25c824d8\"",
            "last_modified": "Thu, 10 Apr 2025 00:06:18 GMT"
          },
          "name": ""
        }
      ],
      "commit": "f161db12e7f6462959b9f42edd4301a252f13dfe"
    },
    {
      "version": "2.0.0",
      "subs": [
        {
          "configurations": [
            {
              "dependencies": {
                "dyaml": ">=0.8.4"
              },
              "name": ""
            },
            {
              "name": "library"
            },
            {
              "name": "debug"
            },
            {
              "name": "unittest"
            }
          ],
          "path": "dub.json",
          "cache": {
            "etag": "W/\"f414b8246bf2fe14ae009a0952945d8d25c824d8\"",
            "last_modified": "Thu, 10 Apr 2025 00:06:18 GMT"
          },
          "name": ""
        }
      ],
      "commit": "c66665417289da4e8f8ede16a96e8158efd499b5"
    },
    {
      "version": "1.0.0",
      "subs": [
        {
          "configurations": [
            {
              "dependencies": {
                "dyaml": ">=0.8.4"
              },
              "name": ""
            },
            {
              "name": "library"
            },
            {
              "name": "debug"
            },
            {
              "name": "unittest"
            }
          ],
          "path": "dub.json",
          "cache": {
            "etag": "W/\"f414b8246bf2fe14ae009a0952945d8d25c824d8\"",
            "last_modified": "Thu, 10 Apr 2025 00:06:18 GMT"
          },
          "name": ""
        }
      ],
      "commit": "110cc0600324f091773d979284d2948a9ddbb975"
    }
  ],
  "cache": {
    "etag": "W/\"3d57862ee06488642331352dfd274351c5417a254c7c7f0523fab18fee8d9d36\"",
    "last_modified": "Tue, 15 Apr 2025 15:51:57 GMT"
  }
}
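
For readers who want to poke at such a file, here is a minimal Python sketch of how a consumer might read an entry of this shape. It is not Dub's implementation (Dub is written in D). The helper name `list_versions` is hypothetical, the sample is a trimmed copy of the entry above, and treating the sub entry with an empty `name` as the root package is an assumption.

```python
import json

# Trimmed sample in the same shape as the configy entry above.
SAMPLE = """
{
  "version": 0,
  "name": "configy",
  "versions": [
    {
      "version": "2.1.0",
      "subs": [
        {
          "name": "",
          "path": "dub.json",
          "configurations": [
            {"name": "", "dependencies": {"dyaml": ">=0.8.4"}},
            {"name": "library"}
          ]
        }
      ],
      "commit": "f161db12e7f6462959b9f42edd4301a252f13dfe"
    }
  ]
}
"""


def list_versions(entry):
    """Map each version string to the dependencies found in `subs`.

    Assumption: the sub entry whose name is "" describes the root package,
    so named sub-packages are skipped.
    """
    result = {}
    for ver in entry.get("versions", []):
        deps = {}
        for sub in ver.get("subs", []):
            if sub.get("name", "") != "":
                continue  # hypothetical: ignore named sub-packages
            for conf in sub.get("configurations", []):
                # Configurations without dependencies simply add nothing.
                deps.update(conf.get("dependencies", {}))
        result[ver["version"]] = deps
    return result


if __name__ == "__main__":
    print(list_versions(json.loads(SAMPLE)))
```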

One way to reduce the bloat would be to have another step to only publish data that is relevant (currently the index stores all the etags / last modified to avoid needlessly querying Github). I would also like to look into package popularity (number of stars / forks, etc...).
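
A rough Python sketch of what such a publishing step could look like, assuming the only build-time data to drop are the `cache` objects (etag / last modified) seen in the output above. The function name `strip_cache` and the trimmed sample are illustrative, not part of this PR.

```python
import json


def strip_cache(node):
    """Return a copy of `node` with every "cache" object removed.

    The "dependencies" mapping is copied untouched, because its keys are
    package names and a package could legitimately be called "cache".
    """
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "cache":
                continue  # build-time metadata, not needed by clients
            out[key] = dict(value) if key == "dependencies" else strip_cache(value)
        return out
    if isinstance(node, list):
        return [strip_cache(item) for item in node]
    return node


# Trimmed entry in the same shape as the configy output above.
ENTRY = {
    "version": 0,
    "name": "configy",
    "versions": [
        {
            "version": "2.1.0",
            "subs": [
                {
                    "configurations": [
                        {"dependencies": {"dyaml": ">=0.8.4"}, "name": ""}
                    ],
                    "path": "dub.json",
                    "cache": {
                        "etag": 'W/"f414b8246bf2fe14ae009a0952945d8d25c824d8"',
                        "last_modified": "Thu, 10 Apr 2025 00:06:18 GMT",
                    },
                    "name": "",
                }
            ],
            "commit": "f161db12e7f6462959b9f42edd4301a252f13dfe",
        }
    ],
    "cache": {
        "etag": 'W/"3d57862ee06488642331352dfd274351c5417a254c7c7f0523fab18fee8d9d36"',
        "last_modified": "Tue, 15 Apr 2025 15:51:57 GMT",
    },
}

SLIM = strip_cache(ENTRY)
FULL_SIZE = len(json.dumps(ENTRY, separators=(",", ":")))
SLIM_SIZE = len(json.dumps(SLIM, separators=(",", ":")))

if __name__ == "__main__":
    print(f"full: {FULL_SIZE} bytes, published: {SLIM_SIZE} bytes")
```

The builder would keep the full file privately so it can still make conditional requests, and publish only the stripped copy.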

Geod24 avatar Apr 28 '25 04:04 Geod24

I can see 103 dead packages:

     Warning The following packages errored out:
        - "dzmq"
        - "pc"
        - "civge"
        - "btreader"
        - "stripe-d"
        - "murmurhash3"
        - "libhell"
        - "interfacing"
        - "m3d"
        - "s3"
        - "d-leveldb-comparator"
        - "libco"
        - "gpgerror-d"
        - "gpgme-d"
        - "bgfx-d"
        - "llvm-d-2"
        - "ansi"
        - "bgfx-extras-d"
        - "liblzma"
        - "iupd"
        - "nukleard"
        - "nluad"
        - "imd"
        - "cdd"
        - "clipboard"
        - "libuid"
        - "soapclient"
        - "mogud-benchmark"
        - "gdal2"
        - "riffedit"
        - "tmarsteel-dpipe"
        - "quantum-random"
        - "yaml-d"
        - "dwtlib"
        - "rdub"
        - "parsed"
        - "gdub"
        - "nice-curses"
        - "dich"
        - "checkit"
        - "big-d"
        - "litecraft-bgfx"
        - "composer"
        - "os1"
        - "kisaragi"
        - "decimal"
        - "dfunkt"
        - "pterm"
        - "struct2mongo"
        - "vibedstruct2mongo"
        - "indexed-relation"
        - "mongo"
        - "rm-rf-exe"
        - "dconfig"
        - "gamenetworkingsockets_d"
        - "derelict-cufft"
        - "sanspam"
        - "ben-eater-8bit-emulator"
        - "discord-d"
        - "psychometry"
        - "firecracker_d"
        - "sml"
        - "evael"
        - "bindbc-assimp"
        - "dunex-auth"
        - "sbylib"
        - "repl-d"
        - "fmt-d"
        - "plist"
        - "grpc-d-core"
        - "grpc-d-interop"
        - "jar"
        - "nudge-d"
        - "webkit2gtkd"
        - "command"
        - "dstruct-orm"
        - "soundpipe-d"
        - "lhl"
        - "pa"
        - "erasure"
        - "jengine"
        - "libcbor"
        - "sweatyballs"
        - "dweb"
        - "feature"
        - "d2asm"
        - "dlsplus"
        - "option"
        - "cli-args"
        - "result"
        - "dlang_raylib"
        - "boxed"
        - "datefmt-redthing1"
        - "econf"
        - "dtiled-redthing1"
        - "faiss-d"
        - "mads"
        - "teacup"
        - "nullable-sugar"
        - "bert-d"
        - "flant5-d"
        - "hellodub"
        - "self"

Geod24 avatar Apr 28 '25 06:04 Geod24

While I'm not opposed to this approach in general, we should set the bar for this quite high:

  • Support for all currently supported platforms (GitHub, GitLab, Bitbucket, Gitea)
  • Support for private repositories
  • Don't regress in terms of usability (e.g. being able to register/verify packages through code.dlang.org)
  • Don't regress in terms of performance (e.g. a lengthy index update to pull in changes)
  • Don't lose additional registry features, such as download statistics
  • Review this in terms of the possibility of backing up package sources to avoid breakage when a package repository disappears

The thing I'm not quite sure about is what we gain by using GitHub to store the list of packages. If that's the only centrally served asset, that should also be trivial to do from a dlang server that is independent of the registry web frontend.

It should be mentioned that we already had a working fallback mechanism with <codemirror.dlang.org> et al., but at some point along the way that obviously broke. We should definitely get that fixed again and maybe look into improving it (for example, skipping a server that timed out or yielded a 5xx error).
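
To make the "skip a server that timed out or yielded a 5xx error" idea concrete, here is a small Python sketch. It is only an illustration of the behaviour, not Dub's actual fallback code. The function name `fetch_with_fallback`, its parameters, and the choice to re-raise non-5xx HTTP errors instead of trying the next mirror are all assumptions.

```python
import urllib.error
import urllib.request


def fetch_with_fallback(path, mirrors, opener=urllib.request.urlopen, timeout=5.0):
    """Fetch `path` from the first mirror that answers successfully.

    A mirror is skipped when it times out, is unreachable, or answers with
    a 5xx status. Other HTTP errors (e.g. 404) are re-raised, on the
    assumption that another mirror would give the same answer.
    """
    failures = []
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError as exc:
            # HTTPError must be handled before OSError, its base class.
            if exc.code < 500:
                raise
            failures.append((base, f"HTTP {exc.code}"))
        except OSError as exc:
            # Covers URLError, socket timeouts and refused connections.
            failures.append((base, str(exc)))
    raise RuntimeError(f"all mirrors failed: {failures}")
```

The `opener` parameter exists only so the function can be exercised without network access; a real client would also want to remember which mirrors failed so it does not retry them on every request.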

s-ludwig avatar May 15 '25 07:05 s-ludwig

> Support for all currently supported platforms (GitHub, GitLab, Bitbucket, Gitea)

I missed Gitea. The rest are supported. Note that there is currently no public package using Gitea. But it shouldn't be hard to add.

> Support for private repositories

I think this raises multiple questions. Do we want to have private repositories on the public index? I'd say no. So it's more about supporting different repositories, which this should be able to do, but I haven't extensively tested it yet.

> Don't regress in terms of usability (e.g. being able to register/verify packages through code.dlang.org)

Still need to do that, but definitely on the list.

> Don't regress in terms of performance (e.g. a lengthy index update to pull in changes)

Agreed - also need to make sure we make it cache / CI friendly.

> Don't lose additional registry features, such as download statistics

We could add various metrics to Dub, but we could also rely on GitHub's metrics (stars, forks).

> Review this in terms of the possibility of backing up package sources to avoid breakage when a package repository disappears

I don't think this affects our ability to back up packages in any way, positively or negatively. However it makes ownership transfer much easier (because there's no longer a notion of ownership), and thus reviving a dead package no longer needs to involve an administrator.

> The thing I'm not quite sure about is what we gain by using GitHub to store the list of packages. If that's the only centrally served asset, that should also be trivial to do from a dlang server that is independent of the registry web frontend.

A lot less to maintain, and a well-known access model.

> It should be mentioned that we already had a working fallback mechanism with <codemirror.dlang.org> et al., but at some point along the way that obviously broke. We should definitely get that fixed again and maybe look into improving it (for example, skipping a server that timed out or yielded a 5xx error).

Agreed, we need to improve the client side of things. But if we can remove most concerns on the server side, that'll be a win in terms of work to be done.

Geod24 avatar May 22 '25 10:05 Geod24