Add support for hashed mode to Poudriere

Open allanjude opened this issue 4 years ago • 10 comments

Requires the related patches to pkg and the ports tree

allanjude avatar Apr 13 '20 20:04 allanjude

Can you also give more context for what this is?

bdrewery avatar Apr 13 '20 20:04 bdrewery

> Can you also give more context for what this is?

When combined with https://github.com/freebsd/pkg/pull/1829, this creates a pkg repo that looks like this:

# ls -al /usr/local/poudriere/data/packages/121amd64-default/All/
total 12021
drwxr-xr-x  2 root    wheel       16 Apr 13 20:06 ./
drwxr-xr-x  4 root    wheel        9 Apr 13 20:06 ../
-rw-r--r--  1 nobody  wheel   161820 Apr 13 19:27 gettext-runtime-0.20.1+ec3887d3a2.txz
lrwxr-xr-x  1 nobody  wheel       37 Apr 13 19:27 gettext-runtime-0.20.1.txz@ -> gettext-runtime-0.20.1+ec3887d3a2.txz
-rw-r--r--  1 nobody  wheel  2527876 Apr 13 19:29 gettext-tools-0.20.1_1+7101c9ffe5.txz
lrwxr-xr-x  1 nobody  wheel       37 Apr 13 19:29 gettext-tools-0.20.1_1.txz@ -> gettext-tools-0.20.1_1+7101c9ffe5.txz
-rw-r--r--  1 nobody  wheel     5828 Apr 13 19:19 indexinfo-0.3.1+1cd9c1a735.txz
lrwxr-xr-x  1 nobody  wheel       30 Apr 13 19:19 indexinfo-0.3.1.txz@ -> indexinfo-0.3.1+1cd9c1a735.txz
-rw-r--r--  1 nobody  wheel   387088 Apr 13 19:27 libtextstyle-0.20.1+6c117ad74e.txz
lrwxr-xr-x  1 nobody  wheel       34 Apr 13 19:27 libtextstyle-0.20.1.txz@ -> libtextstyle-0.20.1+6c117ad74e.txz
-rw-r--r--  1 nobody  wheel   236640 Apr 13 20:04 nano-4.8+74c3c14712.txz
lrwxr-xr-x  1 nobody  wheel       23 Apr 13 20:04 nano-4.8.txz@ -> nano-4.8+74c3c14712.txz
-rw-r--r--  1 nobody  wheel  8787688 Apr 13 20:06 pkg-1.13.99.7.l+46fd66c8a7.txz
lrwxr-xr-x  1 nobody  wheel       30 Apr 13 20:06 pkg-1.13.99.7.l.txz@ -> pkg-1.13.99.7.l+46fd66c8a7.txz
-rw-r--r--  1 nobody  wheel    35452 Apr 13 19:54 zxfer-1.1.7+2e39bad872.txz
lrwxr-xr-x  1 nobody  wheel       26 Apr 13 19:54 zxfer-1.1.7.txz@ -> zxfer-1.1.7+2e39bad872.txz
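
A minimal Python sketch of how such a layout could be produced, for illustration only. It assumes the suffix is the first 10 hex digits of a SHA-256 digest of the package file; the listing shows 10 hex digits but does not say which digest pkg uses. The helper names `hashed_name` and `publish_hashed` are hypothetical and are not part of pkg or poudriere.

```python
import hashlib
import os

HASH_LEN = 10  # the listing shows 10 hex digits; the digest algorithm is an assumption


def hashed_name(path):
    """Return 'name-version+<hash>.txz' for the package file at `path`."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    stem, ext = os.path.splitext(os.path.basename(path))
    return f"{stem}+{digest[:HASH_LEN]}{ext}"


def publish_hashed(path):
    """Rename `path` to its hashed name and leave a symlink at the old name."""
    directory = os.path.dirname(os.path.abspath(path))
    target = hashed_name(path)
    os.rename(path, os.path.join(directory, target))
    os.symlink(target, path)  # relative link, matching the listing
    return target
```

Calling `publish_hashed("All/zxfer-1.1.7.txz")` would leave a hashed file plus a `zxfer-1.1.7.txz` symlink pointing at it, the same shape as the entries in the listing.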

So when you install a package, it fetches the file with the hash in the URL:

# pkg install zxfer
Updating Test repository catalogue...
Test repository is up to date.
All repositories are up to date.
The following 1 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
        zxfer: 1.1.7

Number of packages to be installed: 1

35 KiB to be downloaded.

Proceed with this action? [y/N]: y
[1/1] Fetching zxfer-1.1.7+2e39bad872.txz: 100%   35 KiB  35.5kB/s    00:01
Checking integrity... done (0 conflicting)
[1/1] Installing zxfer-1.1.7...
[1/1] Extracting zxfer-1.1.7: 100%

This will allow package repositories to be served from CDNs and web caches. Because each package filename contains a unique hash, the package files themselves never need cache invalidation; only the pkg meta files need a short cache lifetime.
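
A minimal Python sketch of the caching policy this enables, under stated assumptions: the hashed names follow the `+<10 hex digits>.txz` pattern from the listing above, and the one-year and five-minute lifetimes are illustrative values, not recommendations from pkg or poudriere. The function name `cache_control` is hypothetical.

```python
import re

# Matches names such as 'zxfer-1.1.7+2e39bad872.txz' from the listing above.
HASHED_PKG = re.compile(r"\+[0-9a-f]{10}\.txz$")


def cache_control(filename, meta_ttl=300):
    """Pick a Cache-Control header value for a file served from the repo."""
    if HASHED_PKG.search(filename):
        # Content never changes under this name, so it can be cached for a long time.
        return "public, max-age=31536000, immutable"
    # Meta files and unhashed symlink names may change on the next build.
    return f"public, max-age={meta_ttl}"
```

Unhashed names such as `zxfer-1.1.7.txz` fall into the short-lifetime branch, because the symlink can point at a different file after a rebuild.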

allanjude avatar Apr 13 '20 20:04 allanjude

@bapt with https://github.com/freebsd/pkg/commit/36dfb4895b0b307fe3321f6a4478521080a6e9b7 merged into pkg, I've refreshed this patch to add a -H flag to poudriere bulk, which builds a repo using the hashed mode.

It currently implies --symlink as well, because without it poudriere does not yet find the already built packages during an incremental build.
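
To illustrate why the symlinks matter for incremental builds, here is a hypothetical Python sketch of looking up an existing package by its plain name. It is not poudriere's code (poudriere is written in shell); the function name `find_built_package` and the glob fallback are assumptions about what a lookup without symlinks would need to do.

```python
import glob
import os


def find_built_package(all_dir, pkgname, ext=".txz"):
    """Return the real path of an already built package, or None."""
    plain = os.path.join(all_dir, pkgname + ext)
    if os.path.exists(plain):  # follows the symlink when one is present
        return os.path.realpath(plain)
    # Without the symlink, the lookup has to match 'name+<hash>.txz' instead.
    pattern = glob.escape(os.path.join(all_dir, pkgname)) + "+*" + ext
    matches = sorted(glob.glob(pattern))
    return os.path.realpath(matches[0]) if matches else None
```

With the symlink in place the first branch succeeds and no knowledge of the hash is needed, which is the behaviour an unmodified lookup by plain filename relies on.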

allanjude avatar Jul 17 '20 02:07 allanjude

The change committed to pkg is different from the original proposal (create hashed filenames during pkg create). The version that was merged into pkg is for pkg repo, which does all the work in one step at the end and requires far fewer changes to poudriere.

allanjude avatar Jul 17 '20 03:07 allanjude

@allanjude Can we also have a poudriere.conf knob? Below is from my attempt. I will write and test an additional commit that works with yours to do that.

# Have pkg create hashed versions of the pkg filenames with symlinks to
# original pkg names. The packagesite.yaml file will point to the hashed version
# of these files. By using hashed pkg filenames, this allows users to lazily
# synchronise packages without conflicting with the current packages,
# for example using rsync or CDNs. Once the packages are synced, the much
# smaller meta files can then be synced, allowing a near-atomic update of the repo.
# On a caching CDN this means purging 2-5 files instead of all pkgs that
# have been updated.
#PKG_HASH=no
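
The lazy synchronisation described in that comment can be sketched as a two-phase ordering: hashed package files first, everything else last. This is a minimal Python illustration, assuming the `+<10 hex digits>.txz` naming from the earlier listing; the function name `sync_phases` and the example file names are hypothetical, and the actual transfer (rsync, CDN upload) is left out.

```python
import re

# Hashed package names as shown earlier in the thread, e.g. 'nano-4.8+74c3c14712.txz'.
HASHED_PKG = re.compile(r"\+[0-9a-f]{10}\.txz$")


def sync_phases(filenames):
    """Split a repo file list into (hashed packages, everything else)."""
    packages = sorted(f for f in filenames if HASHED_PKG.search(f))
    others = sorted(f for f in filenames if not HASHED_PKG.search(f))
    return packages, others
```

Transferring the first list before the second means clients keep resolving the old meta files, and therefore the old hashed packages, until the small second phase completes.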

darkfiberiru avatar Sep 16 '20 19:09 darkfiberiru

@allanjude As discussed out of band, I will try to produce a version of the patch that includes the -H flag or the poudriere.conf option, and then open a new PR or reopen #786.

darkfiberiru avatar Sep 18 '20 17:09 darkfiberiru

@darkfiberiru Do you still have plans to pick this up again? It seems Allan is perpetually busy with something else.

igalic avatar Dec 21 '20 23:12 igalic

One thing I noticed: with the new default config, the pkgs are owned by 'nobody', but the symlinks to the hashed versions are owned by root. Should pkg repo be run as nobody, or do we just need to do a chown after pkg repo?

I notice packagesite etc. are not owned by nobody.
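
A small Python sketch for spotting the mismatch described here, for illustration only. It compares the owner of each symlink with the owner of the file it points to; the function name `mismatched_links` is hypothetical and this is not something poudriere or pkg provides.

```python
import os


def mismatched_links(all_dir):
    """List (name, link_uid, target_uid) where a symlink's owner differs from its target's."""
    out = []
    for entry in sorted(os.listdir(all_dir)):
        path = os.path.join(all_dir, entry)
        if not os.path.islink(path) or not os.path.exists(path):
            continue  # skip regular files and dangling links
        link_uid = os.lstat(path).st_uid   # owner of the link itself
        target_uid = os.stat(path).st_uid  # owner of the file it points to
        if link_uid != target_uid:
            out.append((entry, link_uid, target_uid))
    return out
```

If the chown route were chosen, `os.lchown(path, uid, gid)` changes the link itself rather than its target; doing so requires sufficient privileges.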

allanjude avatar Nov 01 '21 02:11 allanjude