New command to purge old versions

Open dankegel opened this issue 8 years ago • 22 comments

Here's a script that claims to call aptly repo remove once for each package that has older versions to remove, for a particular architecture. It relies on GNU sort's -V (version sort) option, which here sorts first by package name, then by package version. It's really ugly, but it illustrates that "purge old versions" is nontrivial and might be worth adding as a feature in aptly itself.

#!/bin/sh
set -x
set -e
repo=_my_repo_
arch=amd64

dup=false
for p in `aptly repo search $repo "Architecture ($arch)" | sed "s/_$arch//" | sort -V`
do
    pkg=`echo $p | sed 's,_.*,,'`
    if test "$pkg" = "$pkg_old"
    then
        dup=true
    elif $dup
    then
        dup=false
        # $p_old is latest version of some package with more than one version
        # Output a search spec for all versions older than this
        # Version is 2nd field in output of aptly repo search, separated by _
        v_old=`echo $p_old | cut -d_ -f2`
        aptly repo remove $repo "$pkg_old (<< $v_old), Architecture ($arch)"
    fi
    p_old="$p"
    pkg_old="$pkg"
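done
# The loop only purges a package once the next package name shows up,
# so handle the final package here if it had more than one version.
if $dup
then
    v_old=`echo $p_old | cut -d_ -f2`
    aptly repo remove $repo "$pkg_old (<< $v_old), Architecture ($arch)"
fi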

dankegel avatar Aug 28 '15 18:08 dankegel

For the automatic package publishing system I'm setting up (for a local Debian repository), this feature would be very useful, especially over long periods of time, as the CI/build server will churn out many versions of the packages.

Castaglia avatar Oct 16 '15 17:10 Castaglia

:+1: I would love something like this.

jlu5 avatar Oct 18 '15 03:10 jlu5

This would be great to have implemented in aptly.

iGuy5 avatar Nov 07 '15 20:11 iGuy5

On top of that, this feature would be more useful if it allowed the user to specify the number of old versions to keep.

rul avatar Jan 14 '16 18:01 rul

Selecting how much history to keep is a toughie. Three possibilities come to mind: max # of versions, max age, and max total bytes for all versions of a package. That might handle a lot of use cases, especially if they could be combined.
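
To make the three limits concrete, here is a minimal Python sketch of how they might be combined. Everything in it is an assumption for illustration: the function name select_stale, the dict keys, and the rule that the newest version is always kept. It is not aptly behaviour.

```python
from datetime import datetime, timedelta


def select_stale(versions, now, max_count=None, max_age=None, max_bytes=None):
    """Return the version strings that fall outside the retention limits.

    `versions` must be ordered newest first. Each entry is a dict with the
    hypothetical keys 'version', 'uploaded' (datetime) and 'size' (bytes).
    Limits left as None are ignored. The newest version is always kept, and
    once one version is stale, every older one is stale too.
    """
    stale = []
    kept = 0
    kept_bytes = 0
    cutoff_reached = False
    for v in versions:
        if kept > 0 and not cutoff_reached:
            cutoff_reached = (
                (max_count is not None and kept >= max_count)
                or (max_age is not None and now - v["uploaded"] > max_age)
                or (max_bytes is not None and kept_bytes + v["size"] > max_bytes)
            )
        if cutoff_reached:
            stale.append(v["version"])
        else:
            kept += 1
            kept_bytes += v["size"]
    return stale


# Made-up sample data, newest first.
history = [
    {"version": "1.3", "uploaded": datetime(2016, 1, 10), "size": 100},
    {"version": "1.2", "uploaded": datetime(2016, 1, 1), "size": 100},
    {"version": "1.1", "uploaded": datetime(2015, 12, 1), "size": 100},
    {"version": "1.0", "uploaded": datetime(2015, 6, 1), "size": 100},
]
now = datetime(2016, 1, 14)
by_count = select_stale(history, now, max_count=2)
by_age = select_stale(history, now, max_age=timedelta(days=30))
combined = select_stale(history, now, max_count=3, max_age=timedelta(days=10))
```

Treating the limits as "whichever is hit first" is only one possible way to combine them; a real implementation could just as well require all of them to be exceeded.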

dankegel avatar Jan 14 '16 18:01 dankegel

For my particular use cases, either max # of versions or max age would work.

Castaglia avatar Jan 14 '16 19:01 Castaglia

I've come up with something like this:

# Removes old packages in the received repo
#
# $1: Repository
# $2: Architecture
# $3: Amount of packages to keep
repo-remove-old-packages() {
    local repo=$1
    local arch=$2
    local keep=$3

    for pkg in $(aptly repo search $repo "Architecture ($arch)" | grep -v "ERROR: no results" | sort -rV); do
        local pkg_name=$(echo $pkg | cut -d_ -f1)
        if [ "$pkg_name" != "$cur_pkg" ]; then
            local count=0
            local deleted=""
            local cur_pkg="$pkg_name"
        fi
        test -n "$deleted" && continue
        let count+=1
        if [ $count -gt $keep ]; then
            pkg_version=$(echo $pkg | cut -d_ -f2)
            aptly repo remove $repo "Name ($pkg_name), Version (<= $pkg_version), Architecture ($arch)"
            deleted='yes'
        fi
    done
}

Note that the grep -v "ERROR: no results" is due to #334.

rul avatar Jan 14 '16 22:01 rul

The issue with error messages going to stdout has already been fixed in master.

smira avatar Jan 15 '16 12:01 smira

It would be nice to be able to keep a fixed number of versions. Let's say I want only the last 10 versions, so I can roll back to some of them but do not need to keep all of them.

Something like this (I know it is ugly; this is an ad-hoc one-liner):

version=`aptly repo remove -dry-run=true $repo $package | sort --version-sort | grep $package | tail -n $number_to_leave | head -1 | awk -F"_" '{print $2}'`
aptly repo remove $repo "$package (<< $version)"

UPD: just noticed a mistake in the version filter

stumyp avatar Feb 23 '16 02:02 stumyp

All other repository managers automatically expire old versions on upload of a new version - e.g. if I upload foo_1.0-2 then foo_1.0-1 is removed. aptly should at least optionally behave like this.

directhex avatar Jun 10 '16 13:06 directhex

Hello, elaborating on @stumyp's bash combo, I created a Python script which performs (IMHO) the exact behaviour we'd like (I found some issues with the bash version):

#!/usr/bin/env python2.7
import sys
from subprocess import check_output
from apt_pkg import version_compare, init_system

init_system()

repo = sys.argv[1]
package_name = sys.argv[2]
retain_how_many = int(sys.argv[3])

output = check_output(["aptly", "repo", "remove", "-dry-run=true", repo, package_name])
output = [line for line in output.split("\n") if line.startswith("[-]")]
output = [line.replace("[-] ","") for line in output]
output = [line.replace(" removed","") for line in output]

def sort_cmp(name1, name2):
    version_and_build_1 = name1.split("_")[1]
    version_and_build_2 = name2.split("_")[1]
    return version_compare(version_and_build_1, version_and_build_2)

output.sort(cmp=sort_cmp)
should_delete = output[:-retain_how_many]

if should_delete:
    print check_output(["aptly", "repo", "remove", repo] + should_delete)
else:
    print "nothing to delete"

Since it's already in Python, if @smira is interested I could try submitting a pull request to integrate this functionality into aptly itself. Any idea how you'd like the command line to look? I'd probably create an "aptly repo" subcommand.

alanfranz avatar Oct 12 '16 07:10 alanfranz

One of the issues that frequently pops up is that when one changes the version scheme or the package name, everything gets borked and all the old packages must be removed (sometimes). Can it deal with inconsistently versioned or named packages somehow?

figtrap avatar Nov 07 '16 16:11 figtrap

Debian package versioning lets package maintainers cope with changing upstream version schemes by prefixing the version number with an epoch; see http://manpages.ubuntu.com/manpages/trusty/man5/deb-version.5.html

Because the Python script uses from apt_pkg import version_compare for its version comparisons, it's likely to handle that correctly.
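
To show why an epoch helps, here is a rough pure-Python sketch of the comparison rules described in deb-version(5). The function names are made up for this example and the code is a simplified illustration, not a replacement for apt_pkg.version_compare, which is what the scripts in this thread use.

```python
from functools import cmp_to_key


def _order(ch):
    """Sort weight of one character in a non-digit run ('' means end of string)."""
    if ch == "" or ch.isdigit():
        return 0
    if ch == "~":
        return -1          # tilde sorts before everything, even the end
    if ch.isalpha():
        return ord(ch)     # letters sort before other symbols
    return ord(ch) + 256


def _compare_fragment(a, b):
    """Compare two upstream-version or revision strings."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit runs are compared character by character.
        while (i < len(a) and not a[i].isdigit()) or (j < len(b) and not b[j].isdigit()):
            wa = _order(a[i] if i < len(a) else "")
            wb = _order(b[j] if j < len(b) else "")
            if wa != wb:
                return -1 if wa < wb else 1
            i += 1
            j += 1
        # Digit runs are compared numerically; an empty run counts as 0.
        si, sj = i, j
        while i < len(a) and a[i].isdigit():
            i += 1
        while j < len(b) and b[j].isdigit():
            j += 1
        na, nb = int(a[si:i] or "0"), int(b[sj:j] or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _split(version):
    """Split 'epoch:upstream-revision'; a missing epoch counts as 0."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(v1, v2):
    """Return -1, 0 or 1, comparing epoch first, then upstream, then revision."""
    e1, u1, r1 = _split(v1)
    e2, u2, r2 = _split(v2)
    if e1 != e2:
        return -1 if e1 < e2 else 1
    return _compare_fragment(u1, u2) or _compare_fragment(r1, r2)


# A renumbered package ("0.9" under the new scheme) still sorts as newest
# because of its epoch prefix.
ordered = sorted(["2.0", "1:0.9", "1.0~rc1"], key=cmp_to_key(compare_versions))
```

Here the epoch is compared before anything else, so 1:0.9 ends up after 2.0 even though 0.9 is numerically smaller.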

dankegel avatar Nov 07 '16 16:11 dankegel

Thank you, I totally forgot about the epoch.

Tim Kelley

figtrap avatar Nov 08 '16 15:11 figtrap

I added a few things to @alanfranz's script; it is now possible to use package queries to remove old versions.

Example call:

./purge_old_versions.py --dry-run --repo release-repo --package-query 'Name (% ros-indigo-*)' -n 1

#!/usr/bin/env python
from __future__ import print_function

import argparse
import re
import sys

from apt_pkg import version_compare, init_system
from subprocess import check_output, CalledProcessError


class PurgeOldVersions:
    def __init__(self):
        self.args = self.parse_arguments()

        if self.args.dry_run:
            print("Run in dry mode, without actually deleting the packages.")

        if not self.args.repo:
            sys.exit("You must declare a repository with: --repo")

        if not self.args.package_query:
            sys.exit("You must declare a package query with: --package-query")

        print("Remove " + self.args.package_query + " from " + self.args.repo +
              " and keep the last " + str(self.args.retain_how_many) +
              " packages")

    @staticmethod
    def parse_arguments():
        parser = argparse.ArgumentParser(
            formatter_class=argparse.RawTextHelpFormatter)
        parser.add_argument("--dry-run", dest="dry_run",
                            help="List packages to remove without removing "
                                 "them.", action="store_true")
        parser.add_argument("--repo", dest="repo",
                            help="Which repository should be searched?",
                            type=str)
        parser.add_argument("--package-query", dest="package_query",
                            help="Which packages should be removed?\n"
                                 "e.g.\n"
                                 "  - Single package: ros-indigo-rbdl.\n"
                                 "  - Query: 'Name (%% ros-indigo-*)' "
                                 "to match all ros-indigo packages. See \n"
                                 "https://www.aptly.info/doc/feature/query/",
                            type=str)
        parser.add_argument("-n", "--retain-how-many", dest="retain_how_many",
                            help="How many package versions should be kept?",
                            type=int, default=1)
        return parser.parse_args()

    def get_packages(self):
        init_system()

        packages = []

        try:
            output = check_output(["aptly", "repo", "remove", "-dry-run=true",
                                   self.args.repo, self.args.package_query])
            output = [line for line in output.split("\n") if
                      line.startswith("[-]")]
            output = [line.replace("[-] ", "") for line in output]

            for p in output:
                packages.append(
                    re.sub("[_](\d{1,}[:])?\d{1,}[.]\d{1,}[.]\d{1,}[-](.*)", '', p))
            packages = list(set(packages))
            packages.sort()

        except CalledProcessError as e:
            print(e)

        finally:
            return packages

    def purge(self):
        init_system()

        packages = self.get_packages()
        if not packages:
            sys.exit("No packages to remove.")

        # Initial call to print 0% progress
        i = 0
        l = len(packages)
        printProgressBar(i, l, prefix='Progress:', suffix='Complete', length=50)

        packages_to_remove = []
        for package in packages:
            try:
                output = check_output(["aptly", "repo", "remove",
                                       "-dry-run=true", self.args.repo,
                                       package])
                output = [line for line in output.split("\n") if
                          line.startswith("[-]")]
                output = [line.replace("[-] ", "") for line in output]
                output = [line.replace(" removed", "") for line in output]

                def sort_cmp(name1, name2):
                    version_and_build_1 = name1.split("_")[1]
                    version_and_build_2 = name2.split("_")[1]
                    return version_compare(version_and_build_1,
                                           version_and_build_2)

                output.sort(cmp=sort_cmp)
                should_delete = output[:-self.args.retain_how_many]
                packages_to_remove += should_delete

                i += 1
                printProgressBar(i, l, prefix='Progress:', suffix='Complete',
                                 length=100)

            except CalledProcessError as e:
                print(e)

        print(" ")
        if self.args.dry_run:
            print("\nThese packages would be deleted:")
            for p in packages_to_remove:
                print(p)
        else:
            if packages_to_remove:
                print(check_output(["aptly", "repo", "remove",
                                    self.args.repo] + packages_to_remove))
                print("\nRun 'aptly publish update ...' "
                      "to update the repository.")
            else:
                print("nothing to remove")


# Print iterations progress
def printProgressBar(iteration, total, prefix='', suffix='', decimals=1,
                     length=100, fill='#'):
    """
    Call in a loop to create terminal progress bar
    @params:
        iteration   - Required  : current iteration (Int)
        total       - Required  : total iterations (Int)
        prefix      - Optional  : prefix string (Str)
        suffix      - Optional  : suffix string (Str)
        decimals    - Optional  : positive number of decimals in percent
                                  complete (Int)
        length      - Optional  : character length of bar (Int)
        fill        - Optional  : bar fill character (Str)
    """
    percent = ("{0:." + str(decimals) + "f}").format(
        100 * (iteration / float(total)))
    filled_length = int(length * iteration // total)
    bar = fill * filled_length + '-' * (length - filled_length)
    print('\r%s |%s| %s%% %s' % (prefix, bar, percent, suffix), end='\r')
    # Print New Line on Complete
    if iteration == total:
        print()


if __name__ == '__main__':
    purge_old_versions = PurgeOldVersions()
    purge_old_versions.purge()

samuelba avatar Jan 31 '17 15:01 samuelba

I had a feature in the works that I never completed, as it requires some large-scale changes. The idea was to enhance package queries with Python-like slice syntax, so that you could write package[3:], which would mean "all but the first 3 versions of package".
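
For reference, this is how plain Python slices behave on a list of versions. It only illustrates Python's own semantics; the newest-first ordering and the variable names are assumptions, and aptly's query language has no such syntax according to this thread.

```python
# Versions of one package, assumed to be ordered newest first.
versions = ["1.4", "1.3", "1.2", "1.1", "1.0"]

# [:3] selects the first three entries, [3:] everything after them.
keep = versions[:3]
purge = versions[3:]

# A slice past the end is simply empty, so short histories purge nothing.
purge_short = ["1.0", "0.9"][3:]
```

With this ordering, a purge command would act on the [3:] part and leave the newest three versions alone.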

smira avatar Feb 07 '17 22:02 smira

@samuelba Thanks for the script, but it does not work properly with a query like Name (% *test), Version (% *dev): it lists all packages for deletion, as if ignoring the Version filter. The normal aptly command handles such a query without a problem, so I had to revert to plain old bash hacking.
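
One way to avoid losing the filter might be to group the lines of a single dry-run and never query again by bare package name. The sketch below assumes the "[-] name_version_arch removed" line format that the scripts in this thread parse; the function names and the version_key parameter are made up for illustration.

```python
from collections import defaultdict


def group_dry_run_lines(lines):
    """Group 'name_version_arch' refs from dry-run output by (name, arch)."""
    groups = defaultdict(list)
    for line in lines:
        if not line.startswith("[-] "):
            continue  # skip progress and status lines
        ref = line[len("[-] "):]
        if ref.endswith(" removed"):
            ref = ref[:-len(" removed")]
        name, version, arch = ref.split("_")
        groups[(name, arch)].append(version)
    return dict(groups)


def refs_to_remove(groups, keep, version_key):
    """Return every ref except the newest `keep` versions of each group.

    `version_key` is a sort key for version strings; in practice it could be
    functools.cmp_to_key(apt_pkg.version_compare).
    """
    refs = []
    for (name, arch), versions in sorted(groups.items()):
        ordered = sorted(versions, key=version_key)
        stale = ordered[:-keep] if keep > 0 else ordered
        refs.extend("%s_%s_%s" % (name, v, arch) for v in stale)
    return refs


# Made-up output for a query that already matched only "-dev" versions.
sample = [
    "Loading packages...",
    "[-] foo-test_1.0-dev_amd64 removed",
    "[-] foo-test_1.1-dev_amd64 removed",
    "[-] bar-test_2.0-dev_amd64 removed",
]
groups = group_dry_run_lines(sample)
# str is only a placeholder key that happens to work for this sample data.
stale_refs = refs_to_remove(groups, keep=1, version_key=str)
```

Because only versions that matched the original query are ever grouped, a Version filter in that query stays in effect.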

gacopl avatar Apr 26 '17 02:04 gacopl

The feature mentioned by @smira would be tremendously useful for maintaining repositories that can accrue a large number of different versions. I was wondering if there has been any progress on this in the last couple of months?

wwentland avatar Nov 23 '17 11:11 wwentland

No progress on that so far. I have a branch which implements part of the syntax, but nothing more.

smira avatar Nov 27 '17 09:11 smira

This thread helped me a lot. Here is my take on the issue based on what I've read here. Hope it will be useful.

#!/usr/bin/env python3
import sys
import json
import codecs
import mimetypes
import uuid
import io
import re
from pathlib import Path
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError
from functools import cmp_to_key
from apt_pkg import version_compare, init_system

init_system()


class MultipartFormdataEncoder(object):
    def __init__(self):
        self.boundary = uuid.uuid4().hex
        self.content_type = 'multipart/form-data; boundary={}'.format(self.boundary)

    def iter(self, files):
        encoder = codecs.getencoder('utf-8')
        for file in files:
            print('uploading file %s...' % str(file))
            yield encoder('--{}\r\n'.format(self.boundary))
            yield encoder('Content-Disposition: form-data; name="{}"; filename="{}"\r\n'.format(file.name, file.name))
            yield encoder('Content-Type: {}\r\n'.format(mimetypes.guess_type(file.name)[0] or 'application/octet-stream'))
            yield encoder('\r\n')
            with open(str(file), 'rb') as fd:
                buff = fd.read()
                yield (buff, len(buff))
            yield encoder('\r\n')
        yield encoder('--{}--\r\n'.format(self.boundary))

    def encode(self, files):
        body = io.BytesIO()
        for chunk, chunk_len in self.iter(files):
            body.write(chunk)
        return self.content_type, body.getvalue()


def sort_cmp(p1, p2):
    v1 = p1.split(' ')[2]
    v2 = p2.split(' ')[2]
    return version_compare(v1, v2)


def request(url, method='GET', data=None, files=None):
    headers = {'Content-Type': 'application/json'}

    if data is not None:        
        data = json.dumps(data).encode('utf-8')

    if files is not None:
        content_type, data = MultipartFormdataEncoder().encode(files)
        headers = {'Content-Type': content_type}

    req = Request(url, data, headers)
    req.get_method = lambda: method
    try:
        response = urlopen(req)
    except HTTPError as e:
        print('the server couldn\'t fulfill the request.')
        print('error code: ', e.code)
    except URLError as e:
        print('failed to reach a server.')
        print('reason: ', e.reason)
    else:
        rep = json.loads(response.read().decode('utf-8'))
        return rep


def purge(url, repo, name, retain_how_many):    
    data = request(url+'/api/repos/'+repo+'/packages')
    data = list(filter(lambda x: x.split(' ')[1]==name, data))
    data = sorted(data, key=cmp_to_key(sort_cmp))
    should_delete = data[:-retain_how_many]

    if should_delete:
        print('the following packages are going to be removed from %s: %s' % (repo, should_delete))
        data = {'PackageRefs': should_delete}
        rep = request(url+'/api/repos/'+repo+'/packages', method='DELETE', data=data)
    else:
        print('no version of %s deleted in %s' % (name, repo))


def main():
    url = sys.argv[1]
    repo_pattern = re.compile(sys.argv[2])
    package_glob = sys.argv[3]
    retain_how_many = int(sys.argv[4])
    directory = str(uuid.uuid4())

    # Upload packages
    packages = list(Path('.').glob(package_glob))
    print('uploading %s packages in directory %s' % (len(packages), directory))
    request(url+'/api/files/'+directory, method='POST', files=packages)

    # List repos matching repo_pattern
    repos = [r['Name'] for r in request(url+'/api/repos')]
    repos = [r for r in repos if repo_pattern.match(r)]
    print("pattern matches the following repositories: %s" % repos)

    names = {file.name.split('_')[0] for file in packages}
    for repo in repos:
        # Add package to repo
        rep = request(url+'/api/repos/'+repo+'/file/'+directory+'?noRemove=1', method='POST')
        # Delete old package
        for name in names:
            purge(url, repo, name, retain_how_many)

    # Delete upload directory
    request(url+'/api/files/'+directory, method='DELETE')

if __name__ == '__main__':
    main()

Usage: ./aptly-push <http://APTLYAPI> <REPOPATTERN> <PATH> <RETAINHOWMANY>

It will upload all packages matching the PATH glob and add them to all the repos matching the REPOPATTERN. For each repo and for each package, it then limits the number of versions to RETAINHOWMANY.

Example: ./aptly-push http://127.0.0.1:9876 "myrepo-(?:prod|staging)" "./build/*.deb" 3

fyhertz avatar Jul 08 '19 08:07 fyhertz

And yet another version, based on @samuelba's. We have 8 repos, each with 2 or 4 components, 3 to 4 architectures and some 100 packages. While samuelba's version worked nicely (after porting from Python 2 to Python 3), it took about 10 minutes to purge all of them. So instead of painting a progress bar, this one should be fast enough to not need one :)

#!/usr/bin/env python3
from __future__ import print_function

import argparse
import re
import sys

from apt_pkg import version_compare, init_system
from subprocess import check_output, CalledProcessError
from functools import cmp_to_key


class PurgeOldVersions:
    def __init__(self):
        self.args = self.parse_arguments()

        if self.args.dry_run:
            print("Running in dry mode, without actually deleting the packages.")

        if not self.args.repo:
            sys.exit("You must declare a repository with: --repo")

        if not self.args.package_query:
            sys.exit("You must declare a package query with: --package-query")

        print("Removing " + self.args.package_query + " from " + self.args.repo +
              " and keeping the last " + str(self.args.retain_how_many) +
              " packages")

    @staticmethod
    def parse_arguments():
        parser = argparse.ArgumentParser(
            formatter_class=argparse.RawTextHelpFormatter)
        parser.add_argument("--dry-run", dest="dry_run",
                            help="List packages to remove without removing "
                                 "them.", action="store_true")
        parser.add_argument("--repo", dest="repo",
                            help="Which repository should be searched?",
                            type=str)
        parser.add_argument("--package-query", dest="package_query",
                            help="Which packages should be removed?\n"
                                 "e.g.\n"
                                 "  - Single package: ros-indigo-rbdl.\n"
                                 "  - Query: 'Name (%% ros-indigo-*)' "
                                 "to match all ros-indigo packages. See \n"
                                 "https://www.aptly.info/doc/feature/query/",
                            type=str)
        parser.add_argument("-n", "--retain-how-many", dest="retain_how_many",
                            help="How many package versions should be kept?",
                            type=int, default=1)
        return parser.parse_args()

    def get_packages(self):
        init_system()

        packages = {}

        try:
            print("getting packages %s" % self.args.package_query)
            output = check_output(["aptly", "repo", "remove", "-dry-run",
                                   self.args.repo, self.args.package_query]).decode('utf-8')
            output = [line for line in output.splitlines() if
                      line.startswith("[-]")]
            output = [line.replace("[-] ", "") for line in output]
            output = [line.replace(" removed", "") for line in output]

            for p in output:
                packageName = p.split("_")[0]
                version = p.split("_")[1]
                arch = p.split("_")[2]
                if packageName not in packages:
                    packages[packageName] = {}
                if arch not in packages[packageName]:
                    packages[packageName][arch] = []
                packages[packageName][arch].append(version)


        except CalledProcessError as e:
            print(e)

        finally:
            return packages

    def purge(self):
        init_system()

        packages = self.get_packages()

        packagesToRemove = []

        for package in packages:
            for arch in packages[package]:
                versions = packages[package][arch]

                versions = sorted(versions, key=cmp_to_key(version_compare))
                versionsToRemove = versions[:-self.args.retain_how_many]
                for versionToRemove in versionsToRemove:
                    packagesToRemove.append("%s_%s_%s" % (package, versionToRemove, arch))

        if len(packagesToRemove) == 0:
            sys.exit("No packages to remove.")

        if self.args.dry_run:
            print(check_output(["aptly", "repo", "remove", "-dry-run", self.args.repo] + packagesToRemove).decode("utf-8"))
        else:
            print(check_output(["aptly", "repo", "remove", self.args.repo] + packagesToRemove).decode("utf-8"))


if __name__ == '__main__':
    purge_old_versions = PurgeOldVersions()
    purge_old_versions.purge()

mzanetti avatar Apr 03 '22 22:04 mzanetti

Could we get some guidance on what would be required for a third party to implement this?

james-lawrence avatar Feb 22 '24 12:02 james-lawrence