libdnf icon indicating copy to clipboard operation
libdnf copied to clipboard

`Subject.get_best_query`'s `with_filenames` argument is not honored

Open gotmax23 opened this issue 1 year ago • 1 comments

Since the latest libdnf update, hawkey.Subject.get_best_query does not honor the with_filenames argument. Filenames are still matched even if with_filenames is explicitly set to False.

import dnf
import dnf.subject
import fedrq.config
import hawkey


def get_base() -> dnf.Base:
    """
    Use fedrq to generate a Base object with filelists loaded
.
    This can also be done manually.
    """
    base = (
        fedrq.config.get_config(backend="dnf", load_filelists="always")
        .get_rq(branch="rawhide")
        .base
    )
    assert "filelists" in base.conf.optional_metadata_types
    assert isinstance(base, dnf.Base)
    return base


def main() -> None:
    base = get_base()
    print("Hawkey version:", hawkey.VERSION)
    print("dnf version:", dnf.VERSION)
    path = "/usr/share/ansible"
    # Filenames are not enabled for the query
    subject = dnf.subject.Subject(path).get_best_query(base.sack, with_filenames=False)
    # Query length should be 0
    print("Query length:", len(subject))
    # for package in subject:
    #     print(package)


if __name__ == "__main__":
    main()

$ python with_filelists.py
Hawkey version: 0.73.0
dnf version: 4.19.0
Query length: 0
$ python with_filelists.py 
Hawkey version: 0.73.3
dnf version: 4.21.1
Query length: 3

This is breaking fedrq's unit tests that check that our wrapper of the Subject API returns the correct results when with_filenames is disabled.

gotmax23 avatar Aug 27 '24 20:08 gotmax23

After a bit of experimenting - this regression is a side effect of the patch https://github.com/rpm-software-management/libdnf/pull/1670. The string in question /usr/share/ansible is found during search in provides https://github.com/rpm-software-management/libdnf/blob/fbd34472e5edea0a3ed88ae14a47666e23538e4a/libdnf/sack/query.cpp#L2751-L2757

This causes the query is returned even before the string was searched in file lists.

It looks like POOL_FLAG_ADDFILEPROVIDESFILTERED causes that when filelists metadata are loaded, they are always added to provides. @kontura can you please take a look?

m-blaha avatar Aug 28 '24 10:08 m-blaha

The change in behavior with POOL_FLAG_ADDFILEPROVIDESFILTERED is unintended. I changed the way libsolv sets up the lazy fileprovides lookup so that it works like before. See https://github.com/openSUSE/libsolv/commit/2150865d616ec454375de2c3bcd244c42c56e782

I don't know if this is relevant to the slowness issue.

mlschroe avatar Sep 18 '24 12:09 mlschroe

The change in behavior with POOL_FLAG_ADDFILEPROVIDESFILTERED is unintended. I changed the way libsolv sets up the lazy fileprovides lookup so that it works like before. See openSUSE/libsolv@2150865

@mlschroe the linked commit doesn't seem to affect this particular issue.

Though while looking into this I extracted the following snippet:

#include "solv/pool.h"
#include "solv/repo.h"

static const char * input =  "/usr/lib/file";

int main() {
    Id p, pp;
    Solvable * s;

    Pool * pool = pool_create();
    pool_set_flag(pool, POOL_FLAG_ADDFILEPROVIDESFILTERED, 1);

    Repo * repo = repo_create(pool, "repo");
    Repodata * data = repo_add_repodata(repo, 0);

    // add pkg with provide
    s = pool_id2solvable(pool, repo_add_solvable(repo));
    s->name = pool_str2id(pool, "foo", 1);
    s->evr = pool_str2id(pool, "1-2", 1);
    s->arch = pool_str2id(pool, "x86_64", 1);
    Id reldep = pool_rel2id(pool, s->name, s->evr, REL_EQ, 1);
    s->provides = repo_addid_dep(repo, s->provides, reldep, 0);

    // add "/usr/lib/file" file to pkg
    Id did = repodata_str2dir(data, "/usr/lib", 1);
    repodata_add_dirstr(data, s - pool->solvables, SOLVABLE_FILELIST, did, "file");

    repodata_internalize(data);

    pool_addfileprovides(pool);
    pool_createwhatprovides(pool);

    // Get all packages providing `input`

    Id path_id = pool_str2id(pool, input, 1);
    FOR_PROVIDES(p, pp, path_id) {
        s = pool->solvables + p;
        printf("FOR_PROVIDES: %s\n", pool_solvable2str(pool, s));
    }

    pool_free(pool);
    return 0;
}

It is odd to me that while the POOL_FLAG_ADDFILEPROVIDESFILTERED is set /usr/lib/file is found as a provide but without the flag its not. I understand the flag changes how the file entries (that would not be included in the primary file) are added but I would expect the end result to be the same.

I did find another approach that seems to work regardless of the POOL_FLAG_ADDFILEPROVIDESFILTERED:

    int flags;
    Queue q;
    queue_init(&q);
    flags = SELECTION_PROVIDES;
    selection_make(pool, &q, input, flags);

    flags = SELECTION_FILELIST|SELECTION_SUBTRACT;
    selection_make(pool, &q, input, flags);

    Queue pq;
    queue_init(&pq);
    selection_solvables(pool, &q, &pq);

    for (int j = 0; j < pq.count; j++) {
        printf("provides wihtout filelists: %s\n", pool_solvable2str(pool, pool_id2solvable(pool, pq.elements[j])));
    }
    queue_free(&q);

kontura avatar Sep 25 '24 11:09 kontura

Yes, this happens because the path_id is added after the pool_createwhatprovides(() call. Fixed now in commit https://github.com/openSUSE/libsolv/commit/081580de9f964aae61d25e1787c37c7ab8b4fa69

mlschroe avatar Sep 25 '24 11:09 mlschroe

Thanks, that fixes this issue.

kontura avatar Sep 25 '24 13:09 kontura