mamba icon indicating copy to clipboard operation
mamba copied to clipboard

Improve mamba unsatisfiability error messages

Open AntoinePrv opened this issue 3 years ago • 1 comments

Abstract

Using all_problem_structured, we can improve mamba errors when an environment is not satisfiable. The function gives a number of problem and dependencies between packages involved in the error. We can build a Directed Acyclic Graph (DAG) of these packages, where top level (root) packages are directly requested by the user, and bottom packages (leaves) are posing problem (conflict, missing dependency...).

Together with @syslaila we successfully build two proof of concepts (POC) for such errors:

  • POC-A: One pinpoints the root and leaf packages to produce a succinct error message #1847
  • POC-B: The other retrace the whole path between these root and leaves to show a detailed explanation of the conflicts/problems in a tree-like message AntoinePrv/mamba-error-reporting.

The goal is to incorporate both of them, using a/ as a summary and b/ as the verbose details.

The work done is quite substantial, and currently the two POCs do not share the same foundation / data structures. The aim of this issue is to find similarities between the two POC and break down the work into small, well-defined pull-requests (PRs).

Tasks

Please add unit tests where needed. Python bindings are encouraged when appropriate.

Problem data

  • [ ] T11a: PackageInfo represents (almost) all the available information we have on a package, but to use it in data structure we need to make it comparable and hashable.
    • [x] T11a.1: Optionally, Implement a generic hash for std::tuple (e.g.* from Python, Boost) and a PackageInfo::to_tuple (a generic implementation will be useful for T31 and T32).
    • [x] T11a.2: Implement std::hash<PackageInfo>
    • [x] T11a.3 Implement PackageInfo::operator==
  • [ ] T11b: Alternatively, PackageInfo does loose the solvable_id used by libsolv. If we need to store it, we could.
    • [ ] T11b.1 Build a std::map<SolvableID, PackageInfo> and use the ids everywhere else in the algorithm (no need for 1a. anymore).
  • [x] T12: The information about dependencies only comes as a std::string (e.g. python>3.6.*), however in POC-B we do need to extract the package name from the range.
    • [x] T12.1: Create a simple DependencyInfo struct with name and range attributes
    • [x] T12.2: Create a parsing constructor, (e.g. regex \s*(\w[\w-]*)\s*(.*)\s*)
  • [ ] T13a: Same as T11a for DependencyInfo
  • [ ] T13b: Alternatively, same as T11b for DependencyInfo
  • [x] T14 Parse libsolv data from all_problem_structured
    • [x] T14.1: Filter invalid SOLVER_RULE_JOB
    • [x] T14.2: Build package dependency graph
    • [x] T14.3: [POC-B] Store pair of packages that are in conflict TODO: define where? In the graph or in another data structure)
    • [x] T14.4: [POC-B] Store packages that are problematic (missing dependency etc) TODO: define where? In the graph or in another data structure

Graph data structure

A directed graph data structure already exists in graph_utils.hpp, we should extend it to fit our needs. Having a standalone graph class will improve testing and separation of concerns.

  • [x] T21: Extend the graph class to hold data on each edge
  • [x] T22: Add structure to store and query predecessors
  • [x] T23: Add missing graph exploration algorithms: find_leaves find_roots
  • [ ] T24: [POC-B] Add minimal undirected graph class (?)
  • [ ] T25: [POC-B] Add heuristic clique partition algorithm

Graph compression

  • [x] T31: Add data structure to represent a set of packages
    • [x] T31.1: Likely around a lightweight std::vector, removing duplicates
    • [x] T31.2: Add accessor for getting the name of the package
    • [x] T31.3: Add accessor for getting the all the versions (?)
    • [x] T31.4: Add a versions repr method to handle joining and truncation
  • [x] T32: Same as T31 but for dependencies
  • [x] T33: Build the compatilbity graph for each packages with the same name
    • [x] T33.1: Define criterion for merging packages
    • [x] T33.2: Compress graph using clique partition algorithm
    • [x] T33.3: Same as T14.3 but for set of packages
    • [x] T33.4: Same as T14.4 but for set of packages

Summary error message [POC-A]

  • [ ] T41: Create a function to return the summary message (as a std::string or std::sstream)
    • [ ] T41.1: TODO: define steps

Tree error message [POC-B]

  • [x] T51: Create an enum for the type of node (root, visited, leaf...)
  • [x] T52: Create a structure for node meta-information (depth, last in children, status)
  • [ ] T53a: Implement an edited Depth First Search (DFS) algorithm (nodes with same dep_id must be visited one after another)
    • [x] T53a.1: Dynamically add split nodes
    • [ ] T53a.2: Dynamically merge visited children in split nodes
  • [ ] T53b: Extend the DFS algorithm to have the felxibility required
    • [ ] T53b.1: Add node selection/ordering callback
  • [ ] T53c: Convert graph to hybrid graph for reusing the default DFS
  • [ ] T54: Create a function/struct/visitor to convert nodes into English messages
  • [ ] T55: Create a function to return the summary message (as a std::string or std::sstream)

Functional test

  • [x] T61: Reuse all tests case from AntoinePrv/mamba-error-reporting
    • [x] T61.1: Pin dependencies to avoid the tests failing in the future
    • [ ] T61.2: Ensure the error messages do not fail
    • [ ] T61.3: Check for certain keyword and packages names in the message

AntoinePrv avatar Aug 24 '22 13:08 AntoinePrv

I've updated the explanations of the algo and the structure of the classes from my PR if it helps: https://github.com/mamba-org/mamba/pull/1847#issue-1333365845

crogoz avatar Aug 25 '22 14:08 crogoz

Closing since message are already running with an experimental flag. Improvements can be tracked in separate issues.

AntoinePrv avatar Dec 06 '22 08:12 AntoinePrv