ort icon indicating copy to clipboard operation
ort copied to clipboard

Analyzer: Remove projects that are also dependencies

Open bennati opened this issue 2 years ago • 17 comments

Local directories that are dependencies can appear in the analyzer results both as packages, e.g. Pub, and projects. After building the results, projects are removed if they match with packages on VCS information and package manager type.

This is caused by issue #5365. This is only a workaround which should be replaced by a comprehensive solution for #5365.

Please ensure that your pull request adheres to our contribution guidelines. Thank you!

bennati avatar May 23 '22 10:05 bennati

Codecov Report

Merging #5369 (796bc6c) into main (d7d323a) will not change coverage. The diff coverage is n/a.

@@            Coverage Diff            @@
##               main    #5369   +/-   ##
=========================================
  Coverage     72.06%   72.06%           
  Complexity     2106     2106           
=========================================
  Files           268      268           
  Lines         14376    14376           
  Branches       2201     2201           
=========================================
  Hits          10360    10360           
  Misses         2931     2931           
  Partials       1085     1085           
Flag Coverage Δ
test 32.64% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update d7d323a...796bc6c. Read the comment docs.

codecov[bot] avatar May 23 '22 11:05 codecov[bot]

Hmm, I believe we should not start adding such kind of work-arounds on a per-package-manager basis. What do you think @mnonnenmacher, @MarcelBochtler?

sschuberth avatar May 23 '22 11:05 sschuberth

We have a project where ~20 packages are mistakenly identified as projects, I'm not even sure how to solve these issues using curations. A proper fix is likely too far away for our release schedule. Happy to evaluate alternatives.

P.s. This is meant as a temporary workaround, as specified in the commit message

bennati avatar May 23 '22 13:05 bennati

@sschuberth What do you mean by per-package-manager? This seems to be a generic workaround that affects all package managers where the described issue could appear.

@bennati I would rather know more details about the issue and fix the root cause instead of adding this workaround. Can you provide more details in which case projects also appear as packages? This should not happen in general. I also don't understand how this relates to #5365, as this PR is about project/package duplicates while the issue addresses the association of files to projects.

mnonnenmacher avatar May 24 '22 15:05 mnonnenmacher

@sschuberth What do you mean by per-package-manager?

I was mislead. At the first glance I somehow though the code would be Pub-specific, but it is not.

sschuberth avatar May 24 '22 17:05 sschuberth

@mnonnenmacher sure, let's take an example directory structure like the following:

project_root/
|--- lib_dependency/
|--- app/
       |--- pubspec.yaml

And pubspec.yaml contains the following

dependencies:
  here_lib_dependency:
    path: ../lib_dependency

The scanner will first identify lib_dependency as a Project when traversing the file system, then the same will be identified as Package when analyzing pubspec.yaml. The VCS info will be the same, but the name will differ, and this raises issues as the folder is in one case identified as HERE proprietary and in the other not.

Hope this helps

bennati avatar May 25 '22 07:05 bennati

The scanner will first identify lib_dependency as a Project when traversing the file system, then the same will be identified as Package when analyzing pubspec.yaml.

Does lib_dependencycontain a Gradle or CocoaPods definition file? Because it seems that the duplication is caused by an issue with the heuristic introduced in #4831 that associates Gradle and CocoaPods projects with Pub projects if the are contained in a subdirectory of the Pub project, but in this case lib_dependency is not a subdirectory of app. @MarcelBochtler Would you agree?

The VCS info will be the same, but the name will differ, and this raises issues as the folder is in one case identified as HERE proprietary and in the other not.

It depends on where the license information is coming from, but I don't see right now how this can happen. If the license is contained in the metadata of the package it should be found in both cases, and if it comes from a license file, as well.

mnonnenmacher avatar May 25 '22 13:05 mnonnenmacher

Let me revise my example:

project_root:
|--- app:
      |--- dependency:
             |--- pubspec.yaml
      |--- code:
             |--- pubspec.yaml

app/code/pubspec.yaml contains:

name: here_code

dependencies:
   here_dependency:
      path: ../dependency

app/dependency/pubspec.yaml contains:

name: here_dependency

The analyzer will complain that Resolving Pub dependencies for path 'app/code/pubspec.yaml' failed with: IOException: Could not find package info for here_dependency

Inspecting the code I noticed that the list of projects contains an entry called dependency while the list of packages contains and entry here_dependency

Hope this clarifies

bennati avatar May 30 '22 10:05 bennati

https://github.com/oss-review-toolkit/ort/commit/cea7f66ea43c4ca61cebbed2835d5681acd8d03a breaks this patch. I would like clarity on whether this contribution is actually wanted before spending time to adapt it.

bennati avatar Jun 17 '22 15:06 bennati

cea7f66 breaks this patch. I would like clarity on whether this contribution is actually wanted before spending time to adapt it.

@bennati If anything is removed it should be the duplicate packages, not projects. Because if a pubspec.yaml is in the file tree it is a project from the ORT perspective. But as mentioned above, I would prefer to fix the root cause instead of adding a workaround. As the problem seems to be that the Pub implementation does not recognize that the dependency is also a local project, the real fix should solve this somehow. I won't have time to look into this anytime soon, but if you want to work on this, my suggestion would be to add a similar project dependency to our synthetic pub test project first, to document the issue, and then find a way to detect project dependencies in the Pub implementation. The solution could use a similar approach as the Maven implementation, where all projects are detected in the beforeResolution() function to later resolve project dependencies without having to build the Maven project.

mnonnenmacher avatar Jun 30 '22 16:06 mnonnenmacher

@mnonnenmacher that sounds a good approach. Given that I won't have time to work on this anytime soon either, could we go for this approach and then revert it once the full fix is implemented?

bennati avatar Jun 30 '22 17:06 bennati

@mnonnenmacher that sounds a good approach. Given that I won't have time to work on this anytime soon either, could we go for this approach and then revert it once the full fix is implemented?

If we implement a workaround it should remove the duplicate packages, not projects. Note that a similar change was just done in #5509 for SpdxDocumentFile.

mnonnenmacher avatar Jul 01 '22 09:07 mnonnenmacher

The reason why I removed duplicate projects instead of duplicate packages is that the correct name for the dependency is specified in pubspec.yaml, while the name of the corresponding project is given by the path it's in.

bennati avatar Jul 04 '22 15:07 bennati

The reason why I removed duplicate projects instead of duplicate packages is that the correct name for the dependency is specified in pubspec.yaml, while the name of the corresponding project is given by the path it's in.

This only happens if there is no name defined in the pubspec.yaml. If the name is set correctly project and packages should have the same identifier. Please not that I have added the removal of project packages for Pub in the last commit of #5513 because I was facing a similar issue.

mnonnenmacher avatar Jul 04 '22 16:07 mnonnenmacher

If the name is set correctly project and packages should have the same identifier

The issue I was facing what that the name specified in pubspec.yml was correctly assigned to the package but not assigned the the project, as the project was generated from the directory structure, thus ignoring the contents of pubspec.yml.

I tested you fix in https://github.com/oss-review-toolkit/ort/pull/5513 and it does not solve the issues I am facing, as projects are named following the directory structure and not following pubspec.yml.

An example: Rule violation:

MISSING_HERE_ID_IN_PACKAGE_ID | Pub:core:core:0.0.1 | - | Found Pub:core:core:0.0.1. This may be a HERE package, but is not identified as such by its id.

Projects:

Pub:core:core:0.0.1 (app/core/core/pubspec.yaml)

But Pub:here_core:here_core:0.0.1 is found among the packages

bennati avatar Jul 05 '22 08:07 bennati

@bennati I'm not sure what you mean by "the project was generated from the directory structure", but if you mean that the project name is the name of the folder of the definition file, that means that the fallback in this line is used: https://github.com/oss-review-toolkit/ort/blob/1ca71eb3c2ab8b8379eb9f387b13d0b66e276b83/analyzer/src/main/kotlin/managers/Pub.kt#L445

If this is the case, it seems that the name is not properly set in the pubspec.yaml, or that the linked code needs to be improved.

FYI, I have moved the Pub related fixes to #5524 as I have found more issues.

mnonnenmacher avatar Jul 05 '22 16:07 mnonnenmacher

If this is the case, it seems that the name is not properly set in the pubspec.yaml, or that the linked code needs to be improved.

The code was indeed wrong which means that project names were never parsed correctly, I have added a fix to #5524.

mnonnenmacher avatar Jul 05 '22 17:07 mnonnenmacher

The code was indeed wrong which means that project names were never parsed correctly, I have added a fix to #5524.

@bennati is this PR still needed with the above-mentioned PR?

sschuberth avatar Dec 09 '22 10:12 sschuberth

No, we can close this

bennati avatar Dec 09 '22 11:12 bennati