
fix dpkg symlink license problem for image scan

Open albe19029 opened this issue 10 months ago • 15 comments

This commit fixes empty licenses for packages whose copyright file is a symlink to another package's copyright file. For it to work correctly, the following code must also be added to nested.Nested:

nested.txt
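To make the case concrete, here is a minimal sketch of how a copyright symlink could be resolved to the file it points at. It is not the code from this PR: the `resolveLink` helper and the `libfoo` package names are hypothetical, and it assumes layer paths are stored without a leading slash.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resolveLink returns the cleaned path a symlink points to inside an image.
// linkPath is the location of the symlink itself; target is the raw link
// name (it may be relative or absolute).
// Hypothetical helper for illustration only; not part of Trivy.
func resolveLink(linkPath, target string) string {
	if path.IsAbs(target) {
		// Assumption: paths are stored without a leading slash.
		return strings.TrimPrefix(path.Clean(target), "/")
	}
	// A relative target is resolved against the directory holding the link.
	return path.Join(path.Dir(linkPath), target)
}

func main() {
	link := "usr/share/doc/libfoo-dev/copyright"
	fmt.Println(resolveLink(link, "../libfoo1/copyright"))
	fmt.Println(resolveLink(link, "/usr/share/doc/libfoo1/copyright"))
}
```

Both calls resolve to the copyright file of the other package, which is where the license text actually lives.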

albe19029 avatar Feb 22 '25 21:02 albe19029

Also, what will we do about https://github.com/knqyf263/nested? I have opened a PR there, with tests, but I'm not sure anyone maintains this package. So will we extend it in the Trivy code instead? https://github.com/knqyf263/nested/pull/2/files

albe19029 avatar Feb 24 '25 12:02 albe19029

> Also, what will we do about https://github.com/knqyf263/nested? I have opened a PR there, with tests, but I'm not sure anyone maintains this package. So will we extend it in the Trivy code instead?

We first need to decide whether these changes are really necessary. If they are, we will sort out how to get that PR merged.

DmitriyLewen avatar Feb 24 '25 12:02 DmitriyLewen

Well, when we delete a directory, it is very useful to get a list of the exact items that will be deleted (in my case, to clean up the cache and remove dependent symlink files).
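A minimal sketch of the idea, assuming a simple tree of files. The `node` type and the `deleteDir` function are hypothetical stand-ins and are not the `nested.Nested` API:

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical stand-in for a nested map of files.
// A nil children map marks a regular file.
type node struct {
	children map[string]*node
}

// collect appends the path of every file at or below n to out.
func collect(n *node, prefix string, out *[]string) {
	if n.children == nil {
		*out = append(*out, prefix)
		return
	}
	for name, child := range n.children {
		collect(child, prefix+"/"+name, out)
	}
}

// deleteDir removes the entry called name from parent and returns the
// sorted paths of all files that were removed with it.
func deleteDir(parent *node, name string) []string {
	child, ok := parent.children[name]
	if !ok {
		return nil
	}
	var removed []string
	collect(child, name, &removed)
	delete(parent.children, name)
	sort.Strings(removed)
	return removed
}

func main() {
	root := &node{children: map[string]*node{
		"doc": {children: map[string]*node{
			"libfoo1":    {children: map[string]*node{"copyright": {}}},
			"libfoo-dev": {children: map[string]*node{"copyright": {}}},
		}},
		"bin": {children: map[string]*node{"foo": {}}},
	}}
	// The caller can use the returned paths to drop cache entries
	// or symlinks that depended on the deleted files.
	fmt.Println(deleteDir(root, "doc"))
}
```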

albe19029 avatar Feb 24 '25 12:02 albe19029

So one last question: what will we do next? Will someone pick up the idea from the current PR and try to support other analyzers and scan types (filesystem, vm)? Or do we need to support the other analyzers and scan types ourselves and update the PR?

Just to understand: is this important for you or not? Or is it more our problem, and we should handle it somehow ourselves?

albe19029 avatar Feb 24 '25 12:02 albe19029

This is our problem too. However, as I wrote before, we have to take many factors into account when releasing fixes.

I will return to this PR when I have time to study it more closely and maybe your changes will push me to another idea.

DmitriyLewen avatar Feb 24 '25 12:02 DmitriyLewen

I hope this idea will help solve the problem. And if not, you will let us know the reasons and perhaps we can come up with something else.

albe19029 avatar Feb 24 '25 12:02 albe19029

@DmitriyLewen any news? Did you manage to take a look at the PR?

albe19029 avatar Feb 28 '25 12:02 albe19029

Hello @albe19029. Please be patient. The team is currently busy with the upcoming release, as well as other higher-priority tasks.

DmitriyLewen avatar Mar 03 '25 03:03 DmitriyLewen

Sorry to disturb you, but is there any news about the PR? Did someone manage to take a look at it? Is there any conclusion: does the idea work for you, and if not, why?

albe19029 avatar Mar 12 '25 09:03 albe19029

Hello @albe19029. Sorry, I haven't had time to look at your PR yet.

DmitriyLewen avatar Mar 21 '25 10:03 DmitriyLewen

Good day, is there any progress? Did you manage to take a look at the code of the proposal?

albe19029 avatar May 21 '25 05:05 albe19029

Please excuse the wait.

Today I also took the time to review your PR and tried to implement some of my own ideas regarding symlinks. Unfortunately, it didn’t yield any results.

Regarding your PR: I still believe that each individual analyzer should not be responsible for handling file-system traversal logic. This violates OOP principles, so I think we shouldn’t adopt that approach. An analyzer should get only the file name and the context of the file.

I wasn’t able to implement my own ideas either. For the most part, everything ran into the fact that Trivy treats layers separately, so if a symlink and its target file are created in different layers, there are problems handling that case.

The only option I see is to apply the post-analyzer logic to all analyzers. What I mean is that we would keep all files in memory and also process them across layers (adding, updating, deleting). That way we could search for links, replace them, and handle the files accordingly. However, this solution simply doesn’t fit into Trivy’s current implementation. We’d have to rewrite the entire fanal package, which we can’t afford to do right now.
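Roughly, that idea could look like the sketch below. This is only an illustration under simplifying assumptions, not Trivy or fanal code: the `entry`, `layer`, `merge`, and `read` names are hypothetical, a deletion is modelled as a nil entry, and every file body is held in memory.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// entry is a hypothetical in-memory record of one file seen in a layer.
type entry struct {
	content  string // file body; empty for symlinks
	linkName string // symlink target; empty for regular files
}

// layer maps a path to its entry; a nil entry marks a deletion.
type layer map[string]*entry

// merge applies layers in order, so later layers add, update, or delete files.
func merge(layers []layer) map[string]*entry {
	fs := map[string]*entry{}
	for _, l := range layers {
		for p, e := range l {
			if e == nil {
				delete(fs, p)
				continue
			}
			fs[p] = e
		}
	}
	return fs
}

// read returns the content at p in the merged view, following symlinks.
// It gives up after maxHops links to avoid looping on a cycle.
func read(fs map[string]*entry, p string) (string, bool) {
	const maxHops = 8
	for i := 0; i < maxHops; i++ {
		e, ok := fs[p]
		if !ok {
			return "", false
		}
		if e.linkName == "" {
			return e.content, true
		}
		if path.IsAbs(e.linkName) {
			p = strings.TrimPrefix(path.Clean(e.linkName), "/")
		} else {
			p = path.Join(path.Dir(p), e.linkName)
		}
	}
	return "", false
}

func main() {
	// The target file and the symlink are created in different layers.
	layers := []layer{
		{"usr/share/doc/libfoo1/copyright": {content: "License: MIT"}},
		{"usr/share/doc/libfoo-dev/copyright": {linkName: "../libfoo1/copyright"}},
	}
	content, ok := read(merge(layers), "usr/share/doc/libfoo-dev/copyright")
	fmt.Println(content, ok)
}
```

Because links are resolved only after all layers are merged, the layer in which the target was created no longer matters. The cost is that the merged view keeps every file in memory.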

DmitriyLewen avatar May 23 '25 10:05 DmitriyLewen

And what if, instead of a new method, we expose a field? That is, the child class would produce the symlink information, and the base class would implement the gathering logic.

albe19029 avatar May 23 '25 13:05 albe19029

The problem with gathering all files in memory is that you don’t see symlink files, as they are skipped at the tar or directory iteration stage.

albe19029 avatar May 23 '25 13:05 albe19029

> And what if, instead of a new method, we expose a field? That is, the child class would produce the symlink information, and the base class would implement the gathering logic.

I am not sure I understand you. Can you share more info?

> The problem with gathering all files in memory is that you don’t see symlink files, as they are skipped at the tar or directory iteration stage.

This is not the main problem. We can disable skipping links. But how to read the main file, process it together with a link, transfer all of this between layers, and not exhaust system resources: that is the main question.

DmitriyLewen avatar Jun 02 '25 07:06 DmitriyLewen

This PR is stale because it has been labeled with inactivity.

github-actions[bot] avatar Aug 02 '25 00:08 github-actions[bot]