cve-bin-tool

feat: added debian parser

Open joydeep049 opened this issue 1 year ago • 31 comments

closes #2917

joydeep049 avatar Nov 27 '23 19:11 joydeep049

Only review and Merging left @terriko @anthonyharrison

joydeep049 avatar Nov 28 '23 12:11 joydeep049

@crazytrain328 Can you please add some tests and include some sample data to demonstrate the parser working.

anthonyharrison avatar Nov 28 '23 13:11 anthonyharrison

@anthonyharrison Could you tell me how to do that? I have worked on adding fuzz testing to parsers before. Do I do the same here?

joydeep049 avatar Nov 29 '23 15:11 joydeep049

Codecov Report

Attention: Patch coverage is 57.42574%, with 43 lines in your changes missing coverage. Please review.

Project coverage is 80.35%. Comparing base (d6cbe40) to head (d0b260a). Report is 179 commits behind head on main.

Files                          Patch %   Lines
cve_bin_tool/parsers/deb.py    55.29%    34 Missing and 4 partials :warning:
test/test_language_scanner.py  66.66%    4 Missing and 1 partial :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3543      +/-   ##
==========================================
+ Coverage   75.41%   80.35%   +4.93%     
==========================================
  Files         808      823      +15     
  Lines       11983    12799     +816     
  Branches     1598     1999     +401     
==========================================
+ Hits         9037    10284    +1247     
+ Misses       2593     2089     -504     
- Partials      353      426      +73     
Flag           Coverage Δ
longtests      75.27% <53.46%> (-0.15%) :arrow_down:
win-longtests  78.54% <57.42%> (?)

codecov-commenter avatar Nov 29 '23 16:11 codecov-commenter

Hello @terriko I was busy with my end semester exams (finally they are over). Finally I'm free and I'll be able to contribute regularly. Thanx for the help on this.

joydeep049 avatar Dec 04 '23 09:12 joydeep049

I am supposed to get a list of the Deb_products to add to the script only after I run a test with DebParser using a test.deb file. I chose the test.deb file in test/assets. I am using the following code to test it:

import sys
import os
sys.path.append('/home/joydeep/dev/cve-bin-tool')
from cve_bin_tool.parsers.deb import DebParser

from cve_bin_tool.cvedb import CVEDB
from cve_bin_tool.log import LOGGER

cve_db = CVEDB()
logger = LOGGER

file_path = os.path.join(os.getcwd(), 'test.deb')

deb_parser = DebParser(cve_db=cve_db, logger=logger)

deb_parser.run_checker(file_path)

I have copied the test.deb file into the same directory as my testing file. I think there is something wrong in the way I'm calling the logger. Can you help me, @terriko @anthonyharrison?

joydeep049 avatar Dec 05 '23 17:12 joydeep049

@crazytrain328

The test is OK in your local environment to prove the functionality. However, what we need is a test within the cve-bin-tool test environment so we can add it to the test suite.

The language parsers all have test files in the test/language_data directory. Can you add your test.deb file to this directory and then update the test_language_scanner file to add your test code? I suggest you add a new test, test_debian_package, which calls the scanner and then asserts that the results are as expected.

Can you confirm that the parser is doing more than is already covered in the extractor module and tested in the test_extractor file, which explicitly has a test for files with a .deb extension?

anthonyharrison avatar Dec 05 '23 18:12 anthonyharrison

@anthonyharrison This test is not working in my local environment. It executes but does not give any output. Since all console output in the run_checker() function goes through the logger object, I thought the way I'm using the logger in my test code is wrong.

joydeep049 avatar Dec 05 '23 18:12 joydeep049

I tried to change a few things but my local test still gives no output. For a change, I set the logging level down to DEBUG, but that does not help.

My code for testing:

import sys
import os
import logging  # Import the logging module

sys.path.append('/home/joydeep/dev/cve-bin-tool')
from cve_bin_tool.parsers.deb import DebParser
from cve_bin_tool.cvedb import CVEDB
from cve_bin_tool.log import LOGGER

LOGGER.setLevel(logging.DEBUG)  

cve_db = CVEDB()
logger = LOGGER

file_path = os.path.join(os.getcwd(), 'test.deb')

deb_parser = DebParser(cve_db=cve_db, logger=logger)

deb_parser.run_checker(file_path)


Modified run_checker() function:

def run_checker(self, filename):
        """Process .deb control file with file existence check"""
        self.logger.debug(f"Scanning .deb control file: {filename}")

        # Check if the file exists
        if not os.path.exists(filename):
            self.logger.error(f"File not found: {filename}")
            return  # Exit the method if file doesn't exist

        try:
            with open(filename) as file:
                control_data = file.read()
            product, version = self.parse_control_file(control_data)
            if product and version:
                product_info = self.find_vendor(product, version)
                if product_info:
                    yield from product_info
            else:
                self.logger.debug(f"No product/version found in {filename}")
        except Exception as e:
            self.logger.error(f"Error processing file {filename}: {e}")

        self.logger.debug(f"Done scanning file: {filename}")

I'm stuck! Please help @terriko @anthonyharrison.

joydeep049 avatar Dec 07 '23 17:12 joydeep049

@crazytrain328 Can you provide the test.deb file that you are using?

I tried to run the parser in my environment. The run_checker routine wasn't being called. However, if I call a different routine in the class it does get called, so I suspected there was something wrong with the way run_checker is defined or being called.

I created the following routine

    def do_it(self, filename):
        print("DO IT")
        try:
            print("Read file")
            with open(filename) as file:
                control_data = file.read()
            print("File read")
        except Exception as e:
            # Report the actual error instead of hiding it
            print(f"We have a problem: {e}")
        print("DONE IT")

And called this instead of run_checker. This resulted in the exception being raised when reading a .deb file (I used the test.deb file in the test/assets directory). Renaming this routine to run_checker does result in run_checker being called. So I think you need to work through the run_checker routine line by line to validate the operation. Using print statements rather than logging may also help.
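
One detail that may explain the missing output, offered as an observation about Python rather than a confirmed diagnosis of this PR: the modified run_checker contains `yield from`, which makes it a generator function. Calling a generator function only creates a generator object; none of its body, including the logging calls, runs until something iterates over the result. The sketch below uses a stand-in function, `run_checker_like`, which is hypothetical and not part of cve-bin-tool.

```python
def run_checker_like(filename, log):
    """Stand-in for a parser method that yields results (hypothetical)."""
    log.append(f"scanning {filename}")
    yield ("vendor", "product", "1.0")
    log.append(f"done scanning {filename}")


log = []

# Calling the generator function runs none of its body.
gen = run_checker_like("control", log)
assert log == []

# Iterating over the generator drives the body to completion.
results = list(gen)
```

Separately, the exception seen in do_it is plausibly a decoding error: open(filename) defaults to text mode, and a .deb file is a binary archive.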

anthonyharrison avatar Dec 07 '23 23:12 anthonyharrison

@anthonyharrison I am also using the test.deb in the test/assets directory. But I will go through the run_checker() definition once again.

joydeep049 avatar Dec 08 '23 08:12 joydeep049

Have you updated the parse.py file? This calls the appropriate parser when it finds a particular file e.g. requirements.txt will invoke the python parser.

anthonyharrison avatar Dec 08 '23 09:12 anthonyharrison

I was able to run the run_checker() function. But now I am getting an exception: Unable to open database file.

I created my own .deb file and am trying to parse its control file with my DebParser. I tried calling the GoParser with a go.mod file as well, and it gives me the same database error. Here's the local testing code:

import os
import sys
import logging
sys.path.append('/home/joydeep/dev/cve-bin-tool')
from cve_bin_tool.parsers.deb import DebParser

from cve_bin_tool.cvedb import CVEDB
from cve_bin_tool.log import LOGGER

# Set logger to DEBUG level
LOGGER.setLevel(logging.DEBUG)

# Verify the test file path
file_path = '/home/joydeep/mypackage/DEBIAN/control'
if not os.path.exists(file_path):
    raise FileNotFoundError(f"The file {file_path} does not exist.")

# Instantiate the database and the parser
cve_db = CVEDB()
deb_parser = DebParser(cve_db=cve_db, logger=LOGGER)

# Run the parser
try:
    for info in deb_parser.run_checker(file_path):
        print(info)
except Exception as e:
    LOGGER.error(f"Exception occurred: {e}")

Is there something wrong with the way I'm using CVEDB? @anthonyharrison @terriko

joydeep049 avatar Dec 10 '23 16:12 joydeep049

If I had to guess, the database problem is that your database hasn't been created or updated in a while. So you're making a new CVEDB() but you're not populating it.

To update your local db and make sure it's functioning: cve-bin-tool -u now main/test/csv/triage.csv

(The file doesn't matter, I just chose one from our test suite so you can see if the database is working in other code.)

And in your code you'd probably want to call cvedb.get_cvelist_if_stale() to do the equivalent update. That said, this is where using the existing pytest harness would help you a lot over writing a separate test, as we already have database setup code and stuff in the existing test/test_* files and when you run it all in github actions you have access to the cached database so you don't have to initialize it yourself. I'd strongly recommend that you move your test into pytest and use the existing framework before spending too long debugging this: you're going to have to do it eventually anyhow because we need all tests to run through there before this code can be merged, so might as well just learn to do it that way first instead of figuring it out twice.

terriko avatar Dec 12 '23 19:12 terriko

How do I get the DEBIAN_PRODUCTS that I have to add to the test/test_language_scanner.py file when I write the test using the existing pytest setup?

joydeep049 avatar Dec 13 '23 07:12 joydeep049

Usually you'd make this manually (i.e. cut and paste the data that you used when you created the file).

For example, if you look at https://github.com/intel/cve-bin-tool/blob/main/test/language_data/requirements.txt and then at the PYTHON_PRODUCTS array in https://github.com/intel/cve-bin-tool/blob/main/test/test_language_scanner.py you'll see that the test is just a subset of what could have been detected from the file.

In your case, since a debian package often contains only one product, you may have an array that's just the one thing you put into the metadata of the file, so you could probably write something like

def test_debian_package(self, filename: str) -> None:
    assert scanner.scan_file(filename) == "debian_package"

Although you'll have to account for it returning an array rather than a single string or whatever it actually does (sorry, I've got to run to a meeting so I don't have time to double-check the api myself, but you can probably figure it out from the other tests!)
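
A minimal sketch of the subset-style check described above, assuming the scanner yields (product, version) pairs. `fake_scan`, `check_debian_package`, and the contents of `DEBIAN_PRODUCTS` are hypothetical stand-ins rather than the cve-bin-tool API; the real test would call the project's scanner on a file under test/language_data.

```python
# Hypothetical expected list: a subset of what the test package declares.
DEBIAN_PRODUCTS = ["example-package"]


def fake_scan(filename):
    """Stand-in for the real scanner; yields (product, version) pairs."""
    yield ("example-package", "1.0.0")
    yield ("another-package", "2.3")


def check_debian_package(filename):
    """Assert that every expected product was detected (subset check)."""
    found = {product for product, _version in fake_scan(filename)}
    missing = [p for p in DEBIAN_PRODUCTS if p not in found]
    assert not missing, f"products not detected: {missing}"
    return found
```

Checking for a subset rather than strict equality keeps the test stable if the parser later reports additional products from the same file.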

terriko avatar Dec 13 '23 17:12 terriko

Oh, and if you want to run just your new test to see how it works on your system, you can use the -k option:

pytest -vv -k test_control.deb

should probably get you just the new piece you added so you don't have to wait for a whole file worth of tests (or the whole test suite!) to complete.

terriko avatar Dec 19 '23 21:12 terriko

I have listed all the products in my test_control.deb in the DEBIAN_PRODUCTS list. I also modified the DebParser code to extract the products and their versions more efficiently, but it still does not give me the desired output.

One thing I read about debian control files is that while the actual package has a .deb extension, the control file inside the package (which basically contains metadata about the debian package) is actually a text file (without extension). Should I write my tests to process a control.txt file?
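
Since the control file is plain text made of `Field: value` lines, with continuation lines starting with whitespace, reading the product and version out of it can be sketched as below. This is an illustration only, not the PR's parse_control_file; the function name `parse_control_fields` and the sample data are hypothetical.

```python
def parse_control_fields(control_data):
    """Parse 'Field: value' lines from control-file text into a dict."""
    fields = {}
    current = None
    for line in control_data.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t":
            # Continuation of the previous field's value.
            if current is not None:
                fields[current] += "\n" + line.strip()
            continue
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            current = None  # not a field line; ignore it
            continue
        current = name.strip()
        fields[current] = value.strip()
    return fields


# Hypothetical sample data for illustration.
sample = (
    "Package: example-package\n"
    "Version: 1:2.3-1\n"
    "Description: demo package\n"
    " continued description line\n"
)
fields = parse_control_fields(sample)
```

Here `fields["Package"]` and `fields["Version"]` give the product name and version string that a parser would go on to look up.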

joydeep049 avatar Dec 20 '23 07:12 joydeep049

Typically, you'd want to have the test process a .deb and find and parse the control.txt, so... both?

terriko avatar Jan 03 '24 21:01 terriko

So I've been trying to find ways to unpack Debian packages using Python and so far haven't had any luck. The test.deb file in test/assets has a structure like: test.deb ---> control.tar.xz ---> . ---> control ---> usr/bin ... Need ideas on how to proceed.
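
For reference, a .deb file is an `ar` archive whose members include a control.tar.* file, so the control file can be read with the standard library alone. The sketch below is one possible approach, not the project's extractor; the function names `iter_ar_members` and `read_control_text` are hypothetical. It assumes the common ar layout (8-byte magic, then 60-byte member headers) and a compression that tarfile can open (gz, xz, bz2).

```python
import io
import posixpath
import tarfile

AR_MAGIC = b"!<arch>\n"


def iter_ar_members(data):
    """Yield (name, payload) for each member of an ar archive held in bytes."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    offset = len(AR_MAGIC)
    while offset + 60 <= len(data):
        header = data[offset:offset + 60]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        # Name is the first 16 bytes; size is a decimal string at bytes 48-57.
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        start = offset + 60
        yield name, data[start:start + size]
        # Member data is padded so the next header starts at an even offset.
        offset = start + size + (size % 2)


def read_control_text(deb_bytes):
    """Return the text of the control file inside a .deb, or None."""
    for name, payload in iter_ar_members(deb_bytes):
        if not name.startswith("control.tar"):
            continue
        # Mode "r:*" lets tarfile detect the compression itself.
        with tarfile.open(fileobj=io.BytesIO(payload), mode="r:*") as tar:
            for member in tar.getmembers():
                is_control = posixpath.normpath(member.name) == "control"
                if member.isfile() and is_control:
                    return tar.extractfile(member).read().decode("utf-8")
    return None
```

Reading the member in memory with extractfile avoids writing archive contents to disk at all.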

joydeep049 avatar Jan 08 '24 19:01 joydeep049

We have a deb extractor in extractor.py. I think it's called extract_file_deb or something equally obvious. You could probably just use that.

terriko avatar Jan 23 '24 19:01 terriko

Hello @terriko, @anthonyharrison @b31ngd3v @Rexbeast2. I was finally able to make my code parse a Debian package and extract the contents of its control file. However, I wouldn't expect any CVEs to actually be present, since this is purely a test file. So if I write the usual test in test_language_scanner, I'm bound to get an assertion error. How do I solve this?

Also, please help me with the bandit linter issue: it reports the tarfile library as high severity. I went through the docs and still couldn't find a way to build a tarfile extractor without using that library. Even Python's own shutil.unpack_archive function in turn uses _extract_tarfile functions, which use the tarfile library. What should I do?
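
The usual concern with tarfile is extracting untrusted members outside the target directory, so one common mitigation is to filter members before extraction instead of avoiding the library. The sketch below is an assumption about what would address the concern, not a confirmed fix for this PR's bandit finding; `safe_members` and `extract_safely` are hypothetical names.

```python
import io
import os
import tarfile


def safe_members(tar, dest):
    """Yield only regular files and directories that resolve inside dest."""
    root = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        inside = os.path.commonpath([root, target]) == root
        # Links and special files are skipped entirely.
        if inside and (member.isfile() or member.isdir()):
            yield member


def extract_safely(fileobj, dest):
    """Extract a tar stream into dest, skipping links and escaping paths."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        members = list(safe_members(tar, dest))
        tar.extractall(path=dest, members=members)
    return [m.name for m in members]
```

On Python versions that support extraction filters, passing filter="data" to extractall is another option. Whether a given linter accepts either approach would need checking; a targeted suppression comment with a written justification is a further possibility.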

joydeep049 avatar Feb 05 '24 08:02 joydeep049

Only review and merge left. Thanks @terriko. (I want to remove some of the comments from the code, but if I do it now it will get stuck in CI due to failing tests. I will open another doc issue for that.)

Eagerly Waiting for review :)

joydeep049 avatar Feb 21 '24 17:02 joydeep049

Just a heads up: this has a bunch of merge conflicts now and will need some work.

I'll get back to solving this as soon as we have the PURL generation for language parsers and their tests figured out. @terriko @anthonyharrison

joydeep049 avatar Apr 01 '24 17:04 joydeep049

Marking this as blocked so I don't look at it again until after 3.3 is out.

terriko avatar Apr 03 '24 21:04 terriko

Sure! I did mention working on this issue as part of stretch goals in my GSOC project. Maybe I'll get to it in the community bonding period.

Btw, when will the 3.3 version be coming out? @terriko

joydeep049 avatar Apr 04 '24 04:04 joydeep049

Hello @terriko , Since the release is out I was thinking we can finally work on finishing this one. Or should this be prioritised after 3.3.1 release is out? (Merge conflicts have been resolved)

joydeep049 avatar Apr 20 '24 10:04 joydeep049

I've barely started with 3.3.1 planning so I expect this will get merged long before there, but it's going to be at least a few weeks. I'm severely backlogged on non-cve-bin-tool stuff at the moment and have to put my focus elsewhere.

terriko avatar Apr 22 '24 16:04 terriko

Absolutely no problem at all! I think this one is almost ready to merge except that I would have to add a script which creates a temporary debian file to test the code itself. But I thought it may end up being more expensive than what we are doing now.

joydeep049 avatar Apr 22 '24 16:04 joydeep049

As for testing files: for now, let's go with just including the .deb in git. I'm going to have to deal with the OpenSSF's insistence on there not being binary files eventually, but I think at the moment it's more important to me that we have a functional test if it's not super easy to just have a makefile for it or something.

terriko avatar Apr 22 '24 16:04 terriko