activist icon indicating copy to clipboard operation
activist copied to clipboard

Malware scanning

Open Joshua-Tihabwangye opened this issue 6 months ago • 27 comments

Contributor checklist

  • [x] This pull request is on a separate branch and not the main branch
  • [x] I have run the tests for the backend and frontend depending on what's needed for my changes as described in the testing section of the contributing guide

Description

I have made the malware scanning functionality where each file is to be scanned before being stored in the database. I used a python library which is pyClamd. I created the malware_scan.py file inside the utils folder, i also had to set the pyclamd and its clamav deamon in settings.py, then other files that were edited are, serializers, docker file of backend, the docker compose file and the requirements file, and the requirements.txt.

Related issue

  • #1171

Joshua-Tihabwangye avatar May 16 '25 16:05 Joshua-Tihabwangye

Deploy Preview for activist-org canceled.

Name Link
Latest commit fd6b4e9b6373300ca4773bc37bbfbf90b9f7987f
Latest deploy log https://app.netlify.com/projects/activist-org/deploys/68a2e1549e6f0b0008ecb8ee

netlify[bot] avatar May 16 '25 16:05 netlify[bot]

Thank you for the pull request! ❤️

The activist team will do our best to address your contribution as soon as we can. If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Development rooms once you're in. Also consider attending our bi-weekly Saturday developer syncs! It'd be great to meet you 😊

github-actions[bot] avatar May 16 '25 16:05 github-actions[bot]

Maintainer Checklist

The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

  • [ ] The TypeScript, pytest and formatting workflows within the PR checks do not indicate new errors in the files changed

  • [ ] The Playwright end to end and Zap penetration tests have been ran and are passing (if necessary)

  • [ ] The changelog has been updated with a description of the changes for the upcoming release and the corresponding issue (if necessary)

github-actions[bot] avatar May 16 '25 16:05 github-actions[bot]

Hello @Joshua-Tihabwangye ! Thanks for all of the work getting this together; I'm looking forward to taking a look at the code!

Before we go further, there are a few things we should do, to get the PR in good shape. Could you please:

  • In the 'Contributor Checklist' at the top of this page, check the item that confirms these commits are on a separate branch.
  • Just below that, if you have tests for this code, please check the item that confirms you have run the tests.
    • If there are no tests yet, no problem! A good place to start with that is a simple test, that checks if the scanner is up and running at the expected port. A second test would check that you can send a file to the scanner, plus check the response from the scanner. Testing can be a little overwhelming sometimes, so it's ok to start with a few basic tests that check if the tested code is running and runs as expected for a simple use case. We can always build more tests out later, but we need to start with these two basic tests.
  • For the 'Description' field, please add a short description for what the code does. A high-level description that explains what the code is intended for, where it is in the codebase, and how it works should be enough. Simple and clear is good!
  • For the 'Related Issue' please enter the original issue number that this PR addresses. I think you can just replace 'ISSUE_NUMBER' with the actual issue number.

I noticed that there are some conflicts that need to be resolved. If you can fix these, please go ahead to do so. If not, I'll take a look and see what we can do to fix them.

Overall, you built some code and created a PR! That is fantastic, and we really appreciate your efforts here. Our next step is to fix the items above, then we can get to the fun part of digging through the code!

mattburnett-repo avatar May 17 '25 13:05 mattburnett-repo

Am sorry for not checking them, i got interrupted by other things and forgot but let me work on them right now

Joshua-Tihabwangye avatar May 17 '25 18:05 Joshua-Tihabwangye

I need help, from where i have reached. If i run docker in the logs it will show that the database has been scanned and there is no problem in it, but when i access the admin dashboard and i upload the image it cant show i the filescan container any logs of showing that the image has been scanned or nay file has been scanned. so am stack there. am here that we work together as i learn from you.

Joshua-Tihabwangye avatar May 17 '25 19:05 Joshua-Tihabwangye

Hi @Joshua-Tihabwangye ! You will certainly get help from us! We try our best to be helpful, and to promote a positive environment here!

I've made some initial comments about the code. DON'T STRESS!!! :) You're doing fine! We will go through all of the issues and fix them, one-by-one.

Right now, I have a question for @andrewtavis about how to architect this solution. It's actually a very basic question about where we want the code to go, and how we want to access it.

You've taken on a big task, and have made good progress towards solving it. We need to hear from Andrew, and then we can take next steps. Thanks for all of your work so far!

mattburnett-repo avatar May 18 '25 13:05 mattburnett-repo

I duplicated it after getting problems with including the clamav modules into the requirements.txt file so, when I added in one file I thought of duplicating it to cover all the requirements.txt files

On Sun, May 18, 2025, 4:47 PM Matt Burnett @.***> wrote:

@.**** commented on this pull request.

In backend/utils/malware_scan.py https://github.com/activist-org/activist/pull/1267#discussion_r2094531553 :

@@ -0,0 +1,52 @@ +import pyclamd

This is a good start! Is it possible to move this code out of the main backend codebase, and instead treat it like a microservice? Waiting to hear from @andrewtavis https://github.com/andrewtavis to get his thoughts about this.

— Reply to this email directly, view it on GitHub https://github.com/activist-org/activist/pull/1267#pullrequestreview-2848992782, or unsubscribe https://github.com/notifications/unsubscribe-auth/BCHYFTDO6TLQ6XPH34Z6FYL27CFPXAVCNFSM6AAAAAB5JE2SZCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDQNBYHE4TENZYGI . You are receiving this because you were mentioned.Message ID: @.***>

Joshua-Tihabwangye avatar May 18 '25 17:05 Joshua-Tihabwangye

We can use the small micro service to implement the malware scanning functionality. I have made some research on both options of integrating the functionality into the API end points especially in serializers as it is right now and then I have also the one of creating a small microservice and I've found out that it is a good idea to make the functionality as a small microservice. It is good because it handles much data at one time it's is important to separate it ,in case of any problem like crashing of the Clamav it can't affect the backend code.

Joshua-Tihabwangye avatar May 18 '25 18:05 Joshua-Tihabwangye

I duplicated it after getting problems with including the clamav modules into the requirements.txt file so, when I added in one file I thought of duplicating it to cover all the requirements.txt files

No stress, @Joshua-Tihabwangye! I can run the scripts to generate the dependencies in the final review here. We're good for now 😊

We can use the small micro service to implement the malware scanning functionality. I have made some research on both options of integrating the functionality into the API end points especially in serializers as it is right now and then I have also the one of creating a small microservice and I've found out that it is a good idea to make the functionality as a small microservice. It is good because it handles much data at one time it's is important to separate it ,in case of any problem like crashing of the Clamav it can't affect the backend code.

Let's do it then! 🚀 I think this will be a very nice first microservice to add into the codebase :)

andrewtavis avatar May 18 '25 18:05 andrewtavis

I have made some research on how the microservice would look like inside our main repo or project.

In the microservice we shall have the following directories which will have different files.

  1. .env file = environment variables
  2. Docker file = Dependencies for running the container.
  3. Requirements.txt = will have the pyrhon dependencies of the API and the library responsible for scanning.
  4. App
  5. Tests
  6. Scripts Readme.md file.

Then we need to have an API to help is integrate it or connect it to the main project.

The API wil have the; * File Scan Endpoint * POST as the method

Joshua-Tihabwangye avatar May 23 '25 07:05 Joshua-Tihabwangye

@mattburnett-repo , @to-sta @andrewtavis I need your idea and help on what my findings are, is there other things needed to fit in the structure or architecture of the microservice?

Joshua-Tihabwangye avatar May 23 '25 07:05 Joshua-Tihabwangye

I have made some research on how the microservice would look like inside our main repo or project.

In the microservice we shall have the following directories which will have different files.

  1. .env file = environment variables
  2. Docker file = Dependencies for running the container.
  3. Requirements.txt = will have the pyrhon dependencies of the API and the library responsible for scanning.
  4. App
  5. Tests
  6. Scripts Readme.md file.

Then we need to have an API to help is integrate it or connect it to the main project.

The API wil have the; * File Scan Endpoint * POST as the method

This is a good overview and a good first step towards building the filescan functionality!

I would suggest that we do the following:

  • Revert the .env.dev file. The changes made to this file (the " characters surrounding the values) are not necessary in almost all cases. This change should be removed from the PR.
  • Revert the docker-compose.yml file. The formatting of the code python manage.py populate_db --users 10 --orgs-per-user 1... onto a single line is not necessary. We should keep the original formatting, instead. This change should be removed from the PR.
  • We can start to build the filescan module with the following steps, focusing first on the infrastructure:
    • Create a separate module outside of the backend folder. We can start by creating a new folder called services that is on the same level as the backend and frontend folders.
    • In this services folder, create another folder called filescan. In this folder, we can build the actual filescan module. We will build this module later, after we create the infrastructure for it.
    • In the filescan folder, create a small, simple http service that
      • Accepts only a POST request. This POST request should include a file to be scanned.
      • Creates three Responses:
        • First response is for the case where the file is ok.
        • Second response is for the case where the file not ok.
        • Third response is for the cases where an error of some type occurred.
      • For now, this simple service doesn't need to actually do the filescan. For now, we just want to get an endpoint for the service created. We can add the filescan functionality after the endpoint is set up and accessible by the backend.
      • To build this simple service, look at Flask as a possible tool to use. Flask allows you to build simple http-based endpoints. If you prefer to use something else, feel free to do so.
    • Create a Dockerfile for this simple service.
    • Create three tests, one for each of the three Responses mentioned above (ok, malware, error).

We don't need to focus on the actual filescanning right now. What is important at this step is to create the 'home' for the filescan functionality, meaning the folders/http service/Dockerfile/tests. Once we have this 'home' created and working well, we can add the real filescan code.

@andrewtavis @to-sta , as always we're interested in any thoughts or comments you might have!

mattburnett-repo avatar May 23 '25 12:05 mattburnett-repo

I think the general plan sounds really good, @mattburnett-repo :) Thanks for outlining it in such detail 😊

andrewtavis avatar May 23 '25 16:05 andrewtavis

@Joshua-Tihabwangye , I've outlined some steps, above, to take in order to move forward with this task. Please take a look; they should be helpful.

mattburnett-repo avatar May 23 '25 16:05 mattburnett-repo

Thank you @mattburnett-repo for the highlights given to us. It will really help me and the development team in this task

Joshua-Tihabwangye avatar May 23 '25 16:05 Joshua-Tihabwangye

Good morning @mattburnett-repo , @andrewtavis , @to-sta , it has been long when i am off i got so ill, but now i am okay it is time i embark on the project with your help.

Joshua-Tihabwangye avatar Jul 10 '25 22:07 Joshua-Tihabwangye

Today i will dedicate it to make a pilot study on how to start the other form of implementing the file scanning functionality then on friday i will give you the result here, and you correct me or give me a go head.

Joshua-Tihabwangye avatar Jul 10 '25 22:07 Joshua-Tihabwangye

Thanks for checking in, @Joshua-Tihabwangye! Maybe for a star you can send along a commit to fix the merge conflicts? From there we can discuss what the next steps should be here. I think that generally we were discussing a Fast API microservice for this, if not in this PR than in the issue or in other discussions around it :)

andrewtavis avatar Jul 12 '25 13:07 andrewtavis

Okay, thank you @andrewtavis for the highlight on what to figure out first, I will first revert some changes that I had made and clear the merge conflicts

Joshua-Tihabwangye avatar Jul 12 '25 23:07 Joshua-Tihabwangye

hello @andrewtavis i corrected the merge conflicts but there is one check that failed to pass i have been on it for 2 days and i have failed so i need your help

Joshua-Tihabwangye avatar Jul 19 '25 20:07 Joshua-Tihabwangye

Hey @Joshua-Tihabwangye 👋 Thanks for your continued efforts here. Could we ask that you first get to some of the backend tests like the ruff and mypy errors. There also are some unaddressed changes like this one here directing you to revert some changes. Would be great to get this work done, and then from there we can get into a deeper review!

andrewtavis avatar Jul 19 '25 21:07 andrewtavis

hello @andrewtavis is it fine now ? check on my updates on the checks that had refused before

Joshua-Tihabwangye avatar Aug 18 '25 08:08 Joshua-Tihabwangye

@momanyisamuel, would you have a moment to loop into the conversation here and support? Happy to discuss in a call (messaged you about this on Signal 😊). As this is the first micro service and also security related, it'd be great to get your support here!

andrewtavis avatar Aug 18 '25 12:08 andrewtavis

@andrewtavis and @to-sta and @mattburnett-repo are we continuing with the plan of the small micro service or ?

Joshua-Tihabwangye avatar Aug 18 '25 13:08 Joshua-Tihabwangye

@Joshua-Tihabwangye, plan is still to do the micro service here. We'll discuss a bit and report back to you :) Thanks for your patience!

andrewtavis avatar Aug 18 '25 16:08 andrewtavis

Okay @andrewtavis thank you.

Joshua-Tihabwangye avatar Aug 18 '25 16:08 Joshua-Tihabwangye