activist
activist copied to clipboard
Malware scanning
Contributor checklist
- [x] This pull request is on a separate branch and not the main branch
- [x] I have run the tests for the backend and frontend depending on what's needed for my changes as described in the testing section of the contributing guide
Description
I have made the malware scanning functionality where each file is to be scanned before being stored in the database. I used a python library which is pyClamd. I created the malware_scan.py file inside the utils folder, i also had to set the pyclamd and its clamav deamon in settings.py, then other files that were edited are, serializers, docker file of backend, the docker compose file and the requirements file, and the requirements.txt.
Related issue
- #1171
Deploy Preview for activist-org canceled.
| Name | Link |
|---|---|
| Latest commit | fd6b4e9b6373300ca4773bc37bbfbf90b9f7987f |
| Latest deploy log | https://app.netlify.com/projects/activist-org/deploys/68a2e1549e6f0b0008ecb8ee |
Thank you for the pull request! ❤️
The activist team will do our best to address your contribution as soon as we can. If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Development rooms once you're in. Also consider attending our bi-weekly Saturday developer syncs! It'd be great to meet you 😊
Maintainer Checklist
The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)
-
[ ] The TypeScript, pytest and formatting workflows within the PR checks do not indicate new errors in the files changed
-
[ ] The Playwright end to end and Zap penetration tests have been ran and are passing (if necessary)
-
[ ] The changelog has been updated with a description of the changes for the upcoming release and the corresponding issue (if necessary)
Hello @Joshua-Tihabwangye ! Thanks for all of the work getting this together; I'm looking forward to taking a look at the code!
Before we go further, there are a few things we should do, to get the PR in good shape. Could you please:
- In the 'Contributor Checklist' at the top of this page, check the item that confirms these commits are on a separate branch.
- Just below that, if you have tests for this code, please check the item that confirms you have run the tests.
- If there are no tests yet, no problem! A good place to start with that is a simple test, that checks if the scanner is up and running at the expected port. A second test would check that you can send a file to the scanner, plus check the response from the scanner. Testing can be a little overwhelming sometimes, so it's ok to start with a few basic tests that check if the tested code is running and runs as expected for a simple use case. We can always build more tests out later, but we need to start with these two basic tests.
- For the 'Description' field, please add a short description for what the code does. A high-level description that explains what the code is intended for, where it is in the codebase, and how it works should be enough. Simple and clear is good!
- For the 'Related Issue' please enter the original issue number that this PR addresses. I think you can just replace 'ISSUE_NUMBER' with the actual issue number.
I noticed that there are some conflicts that need to be resolved. If you can fix these, please go ahead to do so. If not, I'll take a look and see what we can do to fix them.
Overall, you built some code and created a PR! That is fantastic, and we really appreciate your efforts here. Our next step is to fix the items above, then we can get to the fun part of digging through the code!
Am sorry for not checking them, i got interrupted by other things and forgot but let me work on them right now
I need help, from where i have reached. If i run docker in the logs it will show that the database has been scanned and there is no problem in it, but when i access the admin dashboard and i upload the image it cant show i the filescan container any logs of showing that the image has been scanned or nay file has been scanned. so am stack there. am here that we work together as i learn from you.
Hi @Joshua-Tihabwangye ! You will certainly get help from us! We try our best to be helpful, and to promote a positive environment here!
I've made some initial comments about the code. DON'T STRESS!!! :) You're doing fine! We will go through all of the issues and fix them, one-by-one.
Right now, I have a question for @andrewtavis about how to architect this solution. It's actually a very basic question about where we want the code to go, and how we want to access it.
You've taken on a big task, and have made good progress towards solving it. We need to hear from Andrew, and then we can take next steps. Thanks for all of your work so far!
I duplicated it after getting problems with including the clamav modules into the requirements.txt file so, when I added in one file I thought of duplicating it to cover all the requirements.txt files
On Sun, May 18, 2025, 4:47 PM Matt Burnett @.***> wrote:
@.**** commented on this pull request.
In backend/utils/malware_scan.py https://github.com/activist-org/activist/pull/1267#discussion_r2094531553 :
@@ -0,0 +1,52 @@ +import pyclamd
This is a good start! Is it possible to move this code out of the main backend codebase, and instead treat it like a microservice? Waiting to hear from @andrewtavis https://github.com/andrewtavis to get his thoughts about this.
— Reply to this email directly, view it on GitHub https://github.com/activist-org/activist/pull/1267#pullrequestreview-2848992782, or unsubscribe https://github.com/notifications/unsubscribe-auth/BCHYFTDO6TLQ6XPH34Z6FYL27CFPXAVCNFSM6AAAAAB5JE2SZCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDQNBYHE4TENZYGI . You are receiving this because you were mentioned.Message ID: @.***>
We can use the small micro service to implement the malware scanning functionality. I have made some research on both options of integrating the functionality into the API end points especially in serializers as it is right now and then I have also the one of creating a small microservice and I've found out that it is a good idea to make the functionality as a small microservice. It is good because it handles much data at one time it's is important to separate it ,in case of any problem like crashing of the Clamav it can't affect the backend code.
I duplicated it after getting problems with including the clamav modules into the requirements.txt file so, when I added in one file I thought of duplicating it to cover all the requirements.txt files
No stress, @Joshua-Tihabwangye! I can run the scripts to generate the dependencies in the final review here. We're good for now 😊
We can use the small micro service to implement the malware scanning functionality. I have made some research on both options of integrating the functionality into the API end points especially in serializers as it is right now and then I have also the one of creating a small microservice and I've found out that it is a good idea to make the functionality as a small microservice. It is good because it handles much data at one time it's is important to separate it ,in case of any problem like crashing of the Clamav it can't affect the backend code.
Let's do it then! 🚀 I think this will be a very nice first microservice to add into the codebase :)
I have made some research on how the microservice would look like inside our main repo or project.
In the microservice we shall have the following directories which will have different files.
- .env file = environment variables
- Docker file = Dependencies for running the container.
- Requirements.txt = will have the pyrhon dependencies of the API and the library responsible for scanning.
- App
- Tests
- Scripts Readme.md file.
Then we need to have an API to help is integrate it or connect it to the main project.
The API wil have the; * File Scan Endpoint * POST as the method
@mattburnett-repo , @to-sta @andrewtavis I need your idea and help on what my findings are, is there other things needed to fit in the structure or architecture of the microservice?
I have made some research on how the microservice would look like inside our main repo or project.
In the microservice we shall have the following directories which will have different files.
- .env file = environment variables
- Docker file = Dependencies for running the container.
- Requirements.txt = will have the pyrhon dependencies of the API and the library responsible for scanning.
- App
- Tests
- Scripts Readme.md file.
Then we need to have an API to help is integrate it or connect it to the main project.
The API wil have the; * File Scan Endpoint * POST as the method
This is a good overview and a good first step towards building the filescan functionality!
I would suggest that we do the following:
- Revert the
.env.devfile. The changes made to this file (the " characters surrounding the values) are not necessary in almost all cases. This change should be removed from the PR. - Revert the
docker-compose.ymlfile. The formatting of the codepython manage.py populate_db --users 10 --orgs-per-user 1...onto a single line is not necessary. We should keep the original formatting, instead. This change should be removed from the PR. - We can start to build the filescan module with the following steps, focusing first on the infrastructure:
- Create a separate module outside of the
backendfolder. We can start by creating a new folder calledservicesthat is on the same level as thebackendandfrontendfolders. - In this
servicesfolder, create another folder calledfilescan. In this folder, we can build the actual filescan module. We will build this module later, after we create the infrastructure for it. - In the
filescanfolder, create a small, simple http service that- Accepts only a POST request. This POST request should include a file to be scanned.
- Creates three Responses:
- First response is for the case where the file is ok.
- Second response is for the case where the file not ok.
- Third response is for the cases where an error of some type occurred.
- For now, this simple service doesn't need to actually do the filescan. For now, we just want to get an endpoint for the service created. We can add the filescan functionality after the endpoint is set up and accessible by the backend.
- To build this simple service, look at Flask as a possible tool to use. Flask allows you to build simple http-based endpoints. If you prefer to use something else, feel free to do so.
- Create a Dockerfile for this simple service.
- Create three tests, one for each of the three Responses mentioned above (ok, malware, error).
- Create a separate module outside of the
We don't need to focus on the actual filescanning right now. What is important at this step is to create the 'home' for the filescan functionality, meaning the folders/http service/Dockerfile/tests. Once we have this 'home' created and working well, we can add the real filescan code.
@andrewtavis @to-sta , as always we're interested in any thoughts or comments you might have!
I think the general plan sounds really good, @mattburnett-repo :) Thanks for outlining it in such detail 😊
@Joshua-Tihabwangye , I've outlined some steps, above, to take in order to move forward with this task. Please take a look; they should be helpful.
Thank you @mattburnett-repo for the highlights given to us. It will really help me and the development team in this task
Good morning @mattburnett-repo , @andrewtavis , @to-sta , it has been long when i am off i got so ill, but now i am okay it is time i embark on the project with your help.
Today i will dedicate it to make a pilot study on how to start the other form of implementing the file scanning functionality then on friday i will give you the result here, and you correct me or give me a go head.
Thanks for checking in, @Joshua-Tihabwangye! Maybe for a star you can send along a commit to fix the merge conflicts? From there we can discuss what the next steps should be here. I think that generally we were discussing a Fast API microservice for this, if not in this PR than in the issue or in other discussions around it :)
Okay, thank you @andrewtavis for the highlight on what to figure out first, I will first revert some changes that I had made and clear the merge conflicts
hello @andrewtavis i corrected the merge conflicts but there is one check that failed to pass i have been on it for 2 days and i have failed so i need your help
Hey @Joshua-Tihabwangye 👋 Thanks for your continued efforts here. Could we ask that you first get to some of the backend tests like the ruff and mypy errors. There also are some unaddressed changes like this one here directing you to revert some changes. Would be great to get this work done, and then from there we can get into a deeper review!
hello @andrewtavis is it fine now ? check on my updates on the checks that had refused before
@momanyisamuel, would you have a moment to loop into the conversation here and support? Happy to discuss in a call (messaged you about this on Signal 😊). As this is the first micro service and also security related, it'd be great to get your support here!
@andrewtavis and @to-sta and @mattburnett-repo are we continuing with the plan of the small micro service or ?
@Joshua-Tihabwangye, plan is still to do the micro service here. We'll discuss a bit and report back to you :) Thanks for your patience!
Okay @andrewtavis thank you.