Added PDF malware detection pipeline
Pull Request for PyVerse 💡
Requesting to submit a pull request to the PyVerse repository.
Issue Title
Please enter the title of the issue related to your pull request.
Pipeline for detecting whether a given PDF is malicious
- [x] I have provided the issue title.
Info about the Related Issue
What's the goal of the project?
Describe the aim of the project.
- [ ] I have described the aim of the project.
Name
Please mention your name.
Darsh Agrawal
- [x] I have provided my name.
GitHub ID
Please mention your GitHub ID.
DarshAgrawal14
- [x] I have provided my GitHub ID.
Email ID
Please mention your email ID for further communication.
[email protected]
- [x] I have provided my email ID.
Identify Yourself
Mention in which program you are contributing (e.g., WoB, GSSOC, SSOC, SWOC).
GSSOC
- [x] I have mentioned my participant role.
Closes
Enter the issue number that will be closed through this PR.
Closes: #569
- [x] I have provided the issue number.
Describe the Add-ons or Changes You've Made
Give a clear description of what you have added or modified.
I have added a pipeline that detects whether a given PDF contains malware. It extracts features from the PDF, such as metadata, images, links, and content, and analyzes them to detect malware or other malicious content.
https://github.com/user-attachments/assets/4908ba06-627a-43ba-ac21-7ea14ccf632f
- [x] I have described my changes.
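To make the feature-extraction idea concrete, here is a minimal sketch of one common static approach: counting PDF name tokens that often appear in malicious files. This is an illustration only, not the code in this PR. The function name `extract_pdf_features`, the keyword list, and the feature names are all assumptions chosen for the example; the actual pipeline may extract different features and use a PDF parsing library instead of raw byte scanning.

```python
import re

# Hypothetical keyword list: PDF name tokens that static scanners often count.
KEYWORDS = (b"/JavaScript", b"/JS", b"/OpenAction", b"/URI", b"/EmbeddedFile", b"/Image")


def extract_pdf_features(data: bytes) -> dict:
    """Return simple numeric features computed from the raw bytes of a PDF."""
    features = {
        "size_bytes": len(data),
        "has_pdf_header": data.startswith(b"%PDF-"),
    }
    for keyword in KEYWORDS:
        # The negative lookahead stops "/Image" from also matching "/ImageMask".
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        # Feature name is the keyword without its leading slash, e.g. "JS".
        features[keyword.decode()[1:]] = len(re.findall(pattern, data))
    return features


if __name__ == "__main__":
    # Synthetic bytes for demonstration, not a real PDF.
    sample = b"%PDF-1.7\n1 0 obj << /OpenAction 2 0 R >>\n2 0 obj << /S /JavaScript /JS (x) >>"
    print(extract_pdf_features(sample))
```

A dictionary like this could be turned into a fixed-order feature vector before being passed to a classifier. Raw byte scanning misses tokens hidden inside compressed or obfuscated streams, so it should be treated as a starting point.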
Type of Change
Select the type of change:
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Code style update (formatting, local variables)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
How Has This Been Tested?
Describe how your changes have been tested.
Run predict.py with the path to a PDF file; the model then predicts whether the given PDF is malicious.
- [x] I have described my testing process.
Checklist
Please confirm the following:
- [x] My code follows the guidelines of this project.
- [x] I have performed a self-review of my own code.
- [x] I have commented my code, particularly wherever it was hard to understand.
- [x] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have added things that prove my fix is effective or that my feature works.
- [x] Any dependent changes have been merged and published in downstream modules.
👋 Thank you for opening this pull request! We're excited to review your contribution. Please give us a moment, and we'll get back to you shortly!
Feel free to join our community on Discord to discuss more!
@UTSAVS26, the assigned level is 1. However, along with model creation, I also developed a script to extract features from the given PDFs and process them for model input, which was quite time-consuming. Therefore, I believe this task aligns more appropriately with a level 2 or above classification. I would appreciate your consideration of this adjustment.
Yes, I know. I set the level to 1 because you are only adding a project; fixing a bug or adding a new feature to the existing codebase counts toward higher levels. I hope you understand.
I have shared all the guidelines for contributors in the discussion page.
@UTSAVS26, it would be helpful if you could provide specific examples of what you consider a bug fix or feature addition. From what I noticed, the PRs marked as Level 2 also seem to be project additions. Looking forward to your thoughts!
https://github.com/UTSAVS26/PyVerse/issues/621
For example, in this one he is fixing a bug, and because that bug is giving us more trouble, I have assigned it level 3.