Added PDF malware detection pipeline

Open DarshAgrawal14 opened this issue 1 year ago • 4 comments

Pull Request for PyVerse 💡

Requesting to submit a pull request to the PyVerse repository.


Issue Title

Please enter the title of the issue related to your pull request.
Pipeline for Detecting whether given PDF is malicious or not

  • [x] I have provided the issue title.

Info about the Related Issue

What's the goal of the project?
Describe the aim of the project.

  • [ ] I have described the aim of the project.

Name

Please mention your name.
Darsh Agrawal

  • [x] I have provided my name.

GitHub ID

Please mention your GitHub ID.
DarshAgrawal14

  • [x] I have provided my GitHub ID.

Email ID

Please mention your email ID for further communication.
[email protected]

  • [x] I have provided my email ID.

Identify Yourself

Mention in which program you are contributing (e.g., WoB, GSSOC, SSOC, SWOC).
GSSOC

  • [x] I have mentioned my participant role.

Closes

Enter the issue number that will be closed through this PR.
Closes: #569

  • [x] I have provided the issue number.

Describe the Add-ons or Changes You've Made

Give a clear description of what you have added or modified.
I have added a pipeline that detects whether a given PDF contains malware. It extracts features from the PDF, such as metadata, images, links, and content, and analyzes them to detect malware or other malicious content.
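For readers who want a feel for this kind of feature extraction, here is a minimal sketch. It is not the code from this PR: the keyword list, the `extract_features` name, and the returned feature names are all assumptions chosen for illustration. It counts PDF name tokens often associated with active or embedded content in the raw bytes of a file, using only the standard library.

```python
import re
from pathlib import Path

# Hypothetical keyword list for illustration only; the PR's actual feature
# set is not shown in this thread.
KEYWORDS = (
    b"/JavaScript",
    b"/JS",
    b"/OpenAction",
    b"/Launch",
    b"/EmbeddedFile",
    b"/URI",
    b"/Image",
)


def extract_features(data: bytes) -> dict:
    """Return simple structural features computed from raw PDF bytes."""
    features = {
        "size_bytes": len(data),
        "has_pdf_header": data.startswith(b"%PDF-"),
    }
    for keyword in KEYWORDS:
        # The lookahead keeps a short name from matching as the prefix of a
        # longer one (for example /Image inside /ImageMask).
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        features[keyword.decode()[1:]] = len(re.findall(pattern, data))
    return features


def extract_features_from_file(path: str) -> dict:
    """Read a file from disk and compute the same features."""
    return extract_features(Path(path).read_bytes())
```

A feature dictionary like this could be turned into a numeric vector and passed to a classifier. Counting raw tokens misses names hidden inside compressed or obfuscated streams, so a real pipeline would likely need a proper PDF parser as well.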

https://github.com/user-attachments/assets/4908ba06-627a-43ba-ac21-7ea14ccf632f

  • [x] I have described my changes.

Type of Change

Select the type of change:

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Code style update (formatting, local variables)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] This change requires a documentation update

How Has This Been Tested?

Describe how your changes have been tested.
Run the predict.py file with the path to a PDF file, and the model will predict whether the given PDF is malicious.
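The thread does not show the interface of predict.py, so the following is only a sketch of how a script of that shape might be organized. The argument name `pdf_path`, the `classify_pdf` and `main` functions, and the `placeholder_predict` stand-in for the trained model are all hypothetical.

```python
import argparse
from pathlib import Path
from typing import Callable, Optional, Sequence


def placeholder_predict(data: bytes) -> bool:
    """Stand-in for the trained model (assumption): flags /JavaScript."""
    return b"/JavaScript" in data


def classify_pdf(path: Path, predict: Callable[[bytes], bool]) -> str:
    """Read the file and map the model's boolean output to a label."""
    if not path.is_file():
        raise FileNotFoundError(path)
    return "malicious" if predict(path.read_bytes()) else "benign"


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse a single positional PDF path and print the predicted label."""
    parser = argparse.ArgumentParser(
        description="Predict whether a PDF is malicious (illustrative sketch)."
    )
    parser.add_argument("pdf_path", type=Path, help="path to the PDF to check")
    args = parser.parse_args(argv)
    label = classify_pdf(args.pdf_path, placeholder_predict)
    print(f"{args.pdf_path}: {label}")
    return 0
```

Calling `main(["sample.pdf"])` from Python, or wiring `main()` to a script entry point, would print the label for that file. Passing the predictor in as a callable keeps the file handling testable without loading a real model.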

  • [x] I have described my testing process.

Checklist

Please confirm the following:

  • [x] My code follows the guidelines of this project.
  • [x] I have performed a self-review of my own code.
  • [x] I have commented my code, particularly wherever it was hard to understand.
  • [x] I have made corresponding changes to the documentation.
  • [x] My changes generate no new warnings.
  • [x] I have added things that prove my fix is effective or that my feature works.
  • [x] Any dependent changes have been merged and published in downstream modules.

DarshAgrawal14 avatar Oct 14 '24 03:10 DarshAgrawal14

👋 Thank you for opening this pull request! We're excited to review your contribution. Please give us a moment, and we'll get back to you shortly!

Feel free to join our community on Discord to discuss more!

github-actions[bot] avatar Oct 14 '24 03:10 github-actions[bot]

@UTSAVS26, the assigned level is 1. However, along with model creation, I also developed a script to extract features from the given PDFs and process them for model input, which was quite time-consuming. Therefore, I believe this task aligns more appropriately with a level 2 or above classification. I would appreciate your consideration of this adjustment.

DarshAgrawal14 avatar Oct 14 '24 03:10 DarshAgrawal14

Yes, I know. I set the level to 1 because you are just adding a project. If you fix a bug or add a new feature to the existing codebase, that will be counted at a higher level. Hope you understand.

I have shared all the guidelines for contributors in the discussion page.

UTSAVS26 avatar Oct 14 '24 09:10 UTSAVS26

@UTSAVS26 , It would be helpful if you could provide specific examples of what you consider a bug fix or feature addition. From what I noticed, the PRs marked as Level 2 also seem to be project additions. Looking forward to your thoughts!

DarshAgrawal14 avatar Oct 14 '24 09:10 DarshAgrawal14

https://github.com/UTSAVS26/PyVerse/issues/621

Take this one, for example: he is fixing a bug, and because that bug is giving us more trouble, I have given it level 3.

UTSAVS26 avatar Oct 16 '24 02:10 UTSAVS26