DL-Simplified

Microsoft Malware Prediction

Open somaiaahmed opened this issue 1 year ago • 11 comments
🔴 Project Title: Microsoft Malware Prediction Challenge

🔴 Aim: Develop predictive models using data science techniques to anticipate malware attacks on machines, thereby preventing potential damage to Microsoft's vast user base.

🔴 Dataset: Utilize the unprecedented malware dataset provided by Microsoft to facilitate open-source advancements in malware prediction techniques.

🔴 Approach: Perform exploratory data analysis (EDA) on the malware dataset to understand its structure and characteristics. Implement 3-4 machine learning algorithms such as Random Forest, XGBoost, Neural Networks, and others. Compare these algorithms based on their performance metrics such as accuracy, precision, and recall to identify the most effective model for predicting malware occurrences.
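A minimal sketch of the comparison step described above, using only the Python standard library. It assumes binary labels where 1 means malware was detected, and the names `binary_metrics`, `rank_models`, `model_a`, and `model_b` are hypothetical placeholders, not part of the project. The toy predictions are illustrative and not results from the Microsoft dataset.

```python
from typing import Dict, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision, and recall for 0/1 labels (1 = malware detected)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def rank_models(
    y_true: Sequence[int],
    predictions: Dict[str, Sequence[int]],
    key: str = "recall",
) -> List[Tuple[str, Dict[str, float]]]:
    """Score every model's predictions and sort them, best first, by one metric."""
    scores = {name: binary_metrics(y_true, y_pred) for name, y_pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1][key], reverse=True)


if __name__ == "__main__":
    # Toy labels and predictions for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    predictions = {
        "model_a": [1, 0, 1, 0, 0, 0, 1, 0],
        "model_b": [1, 1, 1, 1, 0, 1, 1, 0],
    }
    for name, metrics in rank_models(y_true, predictions, key="recall"):
        print(name, metrics)
```

Which metric to rank by is a design choice: recall favors catching more infections at the cost of more false alarms, while precision does the opposite.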


📍 Follow the Guidelines to Contribute in the Project:

  • Create a separate folder named "Microsoft Malware Prediction" under the main repository.
  • Inside the "Microsoft Malware Prediction" folder, include the following components:
    • Images: For any necessary visualizations or diagrams related to EDA or model comparisons.
    • Dataset: Provide information about the malware dataset and its source.
    • Model: Implement machine learning models using the malware dataset.
    • requirements.txt: List required packages/libraries for project replication.
  • Inside the Model folder, ensure the README.md file is filled with visualizations, conclusions, and model performance details.

🔴🟡 Points to Note:

  • Issues are assigned on a first-come, first-served basis; 1 Issue == 1 Pull Request (PR).
  • Issue Title and PR Title should be identical, including the issue number.
  • Follow Contributing Guidelines & Code of Conduct before starting to contribute.

To be Mentioned while taking the issue:

  • Full name: Somaia Ahmed
  • GitHub Profile Link: https://github.com/somaiaahmed
  • Email ID: [email protected]
  • Participant ID (if applicable): [NA or mention if applicable]
  • Approach for this Project: Perform EDA, implement Random Forest, XGBoost, Neural Networks, and other models, compare their performance using metrics like accuracy, precision, and recall.
  • What is your participant role?: GSSoC'24 | Contributor

Happy Contributing! 🚀

All the best. Enjoy your open source journey ahead. 😎

somaiaahmed avatar Jun 14 '24 00:06 somaiaahmed

Thank you for creating this issue! We'll look into it as soon as possible. Your contributions are highly appreciated! 😊

github-actions[bot] avatar Jun 14 '24 00:06 github-actions[bot]

@abhisheks008 , 👋 Hey bro can you please assign me this issue under GSSoC'24 with an appropriate level tag

somaiaahmed avatar Jun 14 '24 00:06 somaiaahmed

@abhisheks008, kindly assign this issue to me with an appropriate level tag

Nidhi-Satyapriya avatar Jun 15 '24 03:06 Nidhi-Satyapriya

> @abhisheks008 , 👋 Hey bro can you please assign me this issue under GSSoC'24 with an appropriate level tag

What are the models you are planning for this problem statement? Mention at least 3-4 models for this dataset.

abhisheks008 avatar Jun 15 '24 07:06 abhisheks008

@abhisheks008 I'm planning to use Gradient Boosting Machines (GBM)

For tabular data like the one in this malware prediction challenge, tree-based ensemble methods (XGBoost, LightGBM, CatBoost) are often the most effective. These methods can handle the complexity and variability in the data well.

somaiaahmed avatar Jun 18 '24 11:06 somaiaahmed

Hi @somaiaahmed, thanks for the approach. However, this project repository requires deep learning models rather than classical machine learning models, so can you please update your approach and get back to this issue?

abhisheks008 avatar Jun 19 '24 05:06 abhisheks008

@abhisheks008 ok i can build CNN model plz assign it to me

somaiaahmed avatar Jun 19 '24 11:06 somaiaahmed

Can you elaborate on the planned models? A CNN alone will not work here, as you need to implement at least 2-3 models for any project.

abhisheks008 avatar Jun 20 '24 06:06 abhisheks008

@abhisheks008, I can start working on it, after making sure you approve my solution for the Micromobility-Lane-Recognition Issue

Full name: Basma Mahmoud
GitHub Profile Link: Basma2423
Email ID: [email protected]

Approach for this Project:

  1. Data Loading and Preprocessing
  2. EDA
  3. Models:
     3.1 Multiple deep learning approaches suitable for tabular data, e.g. FNN, TabNet, and entity embeddings for categorical variables.
     3.2 Maybe some pre-trained models, e.g. Pretrained TabNet, PyCaret, and AutoGluon.
  4. Models Assessment.
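
To illustrate the entity-embedding idea mentioned in step 3.1, here is a rough standard-library sketch of the preprocessing it relies on: each category is mapped to an integer index, and each index selects a row of a vector table. The helper names and the example OS version values are hypothetical. In a real model the table would be a trainable embedding layer learned together with the network; here it is only randomly initialized to show the lookup.

```python
import random
from typing import Dict, List, Sequence

UNKNOWN = 0  # index reserved for categories not seen when the vocabulary was built


def build_vocab(values: Sequence[str]) -> Dict[str, int]:
    """Map each distinct category to an integer index, starting at 1."""
    vocab: Dict[str, int] = {}
    for value in values:
        if value not in vocab:
            vocab[value] = len(vocab) + 1
    return vocab


def encode(values: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Convert categories to indices; unseen categories fall back to UNKNOWN."""
    return [vocab.get(value, UNKNOWN) for value in values]


def init_embedding_table(num_rows: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Create a num_rows x dim table of small random values (untrained)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(num_rows)]


def lookup(table: List[List[float]], indices: Sequence[int]) -> List[List[float]]:
    """Replace each index with its embedding vector (one table row per index)."""
    return [table[i] for i in indices]


if __name__ == "__main__":
    # Hypothetical categorical column; the values are illustrative only.
    train_column = ["win10", "win8", "win10", "win7"]
    vocab = build_vocab(train_column)
    # One extra row so that index 0 (UNKNOWN) has its own vector.
    table = init_embedding_table(num_rows=len(vocab) + 1, dim=2)
    indices = encode(["win8", "win11"], vocab)
    print(indices)
    print(lookup(table, indices))
```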

What is your participant role? (Mention the Open Source program): GSSoC-2024 participant

Can you add the label for GSSoC, please? Thanks.

Basma2423 avatar Jun 26 '24 12:06 Basma2423

As this issue is raised by a contributor, I can't assign this to you

abhisheks008 avatar Jun 28 '24 16:06 abhisheks008

@abhisheks008 no probs.

Basma2423 avatar Jun 28 '24 21:06 Basma2423