[Code Addition Request]: Credit Card Fraud Detection
Pull Request for PyVerse 💡
Requesting to submit a pull request to the PyVerse repository.
Issue Title
Please enter the title of the issue related to your pull request.
[Code Addition Request]: Credit Card Fraud Detection
- [x] I have provided the issue title.
Info about the Related Issue
What's the goal of the project?
Describe the aim of the project.
This project aims to detect fraudulent credit card transactions using machine learning algorithms. The dataset, sourced from Kaggle, contains transaction data labeled as fraudulent or non-fraudulent. The goal is to build a model that can accurately classify transactions as fraudulent or not, despite the significant class imbalance (very few fraud cases compared to legitimate transactions).
- [x] I have described the aim of the project.
Name
Please mention your name.
Credit Card Fraud Detection System
- [x] I have provided my name.
GitHub ID
Please mention your GitHub ID.
inkerton
- [x] I have provided my GitHub ID.
Email ID
Please mention your email ID for further communication.
[email protected]
- [x] I have provided my email ID.
Identify Yourself
Mention in which program you are contributing (e.g., WoB, GSSOC, SSOC, SWOC).
GSSOC
- [x] I have mentioned my participant role.
Closes
Enter the issue number that will be closed through this PR.
Closes: #497
- [x] I have provided the issue number.
Describe the Add-ons or Changes You've Made
Give a clear description of what you have added or modified.
Describe your changes here.
- [ ] I have described my changes.
Type of Change
Select the type of change:
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Code style update (formatting, local variables)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
How Has This Been Tested?
Describe how your changes have been tested.
The code is exercised through the following data exploration, visualization, and feature analysis steps:
- Data Integrity and Format: The code downloads the dataset using the Kaggle API and loads it into a Pandas DataFrame. It inspects the structure of the data using `data_df.head()` and `data_df.describe()`, which confirms that the data loaded properly and shows no immediate formatting issues.
- Missing Values Check: It checks for missing values in the dataset using the `isnull()` function. This confirms the completeness of the dataset and identifies any need for data cleaning.
- Class Distribution Check: The code visualizes the distribution of fraud and non-fraud transactions using a bar plot. This shows whether the dataset is imbalanced, a crucial factor for classification tasks.
- Exploratory Data Analysis (EDA):
  - Density Plot: The code generates time density plots for fraudulent and non-fraudulent transactions using Plotly. This shows how transactions are distributed over time and whether fraud and non-fraud activity follow different patterns.
  - Hourly Transaction Analysis: The dataset is grouped by hour and transaction class, and statistics such as the total number of transactions and the mean and max amounts are calculated. Line plots of these statistics visualize temporal patterns of fraud versus non-fraud activity.
  - Box Plots: Transaction amounts are visualized using box plots, both including and excluding outliers, to look for significant differences in transaction amounts between fraud and non-fraud cases.
- Correlation Check: The Pearson correlation matrix is calculated and plotted as a heatmap. This step examines relationships between features and checks for multicollinearity that might affect model performance.
- Scatter Plots: Several scatter plots are created to examine how certain features (like `V2`, `V5`, `V20`, and `Amount`) relate to each other and whether there is any visible separation between fraudulent and non-fraudulent transactions.
- Feature Density: The code plots the distribution of each feature using density plots to confirm that the features have meaningful variation and to see whether fraudulent transactions have characteristics that distinguish them from non-fraudulent ones.
- [x] I have described my testing process.
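For reviewers, the non-plotting checks described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the label column is named `Class` and that `Time` holds seconds since the first transaction, and it runs on a small synthetic stand-in frame rather than the Kaggle data. The helper names (`make_toy_frame`, `missing_value_report`, `class_distribution`, `hourly_stats`, `high_correlation_pairs`) and the 0.8 correlation threshold are hypothetical, chosen only for this example.

```python
import numpy as np
import pandas as pd


def make_toy_frame(n_rows: int = 1000, seed: int = 0) -> pd.DataFrame:
    """Build a small synthetic stand-in for the dataset (not real data)."""
    rng = np.random.default_rng(seed)
    frame = pd.DataFrame({
        # Assumption: Time is seconds elapsed since the first transaction.
        "Time": rng.integers(0, 48 * 3600, size=n_rows),
        "V2": rng.normal(size=n_rows),
        "V5": rng.normal(size=n_rows),
        "V20": rng.normal(size=n_rows),
        "Amount": rng.exponential(scale=80.0, size=n_rows).round(2),
        "Class": np.zeros(n_rows, dtype=int),
    })
    # Mark the first five rows as fraud to mimic a heavy class imbalance.
    frame.loc[frame.index[:5], "Class"] = 1
    return frame


def missing_value_report(frame: pd.DataFrame) -> pd.Series:
    """Count null values per column."""
    return frame.isnull().sum()


def class_distribution(frame: pd.DataFrame, label: str = "Class") -> pd.DataFrame:
    """Return the row count and share of each class label."""
    counts = frame[label].value_counts().sort_index()
    return pd.DataFrame({"count": counts, "share": counts / len(frame)})


def hourly_stats(frame: pd.DataFrame, label: str = "Class") -> pd.DataFrame:
    """Aggregate transaction amounts per hour and class."""
    with_hour = frame.assign(Hour=(frame["Time"] // 3600).astype(int))
    return (
        with_hour.groupby(["Hour", label])["Amount"]
        .agg(["count", "sum", "mean", "max"])
        .reset_index()
    )


def high_correlation_pairs(frame: pd.DataFrame, threshold: float = 0.8) -> list:
    """List column pairs whose absolute Pearson correlation meets the threshold."""
    corr = frame.corr(method="pearson").abs()
    cols = list(corr.columns)
    pairs = []
    for i, first in enumerate(cols):
        for second in cols[i + 1:]:
            value = float(corr.loc[first, second])
            if value >= threshold:
                pairs.append((first, second, value))
    return pairs


if __name__ == "__main__":
    data_df = make_toy_frame()
    print(data_df.head())
    print(data_df.describe())
    print(missing_value_report(data_df))
    print(class_distribution(data_df))
    print(hourly_stats(data_df).head())
    print(high_correlation_pairs(data_df))
```

The plots (bar, density, box, scatter, heatmap) are left out of this sketch on purpose; each helper returns a plain pandas object that can be passed to whichever plotting library the notebook uses.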
Checklist
Please confirm the following:
- [x] My code follows the guidelines of this project.
- [x] I have performed a self-review of my own code.
- [x] I have commented my code, particularly wherever it was hard to understand.
- [x] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have added things that prove my fix is effective or that my feature works.
- [ ] Any dependent changes have been merged and published in downstream modules.
@UTSAVS26 Any update on the PR acceptance?
@inkerton Please correct the .ipynb file name, as it will cause issues when cloning the repo.
@UTSAVS26 Corrected
It has still not been corrected.
File name changed.