PyVerse [Code Addition Request]: Credit Card Fraud Detection

Pull Request for PyVerse 💡

Requesting to submit a pull request to the PyVerse repository.

Issue Title

Please enter the title of the issue related to your pull request.
[Code Addition Request]: Credit Card Fraud Detection

[x] I have provided the issue title.

Info about the Related Issue

What's the goal of the project?
This project detects fraudulent credit card transactions using machine learning algorithms. The dataset, sourced from Kaggle, contains transactions labeled as fraudulent or non-fraudulent. The goal is to build a model that classifies transactions accurately despite the significant class imbalance (very few fraud cases compared to legitimate transactions).

[x] I have described the aim of the project.

Name

Please mention your name.
Credit Card Fraud Detection System

[x] I have provided my name.

GitHub ID

Please mention your GitHub ID.
inkerton

[x] I have provided my GitHub ID.

Email ID

Please mention your email ID for further communication.
[email protected]

[x] I have provided my email ID.

Identify Yourself

Mention in which program you are contributing (e.g., WoB, GSSOC, SSOC, SWOC).
GSSOC

[x] I have mentioned my participant role.

Closes

Enter the issue number that will be closed through this PR.
Closes: #497

[x] I have provided the issue number.

Describe the Add-ons or Changes You've Made

Give a clear description of what you have added or modified.
Describe your changes here.

[ ] I have described my changes.

Type of Change

Select the type of change:

[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Code style update (formatting, local variables)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] This change requires a documentation update

How Has This Been Tested?

Describe how your changes have been tested.
The code is exercised through the following data exploration, visualization, and feature analysis steps:

Data Integrity and Format: The code first downloads the dataset using the Kaggle API and loads it into a Pandas DataFrame. It then inspects the structure of the data using data_df.head() and data_df.describe(). These calls help confirm that the data loaded properly and that there are no immediate formatting issues.
Missing Values Check: It verifies whether there are missing values in the dataset by checking for null values using the isnull() function. This step helps in confirming the completeness of the dataset and identifying any need for data cleaning.
Class Distribution Check: The code checks the balance of the dataset by visualizing the distribution of fraud and non-fraud transactions using a bar plot. This helps test whether the dataset is imbalanced, a crucial factor for classification tasks.
Exploratory Data Analysis (EDA):
- Density Plot: The code generates time density plots for fraudulent and non-fraudulent transactions using Plotly. This helps understand the distribution of transactions over time, checking if fraud and non-fraud activities show different patterns.
- Hourly Transaction Analysis: The dataset is grouped by hour and transaction class, and various statistics like the total number of transactions, mean, and max amounts are calculated. This is tested using line plots, which help in visualizing temporal patterns of fraud versus non-fraud activity.
- Box Plots: Transaction amounts are visualized using box plots, both including and excluding outliers, to test for any significant differences in transaction amounts between fraud and non-fraud cases.
Correlation Check: The Pearson correlation matrix is calculated and plotted as a heatmap. This step checks for strongly correlated features, since multicollinearity might affect model performance.
Scatter Plots: Several scatter plots are created to test how certain features (like V2, V5, V20, and Amount) relate to each other and whether there's any visible separation between fraudulent and non-fraudulent transactions.
Feature Density: The code plots the density of each feature, split by class, to check that the features vary meaningfully and to see whether fraudulent transactions are distributed differently from non-fraudulent ones.
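The non-visual checks above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code. It assumes the column names `Time`, `Amount`, and `Class` and assumes `Time` is measured in seconds. The helper names and the 0.8 correlation threshold are hypothetical, and the `demo` frame is synthetic data standing in for the Kaggle dataset.

```python
import numpy as np
import pandas as pd


def missing_value_report(df: pd.DataFrame) -> pd.Series:
    """Count null values per column, largest first."""
    return df.isnull().sum().sort_values(ascending=False)


def class_distribution(df: pd.DataFrame, label: str = "Class") -> pd.DataFrame:
    """Return the count and share of each class label."""
    counts = df[label].value_counts().sort_index()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


def hourly_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Group by elapsed hour and class, then summarize transaction amounts."""
    # Assumption: Time holds seconds elapsed since the first transaction.
    hour = (df["Time"] // 3600).astype(int)
    return (
        df.assign(Hour=hour)
        .groupby(["Hour", "Class"])["Amount"]
        .agg(transactions="count", mean_amount="mean", max_amount="max")
        .reset_index()
    )


def strong_correlations(
    df: pd.DataFrame, threshold: float = 0.8, label: str = "Class"
) -> pd.Series:
    """List feature pairs whose absolute Pearson correlation meets the threshold."""
    corr = df.drop(columns=[label]).corr(method="pearson")
    # Keep only the upper triangle so each pair is reported once.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    return pairs[pairs.abs() >= threshold]


# Synthetic stand-in data (not the real dataset): two days of transactions,
# roughly 2% labeled as fraud, with V2 deliberately built to track V1.
rng = np.random.default_rng(0)
n = 1000
demo = pd.DataFrame(
    {
        "Time": rng.integers(0, 2 * 24 * 3600, size=n),
        "V1": rng.normal(size=n),
        "Amount": rng.exponential(50.0, size=n),
        "Class": (rng.random(n) < 0.02).astype(int),
    }
)
demo["V2"] = 0.95 * demo["V1"] + rng.normal(scale=0.1, size=n)

print(missing_value_report(demo))
print(class_distribution(demo))
print(hourly_stats(demo).head())
print(strong_correlations(demo))
```

Returning plain DataFrames and Series keeps each check easy to assert on, and the same results can be passed to whichever plotting library the notebook uses.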

[x] I have described my testing process.

Checklist

Please confirm the following:

[x] My code follows the guidelines of this project.
[x] I have performed a self-review of my own code.
[x] I have commented my code, particularly wherever it was hard to understand.
[x] I have made corresponding changes to the documentation.
[x] My changes generate no new warnings.
[x] I have added things that prove my fix is effective or that my feature works.
[ ] Any dependent changes have been merged and published in downstream modules.

Oct 14 '24 09:10 inkerton

👋 Thank you for opening this pull request! We're excited to review your contribution. Please give us a moment, and we'll get back to you shortly!

Feel free to join our community on Discord to discuss more!

Oct 14 '24 09:10 github-actions[bot]

@UTSAVS26 Any update on PR acceptance?

Oct 24 '24 17:10 inkerton

@inkerton Please correct the .ipynb file name, as it will cause issues when cloning the repo.

Oct 25 '24 02:10 UTSAVS26

@UTSAVS26 Corrected

Oct 25 '24 04:10 inkerton

> @UTSAVS26 Corrected

It has still not been corrected.

Oct 25 '24 21:10 UTSAVS26

File name changed

Oct 26 '24 16:10 inkerton