ML-CaPsule
ML-CaPsule is a project for beginners and experienced data science enthusiasts who don't have a mentor or guidance and wish to learn machine learning. Using our repo they can learn ML, DL, and many re...
Master Machine learning
Description
Machine learning is a data analysis technique that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.
Importance of Machine Learning
Machine learning is important because it gives enterprises a view of trends in customer behavior and business operational patterns, and it supports the development of new products. Many of today's leading companies, such as Facebook, Google and Uber, make machine learning a central part of their operations. Machine learning has become a significant competitive differentiator for many companies.
🌱 Pre-requisites
- Python IDE: install it from python.org
- If you are new to Python programming and want a fair knowledge of it before you start, you can learn it in a simplified way through this website
Topics
Extracting Data
Extraction is a general term for methods that construct combinations of variables to get around the problems of working with many variables, such as high memory and computation cost, while still describing the data with sufficient accuracy.
- Web scraping. Library used: Beautiful Soup, which extracts data from web pages.
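As a minimal sketch of what extracting data with Beautiful Soup can look like, the snippet below parses a small HTML string and pulls out a heading and the links. The HTML content and the variable names are made up for illustration; a real scraper would first download the page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a downloaded web page.
html = """
<html><body>
  <h1>Projects</h1>
  <ul>
    <li><a href="/spam">Spam Mail Detection</a></li>
    <li><a href="/weather">Weather Prediction</a></li>
  </ul>
</body></html>
"""

# Parse with Python's built-in HTML parser (no extra parser package needed).
soup = BeautifulSoup(html, "html.parser")

# Extract the heading text, then the text and target of every link.
title = soup.h1.get_text(strip=True)
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]

print(title)
print(links)
```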
Visualization
Data visualization is the discipline of trying to understand data by placing it in a visual context so that patterns, trends and correlations that might not otherwise be detected can be exposed. Python offers multiple great graphing libraries that come packed with lots of different features.
- Libraries commonly used to present data as graphs and other graphical representations: Seaborn, pandas, Matplotlib, etc.
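A minimal Matplotlib sketch of placing data in a visual context is shown below. The monthly values and the output file name are invented for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Made-up monthly values, for illustration only.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [12, 17, 9, 21]

# Draw a labelled bar chart and write it to an image file.
fig, ax = plt.subplots()
ax.bar(months, sales)
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.set_title("Monthly sales (illustrative data)")
fig.savefig("monthly_sales.png")
```

In a notebook you would typically drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving to a file.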
Feature selection (Variable Selection)
Feature selection is the process of selecting a subset of relevant features for use in a model. Irrelevant features in your data can decrease the accuracy of a model because the model ends up learning from them.
- Library commonly used for feature selection: scikit-learn
- Link - https://machinelearningmastery.com/feature-selection-with-real-and-categorical-data/
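One possible way to select features with scikit-learn is univariate selection with `SelectKBest`. The sketch below uses the Iris dataset bundled with scikit-learn and keeps the two features with the highest ANOVA F-score; the dataset and the choice of `k=2` are assumptions made for this example, not a recommendation from this repo.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Iris ships with scikit-learn: 150 samples, 4 numeric features, 3 classes.
data = load_iris()
X, y = data.data, data.target

# Keep the k features with the highest ANOVA F-score against the target.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

# Map the selected column indices back to feature names.
selected_names = [data.feature_names[i] for i in selector.get_support(indices=True)]
print(X.shape, "->", X_selected.shape)
print(selected_names)
```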
Basic concepts of statistics
A). Understand the Types of Analytics
- Descriptive Analytics tells us what happened in the past and helps a business understand how it is performing by providing context to help stakeholders interpret information.
- Diagnostic Analytics takes descriptive data a step further and helps you understand why something happened in the past.
- Predictive Analytics predicts what is most likely to happen in the future and provides companies with actionable insights based on the information.
- Prescriptive Analytics provides recommendations regarding actions that will take advantage of the predictions and guide the possible actions toward a solution.
B). Probability
- Conditional Probability
- Independent Events
- Mutually Exclusive Events
- Bayes' Theorem
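To make conditional probability and Bayes' theorem concrete, here is a small worked sketch. The scenario (a medical test) and all of the numbers are invented for illustration.

```python
# Assumed, made-up numbers for a diagnostic test.
p_disease = 0.01            # P(D): prior probability of having the disease
p_pos_given_disease = 0.95  # P(+ | D): test sensitivity
p_pos_given_healthy = 0.05  # P(+ | not D): false positive rate

# Law of total probability: P(+) = P(+|D)P(D) + P(+|not D)P(not D)
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(D | +) = P(+ | D) * P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive

print(round(p_disease_given_pos, 3))
```

Even with a fairly accurate test, the posterior probability stays low here because the disease is rare in this made-up population.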
C). Central Tendency
- Mean
- Mode
- Variance
- Skewness
- Kurtosis
- Standard Deviation
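The measures listed above can be computed with Python's standard library. This is a sketch on a made-up sample; it uses population formulas (dividing by n), and the `central_moment` helper is a name introduced only for this example. Strictly speaking, mean and mode describe central tendency, while variance and standard deviation describe spread and skewness and kurtosis describe shape.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
n = len(data)

mean = statistics.mean(data)
mode = statistics.mode(data)
variance = statistics.pvariance(data)  # population variance
std_dev = statistics.pstdev(data)      # population standard deviation

def central_moment(k):
    """k-th central moment: average of (x - mean)**k over the sample."""
    return sum((x - mean) ** k for x in data) / n

# Skewness and (non-excess) kurtosis as standardized third and fourth moments.
skewness = central_moment(3) / std_dev ** 3
kurtosis = central_moment(4) / std_dev ** 4

print(mean, mode, variance, std_dev, skewness, kurtosis)
```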
D). Variability
- Range: The difference between the highest and lowest value in the dataset.
- Percentiles: A measure that indicates the value below which a given percentage of observations in a group of observations falls.
- Quartiles: Values that divide the data points into four more or less equal parts, or quarters.
- Interquartile Range (IQR): A measure of statistical dispersion and variability based on dividing a data set into quartiles. IQR = Q3 - Q1
- Variance: The average squared difference of the values from the mean, which measures how spread out a set of data is relative to the mean.
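The variability measures above can be sketched with the standard library's `statistics` module. The sample is made up, and the `"inclusive"` quantile method is one of several conventions; other tools may give slightly different percentile values.

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 15, 17]  # made-up sample

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Quartiles split the sorted data into four parts; IQR = Q3 - Q1.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# 90th percentile: the ninth of the nine cut points that split the data into tenths.
p90 = statistics.quantiles(data, n=10, method="inclusive")[8]

# Population variance: average squared distance from the mean.
variance = statistics.pvariance(data)

print(data_range, (q1, q2, q3), iqr, p90, variance)
```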
E). Relationship Between Variables
- Causality: Relationship between two events where one event is affected by the other.
- Covariance: A quantitative measure of the joint variability between two or more variables.
- Correlation: Measures the strength of the linear relationship between two variables. It ranges from -1 to 1 and is the normalized version of covariance.
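The link between covariance and correlation can be seen in a few lines of plain Python. This sketch uses made-up paired values and the sample formulas (dividing by n - 1).

```python
from math import sqrt

# Made-up paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: average product of deviations from the means.
covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of each variable.
std_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
std_y = sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))

# Pearson correlation: covariance normalized by both standard deviations.
correlation = covariance / (std_x * std_y)

print(covariance, round(correlation, 4))
```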
F). Probability Distribution
- Probability Mass Function (PMF): A function that gives the probability that a discrete random variable is exactly equal to some value.
- Probability Density Function (PDF): A function for continuous data where the value at any given sample can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample.
- Cumulative Distribution Function (CDF): A function that gives the probability that a random variable is less than or equal to a certain value.
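As an illustration of the three functions above, the sketch below evaluates a PMF and CDF for a discrete variable (number of heads in four fair coin flips) and a PDF and CDF for a continuous one (the standard normal). The helper names `binomial_pmf` and `binomial_cdf` are introduced only for this example.

```python
from math import comb
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(X == k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """P(X <= k): the sum of the PMF over 0..k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

# Discrete case: exactly 2 heads, and at most 2 heads, in 4 fair flips.
pmf_2_heads = binomial_pmf(2, 4, 0.5)
cdf_2_heads = binomial_cdf(2, 4, 0.5)

# Continuous case: standard normal density and cumulative probability at 0.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
pdf_at_0 = standard_normal.pdf(0.0)  # a relative likelihood, not a probability
cdf_at_0 = standard_normal.cdf(0.0)  # P(X <= 0)

print(pmf_2_heads, cdf_2_heads, pdf_at_0, cdf_at_0)
```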
G). Hypothesis Testing and Statistical Significance
- Null and Alternative Hypothesis
- Interpretation
- Z-Test
- T-Test
- ANOVA (Analysis of Variance)
- Chi-Square Test
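To show how a null hypothesis, a test statistic and a p-value fit together, here is a sketch of a two-sided one-sample Z-test. All numbers are invented, and the test assumes the population standard deviation is known; when it is not, a T-test is the usual choice.

```python
from math import sqrt
from statistics import NormalDist

# Made-up setting: H0 says the population mean is 50; H1 says it is not.
mu0 = 50.0          # mean under the null hypothesis
sigma = 5.0         # population standard deviation (assumed known)
n = 25              # sample size
sample_mean = 52.0  # observed sample mean
alpha = 0.05        # significance level

# Z statistic: how many standard errors the sample mean is from mu0.
z = (sample_mean - mu0) / (sigma / sqrt(n))

# Two-sided p-value from the standard normal CDF.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Reject H0 when the p-value falls below the significance level.
reject_null = p_value < alpha

print(z, round(p_value, 4), reject_null)
```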
H). Regression
- Linear Regression
  Assumptions of Linear Regression:
  - Linear Relationship
  - Multivariate Normality
  - No or Little Multicollinearity
  - No or Little Autocorrelation
  - Homoscedasticity
- Multiple Linear Regression
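A simple linear regression with one predictor can be fitted with the closed-form least squares formulas. The sketch below uses made-up data and plain Python; the `predict` helper is a name introduced for this example. Multiple linear regression follows the same idea with several predictors and is usually fitted with a library such as scikit-learn.

```python
# Made-up data: one predictor x and one response y.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares estimates: slope = S_xy / S_xx, intercept = mean_y - slope * mean_x.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

def predict(value):
    """Predicted response for a single predictor value."""
    return intercept + slope * value

# R squared: share of the variance in y explained by the fitted line.
ss_res = sum((yi - predict(xi)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(slope, intercept, r_squared)
```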
Data Science
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains.
Why is data science important?
In business, the goal of data science is to provide intelligence about consumers and campaigns and help companies create strong plans to engage their audience and sell their products.
Data scientists must rely on creative insights using big data, the large amounts of information collected through various collection processes, like data mining.
On an even more fundamental level, big data analytics can help brands understand the customers who ultimately help determine the long-term success of a business or initiative. In addition to targeting the right audience, data science can be used to help companies control the stories of their brands.
Because big data is a rapidly growing field, there are constantly new tools available, and those tools need experts who can quickly learn their applications. Data scientists can help companies create a business plan to achieve goals based on research and not just intuition.
Data science plays a very important role in security and fraud detection, because the massive amounts of information allow for drilling down to find slight irregularities in data that can expose weaknesses in security systems. It is also a driving force behind highly specialized user experiences created through personalization and customization. The analysis can be used to make customers feel seen and understood by a company.
What are the six major areas of data science?
The six major areas of data science include the following:
- Multidisciplinary investigations. Considering large, complex systems with interconnected pieces, data scientists use varying methods to collect large amounts of data.
- Models and methods for data. Data scientists need to rely on experience and intuition to decide which methods will work best for modeling their data, and they need to adjust those methods continuously to hone in on the insights they seek.
- Pedagogy. It is up to data scientists to work with companies and clients to determine the best ideologies to apply while collecting and analyzing information about their customers and products.
- Computing with data. The biggest thing that all data science projects have in common is the necessity to use tools and software to analyze the involved algorithms and statistics, because the size of the pool of information they are working with is so massive.
- Theory. Data science theory is an evolving and sophisticated professional arena with countless applications.
- Tool evaluation. There are many tools available for data scientists to use to manipulate and study huge quantities of data, and it's important to always evaluate their effectiveness and keep trying new ones as they become available.
Summary
Useful URLs
- https://www.kdnuggets.com/2020/06/8-basic-statistics-concepts.html
- https://www.coursera.org/learn/machine-learning-with-python
- https://www.w3schools.com/python/python_ml_getting_started.asp
- https://www.freecodecamp.org/learn/machine-learning-with-python/
- https://www.greatlearning.in/great-lakes-pgpdsba?&utm_source=Google&utm_medium=Search&utm_campaign=6Cities_Exact_Data_Science_Search_New_DS&adgroup_id=101317851589&campaign_id=10174480218&Keyword=data%20scientist&placement=&utm_content=c&gclid=CjwKCAjwn6GGBhADEiwAruUcKqPCvPIk1X_5mVRXj5prdpSIULnd40QgTB4kChfiFgAL1kDErGeLHRoCapUQAvD_BwE
Get Started
- This repo is a collection of machine learning with Python and data science material, with algorithms, projects and explanations from basic to advanced level.
- It covers topics such as machine learning, deep learning, SQL, natural language processing, object detection, classification, recommendation systems, chatbots and much more.
Take a look at existing projects
Content List |
---|
Advanced Visualizations |
Alzheimer's Disease Predictor |
Analysis And predict_Black_friday_sale |
Audio Classification |
Automatic Summarization of Scientific Papers |
Basics of ML and DL |
Basics of Power Bi |
Basics of the Python |
Bidirectional LSTM |
Bird Species Classification Web App |
Bitcoin Price Prediction Web App |
Bitcoin Price Predictor |
CBT_ChatBot |
COVID_19-DATA-ANALYSIS |
Cheat Sheets |
Class Imbalance problem |
Classification Algorithms |
Cloud Details |
Covid19 forecasting with prophet |
Covid_Third_Wave_Forecasting |
CrowdAI Plant Disease |
Customer Segmentation using Machine Learning |
Data Cleaning Techniques |
Data Filling and Cleaning Techniques |
Different types of Clustering |
Different types of feature selection techniques |
Different_types_of_scaling_Method |
Driver_Drowsiness_Detection |
EDA-and-Perform-Modelling-on-Ionosphere-Dataset-main |
Email Classifier |
Emotion Recognition Based on NLP |
Ensemble Methds in ML |
Explaination and Example for P value with code |
Exploratory-data-analysis |
Extract_Text_from_PDF_using_Python |
Fake_News_Detection |
File of SQL Commands |
Fish-Weight-Estimation |
Flight_delay_prediction_project |
GDP Prediction |
GUI-JARVIS |
Gender Pay Gap Analysis |
Google Teachable Machine |
Handwritten Equation Solver using CNN |
Handwritten character recognition |
Heart_Predection |
HollywoodMarketSynopsis |
IMDB Box Office Prediction |
LanguageDetection |
Medical Charges for Smokers and Non-smoker |
Medical_Help_Chatbot |
Meteorite Landing Data Analysis |
Movie-Recommendation-System |
Movie-Recommender-System using python |
Nasa-Asteroids-Dataset-Analysis |
NumPy - Basics |
Number_of_people_counter |
OCR-Medicine-Reader |
Object Detection |
Ola Bike Ride Request Demand Forecast |
Optical character recognition (OCR) |
Plant Seedlings Classification |
R language |
Random forest from scratch |
Random forest test |
Rock Paper Scissors Python Game |
Sentiment analysis for depression based on social media posts |
Sentiment-Analysis |
Skin Disease Predictor |
Spam Mail Detection |
Speech_Emotion_Recognition |
Spelling Corrector |
Sports Analytics Project |
Startup_Profit_Prediction |
Stock Price Analysis |
Sudoku Solver using CNN |
Tensorflow.js Demo |
Time Series Forecasting with Python |
Time-Series LSTM Model |
Unique Chatbot |
Various Plots using Matplot,Seaborn,Pandas |
Vehicles and Pedestrian Detection |
Weather Prediction |
Web-Scraping-with-Beautiful-Soup-master |
XgBoost_Algorithm |
ensemble-methods-notebooks-master |
heart failure |
job_Advertisement_detection |
logistic_regression_scratch |
recommendation_system |
.DS_Store |
Analysis_of_Temperature_Rise_in_PMSM.ipynb |
Beautiful Soup.ipynb |
Ensemble learning.docx |
Ensemble-Learning (Stacking) |
Machine Hack -1.ipynb |
README.md updated file |
Role_from_Resume.ipynb |
Sql |
Statistics- Basics.ipynb |
Test Task_NIket.ipynb |
UBER_DATA_ANALYSIS.ipynb |
Various_Plots_in_Matplotlib.ipynb |
Visualization with Seaborn & Matplotlib.ipynb |
buyer_s_time234.ipynb |
random_forest.py |
Note:
- The project list above is updated automatically: whenever a new project is added to the repo, it is added to the table above.
Code Of Conduct:
You can find our Code of Conduct here.
License
This project follows the MIT License.
Have a look
- Give it a 🌟 if you ❤ this project.
- Take a look at the Existing Issues.
- Create your own issue if you have a new idea that is not listed in the project.
- Wait for the issue to be assigned to you.
- Fork the repository
- Clone the repository using:
git clone https://github.com/Niketkumardheeryan/Hands-on-ML-Basic-to-Advance-
✍️ Contribution Guidelines
- Have a look at the Contributing Guidelines
Some awesome Contributors ✨
- Niket kumar Dheeryan (Author) 💻
- Abhishek Sharma 💻
- Sakalya100 💻
- Kaustav Roy 💻
- Soumayan Pal 💻
- Komal Gupta 💻
- Manu Varghese 💻
- Abhishek Panigrahi 💻
- Padmini Rai 💻
- psyduck1203 💻
- Rutik Bhoyar 💻
- Ayushi Shrivastava 💻
- Anshul Srivastava 💻
- RISHAV KUMAR 💻
- Megha0606 💻
- Jagannath8 💻
- Harshita Nayak 💻
- ayushgoyal9991 💻
- SurajPawarstar 💻
- Sumit11081996 💻
- Tanvi Bugdani 💻
- Suyash Singh 💻
- Abhinav Dubey 💻
- Nisha Yadav 💻
- Neeraj Ap 💻
- Nishi 💻
- shivani rana 💻