FIFA-World-Cup-Prediction
Predict who will win the FIFA World Cup 2018
Project Description
Objective:
- Predict the outcome of an international match. The prediction target is either "Win / Lose / Draw" or the goal difference.
- Apply the model to predict the results of the FIFA World Cup 2018.
Data: The data are assembled from multiple sources. Most come from Kaggle; the rest come from the FIFA website and EA games.
Feature Engineering: To determine which team is more likely to win a match, I came up with 4 main groups of features, based on my own knowledge:
- head-to-head match history between 2 teams
- recent performance of each team (10 recent matches), aka "form"
- bet-ratio before matches
- squad strength (from FIFA video game)
Feature list reflects those factors.
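As a rough illustration of the "form" group, the sketch below derives form differences from a match history. The `Match` record layout, the `form` and `form_diff` helpers, the placeholder team codes, and the sample scores are all assumptions made for this example, not the project's actual code. Only the `form_diff_*` names follow the feature list.

```python
from collections import namedtuple

# Hypothetical record layout; the project's real columns may differ.
Match = namedtuple("Match", "date team_1 team_2 goals_1 goals_2")


def form(matches, team, before, n=10):
    """Summarise a team's last n matches played strictly before `before`."""
    played = [m for m in matches
              if m.date < before and team in (m.team_1, m.team_2)]
    recent = sorted(played, key=lambda m: m.date)[-n:]
    stats = {"goalF": 0, "goalA": 0, "win": 0, "draw": 0}
    for m in recent:
        # Orient the score from the point of view of `team`.
        if m.team_1 == team:
            goals_for, goals_against = m.goals_1, m.goals_2
        else:
            goals_for, goals_against = m.goals_2, m.goals_1
        stats["goalF"] += goals_for
        stats["goalA"] += goals_against
        if goals_for > goals_against:
            stats["win"] += 1
        elif goals_for == goals_against:
            stats["draw"] += 1
    return stats


def form_diff(matches, team_1, team_2, before, n=10):
    """Difference (team_1 - team_2) of each form statistic."""
    f1 = form(matches, team_1, before, n)
    f2 = form(matches, team_2, before, n)
    return {f"form_diff_{key}": f1[key] - f2[key] for key in f1}


# Invented results for placeholder teams AA, BB and CC.
# ISO dates (yyyy-mm-dd) compare correctly as plain strings.
history = [
    Match("2018-01-01", "AA", "BB", 2, 1),
    Match("2018-02-01", "BB", "CC", 0, 0),
    Match("2018-03-01", "AA", "CC", 1, 3),
]
features = form_diff(history, "AA", "BB", before="2018-06-01")
print(features)
```

Filtering on `before` keeps the features free of information from the match being predicted or from later matches.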
Lifecycle
Report
Check the Full Report to gain more insight about this Project. The report contains:
- Exploratory Data Analysis: investigates correlations, the importance of each feature to the results, and some interesting hypotheses
- Methodology: how I carried out this project and which experiments I ran.
- Models: baseline model, logistic regression, random forest, gradient boosting tree, AdaBoost tree, neural network.
- Evaluation Criteria: F1, 10-fold cross validation accuracy
- Results and Conclusion
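The two evaluation criteria above can be sketched in plain Python. This is a minimal illustration, not the report's code: the helper names `k_fold_indices` and `macro_f1` are hypothetical, macro averaging over the three classes is an assumption (the report may average F1 differently), and the labels below are invented.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


def macro_f1(y_true, y_pred, labels=("win", "draw", "lose")):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented labels, for illustration only.
y_true = ["win", "win", "draw", "lose", "lose", "draw"]
y_pred = ["win", "draw", "draw", "lose", "win", "draw"]
score = macro_f1(y_true, y_pred)

folds = list(k_fold_indices(25, k=10))
print(round(score, 4), [len(test) for _, test in folds])
```

In practice the samples would be shuffled (or stratified by class) before splitting, and accuracy or F1 would be averaged over the 10 test folds.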
Project Structure
- EDA: Exploratory Data Analysis
- LE: saved model for Label Encoder
- data: completed dataset
- save_model: saved Machine Learning model after training
Data
Data Source
The dataset covers all international matches from 2000 to 2018, including results, betting odds, rankings, and squad strengths. The numbers below are the ones used in the "Source" column of the feature list.
1. FIFA World Cup 2018
2. International match 1872 - 2018
3. FIFA Ranking through Time
4. Bet Odd
5. Bet Odd 2
6. Squad Strength - Sofia
7. Squad Strength - FIFA index
Feature List
- *difference: team1 - team2
- *form: performance in 10 recent matches
Feature Name | Description | Source |
---|---|---|
team_1 | Nation Code (e.g. US, NZ) | 1 & 2 |
team_2 | Nation Code (e.g. US, NZ) | 1 & 2 |
date | Date of match (yyyy-mm-dd) | 1 & 2 |
tournament | Friendly, EURO, AFC, FIFA WC | 1 & 2 |
h_win_diff | Head2Head: difference in number of wins | 2 |
h_draw | Head2Head: number of draws | 2 |
form_diff_goalF | Form: difference in "Goal For" | 2 |
form_diff_goalA | Form: difference in "Goal Against" | 2 |
form_diff_win | Form: difference in number of wins | 2 |
form_diff_draw | Form: difference in number of draws | 2 |
odd_diff_win | Betting Odd: difference in bet rate for a win | 4 & 5 |
odd_draw | Betting Odd: bet rate for a draw | 4 & 5 |
game_diff_rank | Squad Strength: difference in FIFA Rank | 3 |
game_diff_ovr | Squad Strength: difference in Overall Strength | 6 |
game_diff_attk | Squad Strength: difference in Attack Strength | 6 |
game_diff_mid | Squad Strength: difference in Midfield Strength | 6 |
game_diff_def | Squad Strength: difference in Defense Strength | 6 |
game_diff_prestige | Squad Strength: difference in prestige | 6 |
game_diff_age11 | Squad Strength: difference in age of 11 starting players | 6 |
game_diff_ageAll | Squad Strength: difference in age of all players | 6 |
game_diff_bup_speed | Squad Strength: difference in Build Up Play Speed | 6 |
game_diff_bup_pass | Squad Strength: difference in Build Up Play Passing | 6 |
game_diff_cc_pass | Squad Strength: difference in Chance Creation Passing | 6 |
game_diff_cc_cross | Squad Strength: difference in Chance Creation Crossing | 6 |
game_diff_cc_shoot | Squad Strength: difference in Chance Creation Shooting | 6 |
game_diff_def_press | Squad Strength: difference in Defense Pressure | 6 |
game_diff_def_aggr | Squad Strength: difference in Defense Aggression | 6 |
game_diff_def_teamwidth | Squad Strength: difference in Defense Team Width | 6 |
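The two head-to-head rows of the table can be read as simple counts over past meetings. The sketch below is one possible way to compute them. The tuple layout, the `head_to_head` helper, the placeholder team codes, and the scores are assumptions for illustration; only the output names `h_win_diff` and `h_draw` come from the table.

```python
def head_to_head(matches, team_1, team_2, before):
    """Count past meetings between two teams, seen from team_1's side.

    `matches` holds (date, home, away, home_goals, away_goals) tuples,
    a hypothetical layout chosen for this example.
    """
    wins_1 = wins_2 = draws = 0
    for date, home, away, home_goals, away_goals in matches:
        # Skip future matches and matches between other pairs of teams.
        if date >= before or {home, away} != {team_1, team_2}:
            continue
        if home_goals == away_goals:
            draws += 1
        elif (home_goals > away_goals) == (home == team_1):
            wins_1 += 1  # team_1 won, whether listed first or second
        else:
            wins_2 += 1
    return {"h_win_diff": wins_1 - wins_2, "h_draw": draws}


# Invented results for placeholder teams AA, BB and CC.
meetings = [
    ("2010-01-01", "AA", "BB", 2, 0),  # AA wins
    ("2012-01-01", "BB", "AA", 1, 1),  # draw
    ("2014-01-01", "BB", "AA", 0, 3),  # AA wins
    ("2016-01-01", "BB", "AA", 2, 1),  # BB wins
    ("2016-06-01", "AA", "CC", 1, 0),  # different pairing, ignored
]
print(head_to_head(meetings, "AA", "BB", before="2018-01-01"))
```

Swapping the two teams flips the sign of `h_win_diff` and leaves `h_draw` unchanged, which matches the "team1 - team2" convention for difference features.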
How to Run:
python experiment1-W-D-L.py
python experiment2-GoalDiff.py
python experiment3-WorldCup.py
Task List
Complete
- [x] Add prediction for Matchday 2
- [x] Add feature Importance
- [x] Add feature of squad and player info
- [x] Build a web crawler for the squad of each team
- [x] Build a web crawler for FIFA game player
- [x] Add a simple classification based on "bet odd".
- [x] Add feature group 1
- [x] Add h_win_diff, h_draw
- [x] Add rank_diff, title_diff
- [x] Add features group 2
- [x] Add features group 3
- [x] Simple EDA and a small story
- [x] Add features group 4
- [x] Prepare framework for running classifiers
- [x] Add evaluation metrics and plot
- [x] Add accuracy, precision, recall, F1
- [x] Add ROC curves
- [x] Build a data without player rating and squad value
- [x] Generate data and perform prediction for EURO 2016; now my story is more interesting
- [x] Create more data, "teamA vs teamB -> win" is equivalent to "teamB vs teamA -> lose"
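The last item, mirroring each match to create more data, could look roughly like the sketch below. It assumes each sample is a dict, that every column whose name contains `_diff` is a "team1 - team2" difference, and that the label lives in a column called `result`. The `mirror` helper and the `result` column are hypothetical names, and the sample values are invented.

```python
# Label seen from the other team's side.
FLIPPED_RESULT = {"win": "lose", "lose": "win", "draw": "draw"}


def mirror(row):
    """Return the same match seen from team_2's point of view."""
    flipped = dict(row)
    flipped["team_1"], flipped["team_2"] = row["team_2"], row["team_1"]
    for name, value in row.items():
        # Difference features are team_1 - team_2, so they change sign.
        # Symmetric features such as h_draw and odd_draw stay as they are.
        if "_diff" in name:
            flipped[name] = -value
    flipped["result"] = FLIPPED_RESULT[row["result"]]
    return flipped


# Invented sample, for illustration only.
sample = {
    "team_1": "AA",
    "team_2": "BB",
    "h_win_diff": 2,
    "h_draw": 1,
    "form_diff_goalF": -3,
    "odd_draw": 3.4,
    "result": "win",
}
augmented = [sample, mirror(sample)]
print(augmented[1])
```

For the goal-difference experiment the target would be negated in the same way as the difference features. Mirrored rows should stay in the same cross-validation fold as their originals, otherwise the test folds leak information from the training folds.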