SIGSPATIAL-2021-GISCUP-2nd-Place-Solution
Competition Page: DiDi-ETA
Final Official Ranking
| Ranking | Award (Cash Prize) | Name | MAPE |
|---|---|---|---|
| 1 | Champion ($10,000) | 单模CBT | 0.11974 |
| 2 | Runner-ups ($5,000 each team) | Pims | 0.12099 |
| 3 | Runner-ups ($5,000 each team) | 华南工农联盟 | 0.12116 |
| 4 | Second Runner-ups ($2,500 each team) | 机器算命 | 0.12177 |
| 5 | Second Runner-ups ($2,500 each team) | pumbaa | 0.12198 |
| 6 | Recognition Award ($1,000 each team) | MobiLab | 0.12478 |
| 7 | Recognition Award ($1,000 each team) | 悦智AI实验室 | 0.12511 |
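The MAPE column is the mean absolute percentage error, where lower is better. As a minimal sketch, assuming the standard definition of the metric (the function name `mape` is illustrative and not taken from this repository):

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error: mean(|y_true - y_pred| / y_true)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)


# Two toy trips with true travel times of 100 s and 200 s.
print(mape([100.0, 200.0], [110.0, 180.0]))
```

Both toy predictions are off by 10%, so the score is about 0.1.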
Team Name: Pims
Team Members: Yunchong Gan, Mingjie Wang, Haoyu Zhang
Quick Start
Prepare Data
Download the dataset from here and change data_dir in dataset.py.
python dataset.py
This preprocesses the original .txt files and converts them into .json and .pickle files to speed up data loading.
It then splits the full training set into 5-fold and 10-fold partitions.
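A random K-fold split of this kind can be sketched as follows. This is only an illustration on sample indices, not the code in dataset.py; the helper names `random_kfold` and `train_val_splits` and the fixed seed are assumptions.

```python
import random


def random_kfold(n_samples, n_folds, seed=0):
    """Shuffle sample indices and deal them into n_folds validation folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def train_val_splits(n_samples, n_folds, seed=0):
    """Yield (train_indices, val_indices) for each fold."""
    folds = random_kfold(n_samples, n_folds, seed)
    for k, val in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, val


# 5-fold and 10-fold splits over the same toy dataset of 100 orders.
splits_5 = list(train_val_splits(100, 5))
splits_10 = list(train_val_splits(100, 10))
print(len(splits_5), len(splits_10))
```

Each fold serves as the validation set once, so the two splits yield 5 + 10 = 15 train/validation pairs.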
Train & Test
Train
python train.py
Test
python test.py
Data Ensemble
The final submission is a simple average of the individual model predictions.
The final leaderboard result averages the 5-fold and 10-fold models (15 models in total).
python merge_submission.py
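The averaging step can be sketched like this. It is a minimal illustration, not the contents of merge_submission.py: the dict-based format (order ID mapped to predicted travel time) and the name `average_predictions` are assumptions.

```python
def average_predictions(model_predictions):
    """Average per-order predictions across models.

    model_predictions: list of dicts mapping order id -> predicted travel time.
    """
    if not model_predictions:
        raise ValueError("need at least one model's predictions")
    order_ids = model_predictions[0].keys()
    for preds in model_predictions[1:]:
        if preds.keys() != order_ids:
            raise ValueError("all models must predict the same orders")
    n_models = len(model_predictions)
    return {
        oid: sum(preds[oid] for preds in model_predictions) / n_models
        for oid in order_ids
    }


# Toy example with 3 models; the real ensemble averages 15 (5 + 10 folds).
merged = average_predictions([
    {"order_a": 600.0, "order_b": 900.0},
    {"order_a": 630.0, "order_b": 870.0},
    {"order_a": 570.0, "order_b": 930.0},
])
print(merged)
```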
Details
Model Architecture
The model is based on WDR (Wide-Deep-Recurrent), from the DiDi ETA paper at KDD 2018.

    Wide \
          \
    Deep --- concat - MLP - Prediction
          /
    RNN -/
     |
     |----Predict Current Link Status

Input
Wide
| Name | Type | Number of Embedding | Embedding Dim | Description |
|---|---|---|---|---|
| Simple ETA | Numeric | 1 | | |
| Distance | Numeric | 1 | | |
| Link Number | Numeric | 1 | | |
| Cross Number | Numeric | 1 | | |
| Approximate Speed | Numeric | 1 | | |
| Weekday | Categorical | 7 | 1 | |
| Slice ID | Categorical | 48 | 1 | |
| Distance(Categorical) | Categorical | 5 | 1 | |
Deep
| Name | Type | Number of Embedding | Embedding Dim | Description |
|---|---|---|---|---|
| Simple ETA | Numeric | 1 | | |
| Distance | Numeric | 1 | | |
| Link Number | Numeric | 1 | | |
| Cross Number | Numeric | 1 | | |
| Approximate Speed | Numeric | 1 | | |
| Weekday | Categorical | 7 | 20 | |
| Slice ID | Categorical | 48 | 20 | |
| Driver ID | Categorical | depends on dataset | 64 | |
| Distance(Categorical) | Categorical | 5 | 20 | Split at 3/7/12/20 km |
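The four cut points at 3/7/12/20 km give the five categories of Distance(Categorical). A minimal sketch of the bucketing, assuming a distance exactly on a boundary falls into the upper bucket (the source does not specify this) and using the illustrative name `distance_bucket`:

```python
from bisect import bisect_right

# Boundaries in km taken from the table; four cut points give five buckets.
DISTANCE_BOUNDARIES_KM = [3, 7, 12, 20]


def distance_bucket(distance_km):
    """Return a bucket index in 0..4 for a trip distance in kilometres."""
    return bisect_right(DISTANCE_BOUNDARIES_KM, distance_km)


for d in (1.5, 5.0, 10.0, 15.0, 42.0):
    print(d, distance_bucket(d))
```

The resulting index can then be fed to an embedding table with 5 rows.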
RNN - Link
| Name | Type | Number of Embedding | Embedding Dim | Description |
|---|---|---|---|---|
| Link Time | Numeric | 1 | | |
| Link Ratio | Numeric | 1 | | |
| Link Status(Onehot) | Numeric | 5 | | |
| Weekday | Categorical | 7 | 20 | |
| Slice ID | Categorical | 288 | 20 | Computed from the slice ID and link/cross time |
| Link ID | Categorical | depends on dataset | 20 | |
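One plausible reading of the per-link Slice ID is that the departure slice is advanced by the travel time accumulated before each link or cross. The sketch below rests on several assumptions not confirmed by the source: slices are 5 minutes long (288 per day), the trip departs at the start of its slice, and the name `link_slice_ids` is illustrative.

```python
SLICES_PER_DAY = 288          # 24 h * 60 min / 5 min
SECONDS_PER_SLICE = 300       # assumed 5-minute slices


def link_slice_ids(start_slice_id, segment_times):
    """Return the slice ID in effect when each link/cross segment is entered.

    start_slice_id: slice of the order departure (0..287, assumed).
    segment_times: travel time of each consecutive segment in seconds.
    """
    slice_ids = []
    elapsed = 0.0
    for seconds in segment_times:
        offset = int(elapsed // SECONDS_PER_SLICE)
        slice_ids.append((start_slice_id + offset) % SLICES_PER_DAY)
        elapsed += seconds
    return slice_ids


# A trip starting near midnight wraps around to slice 0.
print(link_slice_ids(286, [200.0, 200.0, 300.0, 100.0]))
```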
RNN - Cross
| Name | Type | Number of Embedding | Embedding Dim | Description |
|---|---|---|---|---|
| Cross Time | Numeric | 1 | | |
| Start Link ID | Categorical | depends on dataset | 20 | |
| End Link ID | Categorical | depends on dataset | 20 | |
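As a worked example, the input width of each branch can be tallied from the tables above. This sketch assumes that features within a branch are simply concatenated, that a numeric feature contributes the width listed in the third column, and that a categorical feature contributes its embedding dimension; the actual model code may combine them differently.

```python
# (feature name, width) per branch: numeric width or embedding dim from the tables.
NUMERIC_ORDER_FEATURES = [
    ("Simple ETA", 1), ("Distance", 1), ("Link Number", 1),
    ("Cross Number", 1), ("Approximate Speed", 1),
]

BRANCH_FEATURES = {
    "wide": NUMERIC_ORDER_FEATURES + [
        ("Weekday", 1), ("Slice ID", 1), ("Distance(Categorical)", 1),
    ],
    "deep": NUMERIC_ORDER_FEATURES + [
        ("Weekday", 20), ("Slice ID", 20), ("Driver ID", 64),
        ("Distance(Categorical)", 20),
    ],
    "rnn_link": [
        ("Link Time", 1), ("Link Ratio", 1), ("Link Status(Onehot)", 5),
        ("Weekday", 20), ("Slice ID", 20), ("Link ID", 20),
    ],
    "rnn_cross": [
        ("Cross Time", 1), ("Start Link ID", 20), ("End Link ID", 20),
    ],
}


def input_width(features):
    """Total width of a branch input after concatenating all features."""
    return sum(width for _, width in features)


widths = {branch: input_width(feats) for branch, feats in BRANCH_FEATURES.items()}
print(widths)
```

Under these assumptions the widths are 8 (wide), 129 (deep), 67 per link step, and 41 per cross step.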
Differences from WDR
- Auxiliary loss for link status classification
- Concatenation of the outputs from the different branches
- Random K-fold split
- Model ensemble
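The auxiliary loss can be illustrated as a weighted sum of the travel-time loss and a classification loss on each link's current status. This is a sketch under assumptions: the source does not state the main loss or the weighting, so the MAPE main loss, the `aux_weight` value, and all function names here are hypothetical.

```python
import math


def mape_loss(y_true, y_pred):
    """Assumed main loss: mean absolute percentage error on travel time."""
    return sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one link's status logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def total_loss(eta_true, eta_pred, status_logits, status_labels, aux_weight=0.1):
    """Main ETA loss plus a weighted auxiliary link-status loss."""
    aux = sum(cross_entropy(l, y) for l, y in zip(status_logits, status_labels))
    aux /= len(status_labels)
    return mape_loss(eta_true, eta_pred) + aux_weight * aux


# One order with two links and five status classes (uninformative logits).
loss = total_loss(
    eta_true=[100.0], eta_pred=[110.0],
    status_logits=[[0.0] * 5, [0.0] * 5], status_labels=[1, 3],
)
print(loss)
```

With uniform logits the auxiliary term equals ln 5, so the total here is 0.1 + 0.1 · ln 5.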