socceraction

Handle lagging SPADL features for first actions in games/periods

Status: Open • RobWHickman opened this issue 3 years ago • 2 comments

A question, not an issue per se.

When lagging gamestates to compute features on SPADL (https://github.com/ML-KULeuven/socceraction/blob/772fa766aeba6c77a624740229e505593fd84f98/socceraction/vaep/features.py#L36), the default fill value is 0. Given that 0 is a valid type_id (at least for StatsBomb, where it is a pass), is this (ever so slightly) affecting results by implying that, for example, when a team kicks off, the preceding actions were passes?
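A minimal pandas sketch of the behaviour described above, assuming a toy SPADL-like frame with only `game_id`, `period_id`, and `type_id` columns. `naive_lags` and the `type_id_a1`/`type_id_a2` column names are hypothetical, and the whole-frame `shift(i, fill_value=0)` only illustrates a constant-0 fill; it is not a copy of the linked socceraction code.

```python
import pandas as pd

# Toy SPADL-like actions for one game. Assumption: type_id 0 = "pass"
# (as stated in the question) and 11 stands in for some non-pass action.
actions = pd.DataFrame({
    "game_id":   [1, 1, 1, 1],
    "period_id": [1, 1, 1, 2],
    "type_id":   [0, 0, 11, 0],
})

def naive_lags(df: pd.DataFrame, nb_prev_actions: int = 3) -> pd.DataFrame:
    """Hypothetical helper: lag type_id over the whole frame, padding with 0."""
    out = df.copy()
    for i in range(1, nb_prev_actions):
        # Rows without i earlier rows get 0, which is indistinguishable from a pass.
        out[f"type_id_a{i}"] = df["type_id"].shift(i, fill_value=0)
    return out

print(naive_lags(actions))
```

Row 0 (the kick-off) ends up with two "pass" predecessors that never happened, and because the shift ignores `period_id`, the first action of period 2 inherits the last action of period 1 as its predecessor.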

I imagine this is of little to no consequence in practice, as so few actions happen from a kick-off, but might it be worth assigning a sentinel such as 999, or NA, to lagged actions that have no preceding action?
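One way the suggestion could be implemented, sketched with pandas under the same toy-frame assumption: lag `type_id` within each `(game_id, period_id)` group so history never crosses a period or game boundary, and fill the gaps with a sentinel. `grouped_lags` and `NO_PREV_ACTION` are hypothetical names, and 999 is simply the value proposed above; passing `fill_value=np.nan` marks the gaps as missing instead.

```python
import numpy as np
import pandas as pd

# Hypothetical sentinel meaning "no preceding action in this period".
NO_PREV_ACTION = 999

def grouped_lags(df: pd.DataFrame, nb_prev_actions: int = 3,
                 fill_value=NO_PREV_ACTION) -> pd.DataFrame:
    """Lag type_id within each (game_id, period_id) group.

    Rows with fewer than i earlier actions in the same period get
    `fill_value` for lag i (a sentinel such as 999, or np.nan).
    """
    out = df.copy()
    by_period = df.groupby(["game_id", "period_id"], sort=False)["type_id"]
    for i in range(1, nb_prev_actions):
        out[f"type_id_a{i}"] = by_period.shift(i, fill_value=fill_value)
    return out

actions = pd.DataFrame({
    "game_id":   [1, 1, 1, 1],
    "period_id": [1, 1, 1, 2],
    "type_id":   [0, 0, 11, 0],
})
print(grouped_lags(actions))                    # gaps marked with 999
print(grouped_lags(actions, fill_value=np.nan)) # gaps marked as missing
```

If `type_id` is later one-hot encoded, the sentinel turns into its own "no previous action" column rather than masquerading as a pass.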

RobWHickman • Aug 25 '20

I wonder whether XGBoost is able to learn automatically that the preceding actions are irrelevant for a pass when the previous action was a goal or when the period changed since the previous action. Similarly, can XGBoost learn that the two preceding actions are irrelevant on free kicks, corners, and goal kicks? That would be an interesting experiment.

If XGBoost is not able to learn that, I think it would be best to include a separate action type for restarts (kick-offs and drop-balls). If you assign missing values instead, XGBoost will handle them with its built-in missing-value logic (at each split it learns a default direction for missing values), and that might lead to strange results as well.
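A rough pandas sketch of the separate-restart-type idea, again assuming the toy `game_id`/`period_id`/`type_id` columns. It only catches the kick-off that opens each period (the first action of each `(game_id, period_id)` group); kick-offs after goals and drop-balls would need provider-specific information and are not detected here. `RESTART_TYPE_ID = 100`, `mark_period_starts`, and `n_prev_in_period` are hypothetical and not part of SPADL.

```python
import pandas as pd

# Hypothetical type_id for restarts; not part of the SPADL action-type vocabulary.
RESTART_TYPE_ID = 100

def mark_period_starts(df: pd.DataFrame, nb_prev_actions: int = 3) -> pd.DataFrame:
    """Relabel the first action of each period as a restart and record how
    many genuine preceding actions exist in the same period (capped)."""
    out = df.copy()
    # 0 for the first action in each (game_id, period_id), 1 for the second, ...
    pos = out.groupby(["game_id", "period_id"], sort=False).cumcount()
    out.loc[pos == 0, "type_id"] = RESTART_TYPE_ID
    # Explicit feature a tree model can split on instead of inferring missing history.
    out["n_prev_in_period"] = pos.clip(upper=nb_prev_actions - 1)
    return out

actions = pd.DataFrame({
    "game_id":   [1, 1, 1, 1],
    "period_id": [1, 1, 1, 2],
    "type_id":   [0, 0, 11, 0],
})
print(mark_period_starts(actions))
```

Whether XGBoost actually benefits from this over the plain lagged features is exactly the experiment suggested above; the sketch only shows how the extra signal could be encoded.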

probberechts • Aug 27 '20

Yeah, I'd be interested to read about it if anyone wanted to look into it. On the whole I think it doesn't really matter, because those actions make up such a small percentage (say, 50 free kicks, corners, and kick-offs per game is still only ~2.5% of all actions captured by SPADL), so I don't mind if you want to close the issue.
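A back-of-the-envelope check in the same spirit, as a hedged sketch. If the 50 restart actions are per game, the ~2.5% figure implies roughly 50 / 0.025 = 2000 actions per game. The padded-lag problem from the original question is smaller still: with `nb_prev_actions=3`, only the first two actions of each period lack full history, i.e. about 4 actions in a two-period game, or 0.2% of 2000. The helper below (hypothetical name `share_with_truncated_history`) measures that fraction on any frame with `game_id` and `period_id` columns.

```python
import pandas as pd

def share_with_truncated_history(df: pd.DataFrame, nb_prev_actions: int = 3) -> float:
    """Fraction of actions with fewer than nb_prev_actions - 1 earlier
    actions in the same (game_id, period_id), i.e. rows with a padded lag."""
    pos = df.groupby(["game_id", "period_id"], sort=False).cumcount()
    return float((pos < nb_prev_actions - 1).mean())

# Toy game: two periods of five actions each.
actions = pd.DataFrame({"game_id": [1] * 10, "period_id": [1] * 5 + [2] * 5})
print(share_with_truncated_history(actions))  # 0.4 (4 of 10 actions)
```

On a realistically sized game the same 4 affected actions are spread over far more rows, which is why the share shrinks towards the fraction estimated above.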

RobWHickman • Sep 02 '20