zipline
Add option to filter data through Winsorisation
Filter out extreme values which are assumed to be spurious because of their extremity.
As requested by Jessica Stauth on the Quantopian forums: https://www.quantopian.com/posts/feature-requests-what-changes-would-you-like-to-see
Quoted from that post:
- add an option to 'winsorise' returns for outlier handling - a notorious issue with backtests is hidden outliers in returns data - sometimes they are obvious, you trade a stock and it makes 10,000% in 1 day (oops pricing error, currency issue etc) - but sometimes these errors can be hidden. Winsorizing your returns data allows you to set sanity bounds on what returns you think a stock can achieve, so you might say, clip my returns data at -99% and + 2 standard deviations from the mean returns for that time period. Better explained here: http://en.wikipedia.org/wiki/Winsorising
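For illustration, the bounds described in the quote (a floor of -99% and a cap of two standard deviations above the mean) could be sketched with plain NumPy. The function name `clip_returns` and its parameters are hypothetical, not part of zipline, and the mean and standard deviation here are taken over whatever window is passed in:

```python
import numpy as np


def clip_returns(returns, floor=-0.99, n_std=2.0):
    """Hypothetical helper: clip returns to [floor, mean + n_std * std]."""
    returns = np.asarray(returns, dtype=float)
    # The upper bound is computed from the same window that is being clipped,
    # so a large outlier also inflates the bound itself.
    upper = returns.mean() + n_std * returns.std()
    return np.clip(returns, floor, upper)


# Nine ordinary days and one 10,000% day (a likely pricing error).
sample = [0.01] * 9 + [100.0]
clipped_sample = clip_returns(sample)
```

Because the outlier feeds into the mean and standard deviation, the cap stays fairly loose in this sketch; a percentile-based limit avoids that dependence.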
Hey, we can actually winsorize the returns on our end with SciPy's mstats:
scipy.stats.mstats.winsorize(a, limits=None, inclusive=(True, True), inplace=False, axis=None)
Or, to trim the extreme values instead of clamping them:
scipy.stats.mstats.trim(a, limits=None, inclusive=(True, True), relative=False, axis=None)
http://docs.scipy.org/doc/scipy/reference/stats.mstats.html
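A minimal sketch of the `winsorize` call on a returns array, assuming made-up sample data. Here `limits=(0.1, 0.1)` means the lowest 10% and highest 10% of observations are replaced by the nearest remaining value:

```python
import numpy as np
from scipy.stats.mstats import winsorize

# Made-up daily returns; the last value mimics a pricing error.
returns = np.array(
    [0.01, -0.02, 0.015, 0.0, -0.01, 0.02, 0.005, -0.005, 0.012, 100.0]
)

# winsorize returns a masked array; np.asarray gives back a plain ndarray.
# With inplace=False (the default) the input array is left untouched.
clipped = np.asarray(winsorize(returns, limits=(0.1, 0.1)))
```

With ten observations, one value is replaced on each side: the 100.0 becomes the next-highest return and the -0.02 becomes the next-lowest.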
@nabm, that looks like it will do the job nicely. Thanks, always impressed by how much scipy et al. provide.
It shouldn't be too much effort, then, to have a batch transform that wraps this function.
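As a rough idea of what such a wrapper might compute, here is a sketch that takes a window of prices (rows are bars, columns are assets), derives simple returns, and winsorizes each asset's column. The name `winsorized_returns` is hypothetical, and this is a standalone function rather than zipline's batch transform API:

```python
import numpy as np
from scipy.stats.mstats import winsorize


def winsorized_returns(prices, limits=(0.05, 0.05)):
    """Hypothetical helper: per-asset simple returns, winsorized column-wise."""
    prices = np.asarray(prices, dtype=float)
    # Simple returns between consecutive rows; one row shorter than the input.
    returns = prices[1:] / prices[:-1] - 1.0
    # axis=0 applies the limits to each column (asset) independently.
    return np.asarray(winsorize(returns, limits=limits, axis=0))


# One asset growing 1% per bar, with a single bad print at bar 10.
window = (100.0 * 1.01 ** np.arange(21)).reshape(-1, 1)
window[10, 0] *= 50.0
cleaned = winsorized_returns(window)
```

The bad print produces one huge positive return and one huge negative return on the following bar; with 20 returns and 5% limits, both are replaced by the neighbouring ordinary values.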
Great, I'll give it a shot. I'm actually using this offline for Upgrade Capital's trading competition - it's been really helpful.
Thanks for taking a look at it!
Good luck in the competition.
@ehebert Would you say this line fixes this issue?
@FreddieV4 that function enables winsorizing the pricing data, which may be sufficient to winsorize the returns, as long as pricing data is the only input that would cause the returns to have outliers.
Expanding on https://github.com/quantopian/zipline/issues/127#issuecomment-301916947, the function you linked only applies to daily pricing data.
To fully winsorize the input data, a similar function would also need to be applied to the writers in the minute_bars module.
Can I work on this?