Add option to filter data through Winsorisation

Open ehebert opened this issue 12 years ago • 8 comments

Filter out extreme values which are assumed to be spurious because of their extremity.

As requested by Jessica Stauth on the Quantopian forums: https://www.quantopian.com/posts/feature-requests-what-changes-would-you-like-to-see

Quoted from that post:

  1. add an option to 'winsorise' returns for outlier handling - a notorious issue with backtests is hidden outliers in returns data - sometimes they are obvious, you trade a stock and it makes 10,000% in 1 day (oops pricing error, currency issue etc) - but sometimes these errors can be hidden. Winsorizing your returns data allows you to set sanity bounds on what returns you think a stock can achieve, so you might say, clip my returns data at -99% and + 2 standard deviations from the mean returns for that time period. Better explained here: http://en.wikipedia.org/wiki/Winsorising
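The bounds described in the quote (a floor of -99% and a ceiling of the mean plus two standard deviations) could be sketched with NumPy roughly as follows. The helper name `clip_returns`, its parameters, and the sample data are hypothetical and only for illustration; this is not an existing zipline option.

```python
import numpy as np


def clip_returns(returns, floor=-0.99, n_std=2.0):
    """Clip returns to [floor, mean + n_std * std] (hypothetical helper)."""
    returns = np.asarray(returns, dtype=float)
    # The ceiling is estimated from the same sample, so a large outlier
    # inflates both the mean and the standard deviation.
    ceiling = returns.mean() + n_std * returns.std()
    return np.clip(returns, floor, ceiling)


# Made-up daily returns with one bad print (10,000% in a day).
sample = np.array([0.01, -0.02, 0.015, 100.0, -0.005,
                   0.003, -0.01, 0.02, 0.0, -0.015])
clipped = clip_returns(sample)
print(clipped.max())
```

Because the ceiling is computed from the data being clipped, an extreme value raises its own bound, so the bad print is reduced but can still remain large. Percentile-based limits are one way to avoid that.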

ehebert avatar Apr 05 '13 21:04 ehebert

Hey, we can actually winsorize the returns on our end with SciPy's mstats:

scipy.stats.mstats.winsorize(a, limits=None, inclusive=(True, True), inplace=False, axis=None)

Or, to trim the outliers instead of clipping them:

scipy.stats.mstats.trim(a, limits=None, inclusive=(True, True), relative=False, axis=None)

http://docs.scipy.org/doc/scipy/reference/stats.mstats.html
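A minimal sketch of the winsorize call on made-up returns, assuming percentile limits of 10% on each tail (the data and the limits are illustrative, not a recommendation):

```python
import numpy as np
from scipy.stats import mstats

# Made-up daily returns with one bad print (10,000% in a day).
returns = np.array([0.01, -0.02, 0.015, 100.0, -0.005,
                    0.003, -0.01, 0.02, 0.0, -0.015])

# limits=(0.1, 0.1) replaces the lowest 10% and highest 10% of values
# with the nearest value that falls inside those limits.
# inplace defaults to False, so `returns` itself is left untouched.
winsorized = mstats.winsorize(returns, limits=(0.1, 0.1))

print(winsorized.min(), winsorized.max())
```

Note that the limits here are fractions of the sample, not return thresholds, so this clips a fixed share of observations on each side whether or not they are erroneous.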

nabm avatar Apr 16 '13 17:04 nabm

@nabm, that looks like it will do the job nicely. Thanks, always impressed by how much scipy et al. provide.

It shouldn't be too much effort, then, to have a batch transform that wraps this function.

ehebert avatar Apr 17 '13 15:04 ehebert

Great, I'll give it a shot. I'm actually using this offline for Upgrade Capital's trading competition - it's been really helpful.

nabm avatar Apr 21 '13 01:04 nabm

Thanks for taking a look at it!

Good luck in the competition.

ehebert avatar Apr 23 '13 02:04 ehebert

@ehebert Would you say this line fixes this issue?

freddiev4 avatar May 16 '17 20:05 freddiev4

@FreddieV4 that function enables winsorizing the pricing data, which may be sufficient to winsorize the returns, as long as pricing data is the only input that would cause the returns to have outliers.

ehebert avatar May 16 '17 21:05 ehebert

Expanding on https://github.com/quantopian/zipline/issues/127#issuecomment-301916947, the function you linked only applies to daily pricing data.

To fully winsorize input data, a similar function would need to be applied to the writer's minute_bars module.

ehebert avatar May 16 '17 21:05 ehebert

Can I work on this?

Rish001 avatar Oct 01 '20 10:10 Rish001