Álvaro Kothe

21 comments of Álvaro Kothe

The results for Skew are correct with #62863. Kurtosis still gives different results, but not by a large margin.
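For context, a minimal pure-Python sketch of the bias-corrected sample skewness (the adjusted Fisher–Pearson estimator, which is what pandas' `skew` is meant to compute; names here are illustrative, not from the PR):

```python
from math import sqrt

def sample_skew(xs):
    """Adjusted Fisher-Pearson sample skewness: G1 = g1 * sqrt(n(n-1)) / (n-2)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # biased 2nd central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # biased 3rd central moment
    g1 = m3 / m2 ** 1.5
    return g1 * sqrt(n * (n - 1)) / (n - 2)

print(sample_skew([1.0, 2.0, 3.0, 4.0, 10.0]))  # right-skewed data, positive value
print(sample_skew([1.0, 2.0, 3.0]))             # symmetric data, ~0
```

Comparing an independent implementation like this against both the old and new pandas results is one way to confirm which side the fix lands on.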

The sample only contains 2 values because the `NA`s are ignored. With only 2 values, the correlation can only be $-1$ or $1$, assuming the two values differ. ##...
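A quick way to see this: any two distinct points lie exactly on a line, so Pearson's $r$ is $\pm 1$ regardless of the actual numbers. A sketch with hypothetical data:

```python
def pearson(a, b):
    """Textbook Pearson correlation in plain Python."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Two points are always perfectly collinear:
print(pearson([30.0, 30.1], [2.0, 5.0]))  # ~1.0 (both increase)
print(pearson([30.0, 30.1], [5.0, 2.0]))  # ~-1.0 (one increases, one decreases)
```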

If you want to be sure, check with a datatype that supports more precision.

```python
from decimal import Decimal, getcontext

def compute_correlation(a, b, precision=50):
    getcontext().prec = precision
    n = Decimal(len(a))
    ...
```
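The original snippet is truncated; a fuller sketch of the same idea (the body below is an assumed completion, not the original code) computes Pearson's $r$ entirely in `Decimal` arithmetic so the comparison is not polluted by binary floating-point rounding:

```python
from decimal import Decimal, getcontext

def compute_correlation(a, b, precision=50):
    """Pearson correlation computed in high-precision decimal arithmetic."""
    getcontext().prec = precision
    n = Decimal(len(a))
    a = [Decimal(str(x)) for x in a]  # str() avoids importing binary rounding error
    b = [Decimal(str(y)) for y in b]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b).sqrt()

print(compute_correlation([1, 2, 3, 4], [1.5, 0.5, 4.0, 3.0]))
```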

> numpy the corr value is 0.7.

The fact that numpy returns 0.7 doesn't mean it's correct...

```python
In [6]: a = np.array([30.0, 30.100000381469727])

In [7]: b = ...
```
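Incidentally, `30.100000381469727` is exactly what you get when `30.1` is stored as a 32-bit float and read back: the extra digits are float32 rounding, not real data. A stdlib check (using `struct` in place of `np.float32`):

```python
import struct

def roundtrip_float32(x):
    """Pack x as an IEEE 754 binary32 and read it back as a Python float."""
    return struct.unpack("f", struct.pack("f", x))[0]

print(repr(roundtrip_float32(30.1)))  # 30.100000381469727
```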

> but for this case we are looking at the unrounded values

Can you clarify what you mean by unrounded values?

> I was basically expressing that this is an issue as the expected value given the unrounded numbers should be 0.7 rather than 1 or -1.

I still don't understand...

> My thought was to run Welford until log(vx*vy) exceeds 14 or -14, then rerun using two-pass.

As far as I know, Welford should be more stable than two-pass, because it...
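For readers following along, a minimal sketch of the Welford-style one-pass update being discussed: the running means and co-moments are updated incrementally, and the correlation is read off at the end (this is an illustration of the technique, not the pandas implementation):

```python
def welford_corr(xs, ys):
    """One-pass correlation via Welford-style running means and co-moments."""
    mean_x = mean_y = sxx = syy = sxy = 0.0
    n = 0
    for x, y in zip(xs, ys):
        n += 1
        dx = x - mean_x           # deviation from the *old* mean
        dy = y - mean_y
        mean_x += dx / n
        mean_y += dy / n
        sxx += dx * (x - mean_x)  # old-deviation * new-deviation: Welford's trick
        syy += dy * (y - mean_y)
        sxy += dx * (y - mean_y)
    return sxy / (sxx * syy) ** 0.5

print(welford_corr([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))  # perfectly linear, ~1.0
```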

> Well, it's worth a shot.

From my understanding, two-pass is more stable than Welford for extremely large numbers. Nice that you see room for improvement.

> I picked 14...
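For comparison, the two-pass alternative mentioned above computes the means first and then the centered sums in a second loop; a sketch (illustrative, not the proposed patch):

```python
def two_pass_corr(xs, ys):
    """Two-pass correlation: first pass for means, second for centered co-moments."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = syy = sxy = 0.0
    for x, y in zip(xs, ys):
        dx, dy = x - mean_x, y - mean_y
        sxx += dx * dx
        syy += dy * dy
        sxy += dx * dy
    return sxy / (sxx * syy) ** 0.5

print(two_pass_corr([1.0, 2.0, 3.0], [6.0, 4.0, 2.0]))  # perfectly anti-correlated, ~-1.0
```

Centering by the exact mean before squaring is what makes this formulation robust when the raw values are huge, at the cost of needing the data twice.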

The class [`NLTKWordTokenizer`](https://github.com/nltk/nltk/blob/16429421e3954bf32d697374c70ec88f1f0d8dc6/nltk/tokenize/destructive.py#L37) is already implemented. However, neither `TreebankWordTokenizer` nor `NLTKWordTokenizer` exhibits the expected behavior of [McIntyre's tokenizer](https://web.archive.org/web/20070815155914/http://www.cis.upenn.edu/~treebank/tokenizer.sed), as shown in the example below:

```python
>>> import nltk
>>> tokenizer...
```
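For reference, `NLTKWordTokenizer` can be used directly and does the usual Treebank-style splitting of punctuation (a minimal example, assuming nltk is installed; no corpora downloads are needed for this tokenizer):

```python
from nltk.tokenize import NLTKWordTokenizer

tokenizer = NLTKWordTokenizer()
print(tokenizer.tokenize("Hello, world."))  # ['Hello', ',', 'world', '.']
```

The divergence from McIntyre's sed script shows up on trickier inputs than this, which is what the truncated example above demonstrates.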

Hi @TristanJM, is this still an issue? If so, it would be really helpful for debugging if you could provide a minimal reproducible example. If the issue involves many files,...