ftools
Possible issue with fsort when clearing sort variable
In testing hashsort, I found that fsort sometimes did not give me a data set identical to the one produced by sort, stable. I cannot replicate this from a blank session very easily, so here is the data that gives the issue:
local addr https://raw.githubusercontent.com/mcaceresb/stata-gtools
local path develop/src/github-issues/
use `addr'/`path'/fsort_share.dta
sort int1, stable
tempfile cmp
save `cmp'
use `addr'/`path'/fsort_share.dta
fsort int1
cf * using `cmp'
The result is
. cf * using `cmp'
rsort: 1 mismatch
r(9);
I believe the issue is with Andrew Maurer's trick to clear : sortedby. I got around this by setting obs to =_N + 1, manipulating the last observation, and dropping it. This way the original data is never altered.
It's indeed related to Maurer's trick. When I save the first value in a local macro and then write it back over the observation, I end up with a loss of precision of ~8e-16 (i.e. about the limit of the precision that double provides).
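A minimal sketch of how one might check for this kind of round-trip loss, assuming the trick amounts to copying a value into a local macro and writing it back. The data here are random stand-ins (the variable names x and x0 and the seed are hypothetical, not from the shared file), and whether any value changes depends on the values themselves, so the count may well be zero:

```stata
* Sketch only: stand-in data, not the shared fsort_share.dta
clear
set seed 12345
set obs 1000
generate double x  = runiform() * 10
generate double x0 = x                 // untouched copy for comparison

* Pass every value through a local macro, i.e. through decimal text
forvalues i = 1/`=_N' {
    local v = x[`i']
    quietly replace x = `v' in `i'
}

* Number of values that no longer match the original bit for bit
count if x != x0
```

Any observation counted here was changed by the trip through the macro even though nothing meaningful was done to it, which is the kind of difference cf would report.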
Expanding the dataset is indeed one alternative, that also has the advantage of not depending on whether sortvar is string or not.
One thing that was weird though is that I tried with a lot of random numbers and could not reproduce this issue in other datasets, so it must be pretty specific to some conditions (and of course you can't just assign the value in a do file because that suffers from the same loss of precision).
You still have to change the value in observation N + 1 from missing to non-missing to clear the sort flag, I think; otherwise Stata still treats the data as sorted, presumably because missing sorts after every other value.
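The workaround discussed in this thread could be sketched roughly as follows. This is an illustration on Stata's bundled auto data rather than the code used in fsort or hashsort; the local names last and sortvars are hypothetical, and 0 is chosen only because it is non-missing and smaller than the preceding prices:

```stata
* Sketch only: stand-in data and variable
sysuse auto, clear
sort price                        // data now carry a sorted-by flag

local last = _N + 1
set obs `last'                    // append one observation, all missing
replace price = 0 in `last'       // non-missing value that breaks the order
drop in `last'                    // original observations are never modified

* List whatever sort variables Stata still records
local sortvars : sortedby
display "sorted by: `sortvars'"
```

Because only the appended observation is ever written to, the original values are never copied out and back, so the approach does not depend on whether the sort variable is numeric or string.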
I deleted the file in my latest commit. I think if you call 1c5fc9c21045216575313b1449544d49ee4dd283 instead of develop it should still come up, in case you still want to test it.