
gtools version of merge

Open NilsJPWerner opened this issue 4 years ago • 5 comments

What would you like gtools to add or change (and why)?

It would be fantastic if gtools had a gmerge command. ftools has join/fmerge, which seems to be about 2x faster than merge, but since it is implemented in Mata it can't support mixed types.

Please include a specific suggestion

Add a gmerge command that implements the standard merge functionality.

NilsJPWerner, Jul 26 '21 21:07

@NilsJPWerner In theory I'd like to implement this, but in practice I've looked into it a bit: it's very complicated, and it's not at all clear that I'd get a very large speed improvement. I'd like to look into this again in the future, but it won't be any time soon. Sorry!

mcaceresb, Jul 30 '21 18:07

Unrelated to merge but also a suggestion: it would be great if you could provide a gtools enhancement for carryforward. This is an essential (to me) but often overlooked command, and currently extremely slow. Thanks!

fpet19, Dec 01 '21 13:12

@fpet19 I am curious, what is a specific scenario/example where carryforward is very slow? I have not used it, but isn't it a wrapper for `replace var = var[_n-1] if mi(var)`?

It's surprising this is especially slow. Or is the issue that calling it with by requires sorting the data first? Since it's sensitive to sort order, I would have assumed sorting was an unavoidable operation.
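
For reference, here is a minimal sketch of the by-group version of that replace idiom, using only built-in Stata commands. The variable names (id, t, x) and the toy data are made up for illustration, and this is not claimed to be how carryforward is implemented internally.

```stata
* Toy panel: id identifies the group, t gives the order within the group.
clear
input id t x
1 1 5
1 2 .
1 3 .
2 1 .
2 2 7
2 3 .
end

* Sort by id and t, then fill each missing x from the previous row of the
* same id. replace works through observations in order, so a filled value
* propagates into the missing rows that follow it.
bysort id (t): replace x = x[_n-1] if missing(x)

* Checks: values are carried forward, but a leading missing stays missing.
assert x == 5 if id == 1
assert missing(x) if id == 2 & t == 1
assert x == 7 if id == 2 & t > 1

list, sepby(id)
```

The bysort prefix is where the sort cost comes from: the data must be ordered by id and t before the fill can run.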

mcaceresb, Dec 01 '21 14:12

Yes, that seems to be the case, I always call it with by. I use gegen to create a group variable for a subset of the group, and then I populate it for the whole group using carryforward. The second command is over 5 times slower than the first.

I assume that whatever magic gtools does for gegen which does not require sorting and then resorting should be useful here. In very long datasets just avoiding having to xtset after gegen-related commands is worth it.
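
As a rough illustration of the kind of sort-free fill described here, the sketch below spreads a value that is set on only some rows to every row of its group with gegen. It assumes gtools is installed and that the value is numeric with a single distinct nonmissing value per group; the variable names (id, sub, full) and the data are hypothetical. Unlike carryforward, it also fills rows that come before the first nonmissing value, so it is an alternative only when that behavior is acceptable.

```stata
* Requires gtools (ssc install gtools).
clear
input id sub
1 .
1 3
1 .
2 8
2 .
end

* max() ignores missing values, so each row of a group receives the group's
* single nonmissing value of sub. No explicit sort or by-prefix is used.
gegen full = max(sub), by(id)

* Checks: every row of each group now holds that group's value.
assert full == 3 if id == 1
assert full == 8 if id == 2

list, sepby(id)
```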

fpet19, Dec 01 '21 16:12