Adding update / replace to fmerge

Open felixholub opened this issue 6 years ago • 9 comments

Would it be possible to extend fmerge to allow for update or replace?

felixholub avatar Feb 14 '18 10:02 felixholub

Yes. It might involve a bit of Mata work, but I'm a bit pressed for time for the next month or so, so I can't promise any update.

For reference (in case you want to try, or maybe for future me), I think it would involve changing line 381 of join.ado (https://github.com/sergiocorreia/ftools/blob/master/src/join.ado#381):

	// Check that variables don't exist yet
	msg = "{err}merge:  variable %s already exists in master dataset\n"
	for (i=1; i<=cols(deck); i++) {
		var = deck[i]
		if (_st_varindex(var) != .) {
			printf(msg, var)
			exit(108)
		}
	}

Instead of raising an error if the variable exists, when the -update- option is on, you would have to create a tempvar (st_tempvar()?) and then replace row i of varnames_num.

Then, after the Mata code finishes running, you would run something like: replace original_var = tempvar if mi(original_var)

sergiocorreia avatar Feb 14 '18 11:02 sergiocorreia
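
The fill-in step described above can be tried with built-in commands only. This is a minimal sketch, not the ftools implementation: it uses the official merge command rather than fmerge or join, and the data, the key id, and the name x_using (standing in for the tempvar) are hypothetical.

```stata
* Sketch only: emulate an -update- option by bringing the using variable in
* under a different name, then filling just the missing master values.
* All data and variable names here are made up for illustration.

* Using dataset: the clashing variable x is renamed before saving
clear
input id x
1 99
2 20
end
rename x x_using
tempfile usingdata
save `usingdata'

* Master dataset: x is missing for id 2
clear
input id x
1 10
2 .
3 30
end

merge 1:1 id using `usingdata', nogenerate

* Update semantics: master values win, using values only fill the gaps
replace x = x_using if mi(x)
drop x_using

assert x == 10 if id == 1   // non-missing master value is kept
assert x == 20 if id == 2   // missing master value is filled from using
assert x == 30 if id == 3   // unmatched master observation is untouched
```

A replace variant would drop the if mi(x) condition and instead overwrite x wherever x_using is non-missing.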

Thanks for the explanation, Sergio. It's nothing urgent, just something that I stumble upon every once in a while. Maybe I can use your hint to practice my Stata coding ;-)

felixholub avatar Feb 14 '18 12:02 felixholub

mmerge has some really useful options, e.g. unmatched (which unmatched observations to keep: none, both, master, using), umatch (for the case where variables in using are named differently from those in master), and uname (add a stub to the names of variables in using).

Is there any intent to add additional features to fmerge?

aghaynes avatar Jun 07 '19 09:06 aghaynes

Hi Alan,

Can you explain a bit more what these options do? I installed mmerge from SSC but I'm not entirely sure of what unmatched() does that merge's keep() doesn't.

Regarding umatch(), it can actually be done through the join command. I wrote fmerge as a wrapper around join (which has a more familiar syntax for me). For instance, suppose you have a panel of consumers (where t is the year identifier) and want to add some macro data from a dataset (where year is the year identifier).

With merge, you do:

rename t year
merge m:1 year using "annual_data", keepusing(gdp inflation)
rename year t

With join, you do:

join gdp inflation, from("annual_data") by(t=year)

(Note how the join syntax looks more like the collapse() one, and is more explicit about which variables get added)

sergiocorreia avatar Jun 08 '19 00:06 sergiocorreia

Hi Sergio,

Ignore my message. You're completely correct: it's all possible with the other options. (The main advantage of mmerge is that it's a bit more verbose in its reporting.)

I wasn't aware of join... I think I'll be looking into that a bit more. I have some quite large datasets which take merge/mmerge a long time to combine...

Thanks!!

aghaynes avatar Jul 03 '19 07:07 aghaynes

@aghaynes mentioned one option which, as far as I can see, join doesn't offer and which could be useful. The uname option allows adding a stub to the variable names of the using data. This makes it easy to distinguish which variables were pre-existing and which are new, for example for comparison.

luispfonseca avatar Sep 09 '19 14:09 luispfonseca

Agreed, that should be useful and simple to implement. That said, uname() doesn't seem like an easy-to-remember option, so maybe stub(), prefix() or something like that?

sergiocorreia avatar Sep 09 '19 15:09 sergiocorreia

Yes, I agree. Either of these seems fine. stub seems to be commonly used, but I'd say prefix is more intuitive if you've never heard of it.

luispfonseca avatar Sep 09 '19 15:09 luispfonseca

For what it's worth, prefix is the pattern used by frget in Stata 16.

ArthurHowardMorris avatar Nov 12 '20 08:11 ArthurHowardMorris