ietoolkit iematch - issue with long numeric strings

Open kbjarkefur opened this issue 3 years ago • 1 comments

Hi DIME Analytics,

I’ve tried using iematch, but when I run the command it continuously runs and never completes. I’ve let it run for several minutes. I don’t have an error code to provide because the command never completes or breaks. I’m testing it on a subset of data for which I have 40 observations, 10 in the base group and 30 in the target group. I’m trying to execute a 1-1 match. I would not think the command would take several minutes to run on 40 observations.

set seed 1956
iematch if pair==1 & grade==2, grpdummy(srm_treatment) matchvar(orf) idvar(student_id) seedok replace

I assume this is a user-issue, but I wanted to verify that there was not some other issue with the command.

Sean

Hi Sean,

Thanks for letting us know. When I developed this I had to account for many infinite loop issues, but since the release I have not had anyone report another case. Are you able to share a deidentified version of the data that I can test myself on? If there is an error I’d like to fix it as others might have had the same issue without reporting it.

I do not see any error in the information you have provided so far.

Best, Kristoffer

Hi Kristoffer,

Thanks for the reply. Attached is a de-identified dataset using a subset of the data. I’ve included the first two pairs of matched schools. The full dataset has 25 matched school pairs. For treatment schools, we assessed 10 students, but for control schools we assessed 30 students. Students from Grade 2 and Grade 4 were assessed. The goal is to match at the student level within the matched schools, finding a matching control student for each treatment student. This has to be done for students in Grade 2 and Grade 4.

A few notes on the dataset that I’ve attached. The student ID variables are randomly generated by our data collection app, so the numbers have no meaning outside of the dataset. I’ve recoded the school codes with numbers from a random number generator. The original school codes are tied to EMIS codes in the country where the study is happening. The variable for the match is orf, which is oral reading fluency. Let me know if you have any questions about the attached dataset.

Sean

Mar 18 '21 19:03 kbjarkefur

The issue is that the data collection app created IDs that were so long that they were stored as doubles. The tempvars iematch creates for tracking were created with the default type because no type was specified, for example like this: gen `prefID' = . The default type is float.

When the command copied the ID var to these tempvars it lost precision. I thought that Stata would have converted the tempvar to a double to avoid losing data, but it didn't. When the command then checked against the original ID variable, it found no matches because the values were no longer identical. This is why the command was stuck in an infinite loop.
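
The precision loss can be reproduced outside iematch with a few lines. This is only a minimal sketch: the variable names and the 12-digit ID value are made up for illustration, and it assumes the default storage type has not been changed with set type.

```stata
clear
set obs 1

* Hypothetical 12-digit ID, too long for a float to hold exactly
gen double student_id = 123456789012

* No type specified, so the new variable gets the default type (float)
gen id_copy = .

* replace does not promote a float to a double, so the value is rounded
replace id_copy = student_id

* The copy no longer equals the original, so a lookup on it finds nothing
count if id_copy == student_id
```

With these values the count should come back as zero, which is the same kind of mismatch that kept the command from ever finding its matches.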

The solution was to create all ID vars with the same data type as the original ID var, like this: gen `:type `idvar'' `prefID' = . This is implemented in 94db45fec2a8660883bdc38af9c7697c438b08ab
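
The same idea can be checked in isolation. Again a sketch with made-up names and values, not the code from the commit: the copy is created with the storage type of the hypothetical ID variable, so the comparison holds.

```stata
clear
set obs 1

* Hypothetical 12-digit ID stored as a double
gen double student_id = 123456789012

* Create the copy with the same storage type as the ID variable
gen `:type student_id' id_copy = .
replace id_copy = student_id

* The copy is stored as a double too, so the values are identical
assert id_copy == student_id
```

Reading the type from the ID variable instead of hard-coding double means the copy follows whatever numeric type the user's ID happens to have.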
