CSV -> DTA Conversion Fails to Format Dates Properly
I'm attempting to convert a csv file to STATA dta format using the JSON metadata format.
I've put together an example with just a few columns:
permno,date,eom
10107,2020-01-31,2020-01-31
10107,2020-02-28,2020-02-29
[...]
This is the JSON file I've generated to provide the metadata for the conversion:
{
"type": "STATA",
"variables": [
{
"type": "NUMERIC",
"name": "permno",
"label": "CRSP permno (permno)",
"format": "UNSPECIFIED"
},
{
"type": "NUMERIC",
"name": "date",
"label": "Day of last price observation (date)",
"format": "DATE"
},
{
"type": "NUMERIC",
"name": "eom",
"label": "End of month (eom)",
"format": "DATE"
}
]
}
The conversion seems to run properly: Converted 3 variables and 120 rows in 0.01 seconds
However, when I open up the file in STATA v18, it doesn't properly format the dates. I've read through the code, and I seem to be triggering the proper type and format selection, as if I change them, it triggers an error. I'm a STATA novice, but here's what the CLI is reporting for the generated file:
. use "~/test.dta"
((null))
. describe
Contains data from ~/test.dta
Observations: 120 (null)
Variables: 3 16 Apr 2024 14:04
-------------------------------------------------------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------------------------------------------------------
permno double %10.0g CRSP permno (permno)
date long %12.0g Day of last price observation (date)
eom long %12.0g End of month (eom)
-------------------------------------------------------------------------------------------------------------------------------
Sorted by:
. tabulate eom
End of |
month (eom) | Freq. Percent Cum.
------------+-----------------------------------
21945 | 1 2.50 2.50
21974 | 1 2.50 5.00
[...]
As you can see, it doesn't seem to be properly applying type or format, and when I tabulate, it displays as a number rather than a date. Have I uncovered a bug, or am I doing something wrong? I tried to follow the README and the spec.
Stata date variables are just longs with a certain formatting applied. Try running format %tdCCYY-NN-DD date eom on your imported data.
@NilsEnevoldsen Thanks so much for the quick response, and further information. I'll give it a whirl and come back to close the ticket. 💯
We're continuing to see this issue, and several more experienced STATA users have told me that it is likely the format should be applied during the conversion process, since we are explicitly setting the DATE and DATE_TIME format in the JSON map. Could someone confirm for me that the format should be applied during the readstat conversion process? Am I doing something wrong during my conversion attempt?
When I use SAS to export the same data, the display format seems to be properly set to %td instead of %12.0g:
-----------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-----------------------------------------------------------
permno double %8.0g PERMNO
date long %td Names Date
eom long %td Names Ending Date
And it shows up properly in the data:
+---------------------------------+
| permno date eom |
|---------------------------------|
1. | 10107 31jan2020 . |
2. | 10107 03feb2020 . |
3. | 10107 04feb2020 . |
[...]