Calculate XPT row_count using metadata and final chunk
Describe the issue
For XPT files, I understand that row_count cannot be extracted from the metadata alone, but I think it can be calculated using only the metadata and the final 80-byte chunk of the file.
Expected behavior
- Read the header information to find variable_sizes and the start of the record data.
- Calculate record_size as the sum of variable_sizes.
- Read the last 80-byte chunk of data to find out how much trailing ASCII blank padding there is.
- Calculate the number of records using:
  (total_file_size - start - padding) / record_size
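The four steps above can be sketched in Python as a minimal, hypothetical helper (xpt_row_count is not a ReadStat function). It assumes a single-member XPT v5 file, that start and variable_sizes have already been parsed from the headers, and that the blank fill after the last record is shorter than one 80-byte chunk. One deviation from the plain formula: a blank character value at the end of the last record looks exactly like fill, so the sketch shrinks the blank count until the remaining bytes divide evenly by record_size.

```python
# Sketch of the proposed row_count calculation for a single-member XPT v5
# file. Parsing the headers for `start` and `variable_sizes` is assumed done.
BLOCK = 80  # XPT v5 files are written in 80-byte chunks


def xpt_row_count(total_file_size: int, start: int,
                  variable_sizes: list[int], last_chunk: bytes) -> int:
    """Return the number of observations stored after `start`.

    total_file_size -- size of the whole file in bytes
    start           -- byte offset where the record data begins
    variable_sizes  -- byte width of each variable (from the headers)
    last_chunk      -- the final 80 bytes of the file
    """
    record_size = sum(variable_sizes)
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    data_len = total_file_size - start

    # Count trailing ASCII blanks; assume the fill never spans a whole chunk.
    trailing = len(last_chunk) - len(last_chunk.rstrip(b" "))
    padding = min(trailing, BLOCK - 1, data_len)

    # Some of those blanks may belong to the last record (e.g. a blank
    # character value), so shrink the padding until whole records remain.
    while padding > 0 and (data_len - padding) % record_size != 0:
        padding -= 1
    if (data_len - padding) % record_size != 0:
        raise ValueError("record area is not a whole number of records")
    return (data_len - padding) // record_size


if __name__ == "__main__":
    # Synthetic example: 240 header bytes, three 8-byte records that end in
    # blanks, then 56 blanks filling the record area out to 80 bytes.
    start = 240
    body = b"ABCDEF  " * 3
    data = b"H" * start + body + b" " * (BLOCK - len(body))
    print(xpt_row_count(len(data), start, [8], data[-BLOCK:]))  # 3
```

A trailing record made up entirely of blanks still looks like fill to this check, so it would be counted as padding rather than as an observation.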
It would be helpful if ReadStat could expose either:
- row_count
or these, if not already exposed:
- total_file_size
- records_start_offset
- records_end_offset
Hi Gerry. I'm not proposing a solution, but I faced the same problem when implementing an XPT v5 parser in JS. I calculated the count from the metadata using an approach very similar to the one you described above (here is my solution for reference: https://github.com/defineEditor/xport-js/blob/c8da602428b25c94187befb20928e35a48d518ae/src/classes/member.ts#L162). Probably the only difference is that in my case I had to calculate records_start_offset manually, and instead of subtracting the padding, I just used the floor function.
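A quick way to check when a floor-only count matches subtracting the padding: assuming the record area is blank-filled only up to the next 80-byte boundary, the fill is always shorter than 80 bytes, so floor division is exact whenever record_size is at least 80 bytes. For narrower records the fill can hold several whole "records", and floor over-counts. The helpers below (padded_len, floor_row_count) are hypothetical illustrations, not code from xport-js or ReadStat.

```python
BLOCK = 80  # assumed XPT v5 chunk size


def padded_len(n_records: int, record_size: int) -> int:
    """Bytes taken by n records after blank-filling to an 80-byte boundary."""
    raw = n_records * record_size
    return -(-raw // BLOCK) * BLOCK  # round up to a multiple of BLOCK


def floor_row_count(total_file_size: int, start: int, record_size: int) -> int:
    """Row count by floor division, without subtracting any padding."""
    return (total_file_size - start) // record_size


# Wide records: the fill (< 80 bytes) never holds a whole record.
for record_size in range(BLOCK, 4 * BLOCK):
    for n in range(50):
        assert floor_row_count(padded_len(n, record_size), 0, record_size) == n

# Narrow records: 3 x 8 = 24 bytes padded to 80, so floor reports 10 rows.
print(floor_row_count(padded_len(3, 8), 0, 8))  # 10
```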
Mainly, I just want to confirm that this approach works and gives the correct row_count, at least for XPT v5, so I hope it will be implemented in ReadStat at some point.
For XPT v8, we should be able to just read the observation count directly, as described in #322.