ReadStat

Calculate XPT row_count using metadata and final chunk

Open · gerrycampion opened this issue 1 year ago · 2 comments

Describe the issue

For XPT files, I understand that row_count cannot be read from the metadata alone, but I think it can be calculated from the metadata plus the final 80-byte chunk of the file.

Expected behavior

  • Read the header to find variable_sizes and the offset where record data starts (start).
  • Calculate record_size as the sum of variable_sizes.
  • Read the last 80-byte chunk of the file to determine how much trailing ASCII blank padding there is.
  • Calculate the number of records as (total_file_size - start - padding) / record_size.
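The steps above can be sketched as follows. This is a minimal illustration, not ReadStat code: `xpt_v5_row_count` is a hypothetical helper, and it assumes an XPT v5 file whose data section runs to the end of the file (a single member, or the last member), with the data padded with ASCII blanks to an 80-byte boundary. Because the last record can itself end in blank character values, the trailing-blank count is only an upper bound on the padding, so the sketch takes the ceiling of the division rather than expecting an exact integer. It also caps that bound at 79 bytes, since padding never fills a whole 80-byte chunk. The result is exact unless one or more complete trailing records consist entirely of blanks, which is indistinguishable from padding in this format.

```python
def xpt_v5_row_count(total_file_size: int, records_start: int,
                     variable_sizes: list[int], last_chunk: bytes) -> int:
    """Hypothetical helper: estimate the observation count of the last
    (or only) member of an XPT v5 file from metadata and the final chunk."""
    record_size = sum(variable_sizes)
    if record_size <= 0:
        raise ValueError("record size must be positive")

    # Bytes between the start of record data and the end of the file.
    data_len = total_file_size - records_start

    # Trailing ASCII blanks in the final 80-byte chunk. Padding to an
    # 80-byte boundary is at most 79 bytes, so cap the bound there.
    trailing_blanks = len(last_chunk) - len(last_chunk.rstrip(b" "))
    padding_upper_bound = min(trailing_blanks, 79)

    # Ceiling division: blank data at the end of the last record can make
    # the trailing-blank count overshoot the true padding.
    return -(-(data_len - padding_upper_bound) // record_size)


# Three 16-byte records (48 bytes) padded with 32 blanks to 80 bytes.
print(xpt_v5_row_count(80, 0, [8, 8], b"A" * 48 + b" " * 32))  # prints 3
```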

It would be helpful if ReadStat could expose either:

  • row_count

or these, if not already exposed:

  • total_file_size
  • records_start_offset
  • records_end_offset

gerrycampion · Apr 30 '24 21:04

Hi Gerry. I'm not proposing a solution, but I faced the same problem when implementing an XPT v5 parser in JS. I calculated the count from the metadata using an approach very similar to the one you described (here is my solution for reference: https://github.com/defineEditor/xport-js/blob/c8da602428b25c94187befb20928e35a48d518ae/src/classes/member.ts#L162). Perhaps the only difference is that I had to calculate records_start_offset manually, and instead of subtracting the padding, I just used a floor function.
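A bare floor division, sketched below with a hypothetical helper name and not transcribed from the linked xport-js code, is exact whenever record_size is at least 80. This assumes the data is padded with blanks to an 80-byte boundary: that padding is at most 79 bytes, so it can never hold a whole extra record. With shorter records, the padding can span one or more record lengths, and floor alone would overcount. That is the case the trailing-blank check from the issue addresses.

```python
def row_count_floor(total_file_size: int, records_start: int,
                    record_size: int) -> int:
    # Floor division drops whatever partial record's worth of padding remains.
    return (total_file_size - records_start) // record_size


# 2 records of 100 bytes = 200 bytes, padded to 240: floor gives 2 (exact).
print(row_count_floor(240, 0, 100))  # prints 2
# 3 records of 16 bytes = 48 bytes, padded to 80: floor gives 5 (overcount).
print(row_count_floor(80, 0, 16))  # prints 5
```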

Mainly I just want to confirm that this approach works and gives the correct row_count, at least for XPT v5, so I hope it will be implemented in ReadStat at some point.

DmitryMK · Mar 10 '25 12:03

For XPT v8, we should be able to just read the observation count described in #322.

evanmiller · May 23 '25 00:05