Document reading tabular pedigree formats into sgkit

Open timothymillar opened this issue 1 year ago • 3 comments

We don't currently have any IO functionality for pedigree formats. These are usually tabular, but the layout varies considerably between sources. We should document how to read in some generic examples and add them to an sgkit-style dataset.

Basic workflow:

  • Read tabular format as pandas dataframe
  • Assign sample identifiers to the sample_id variable
  • Assign parental columns to the parent_id variable
  • Optionally set coords for the parents dim (['Father', 'Mother'], ['Sire', 'Dam'], etc.)
  • Use parent_indices to generate the parents array and explain the 0-based indexing etc.
  • Do something interesting like calculating kinship.
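As a rough illustration of the workflow above, here is a minimal sketch using pandas and xarray. The table contents, the column names (Sample, Father, Mother) and the use of "." for an unknown parent are all made up for the example. The index lookup is written out by hand so that the 0-based indexing is visible; the sgkit calls shown in the trailing comments (parent_indices and pedigree_kinship) are what I would expect to use in practice, but their exact arguments and the name of the output variable are assumptions that should be checked against the sgkit API docs.

```python
import io

import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical tab-separated pedigree; "." marks an unknown parent (assumption).
text = (
    "Sample\tFather\tMother\n"
    "S1\t.\t.\n"
    "S2\t.\t.\n"
    "S3\tS1\tS2\n"
    "S4\tS1\tS3\n"
)

# 1. Read the tabular format as a pandas dataframe (keep everything as strings).
df = pd.read_csv(io.StringIO(text), sep="\t", dtype=str, keep_default_na=False)

# 2-4. Assign sample and parent identifiers, with coords on the parents dim.
ds = xr.Dataset(
    {
        "sample_id": ("samples", df["Sample"].to_numpy(dtype=str)),
        "parent_id": (
            ("samples", "parents"),
            df[["Father", "Mother"]].to_numpy(dtype=str),
        ),
    },
    coords={"parents": ["Father", "Mother"]},
)

# 5. Hand-written equivalent of the index lookup: each parent identifier is
#    replaced by the 0-based position of that sample along the samples dim,
#    and unknown parents become -1.
index_of = {name: i for i, name in enumerate(ds["sample_id"].values)}
parent = np.array(
    [[index_of.get(p, -1) for p in row] for row in ds["parent_id"].values]
)
# The variable name "parent" is an assumption about sgkit's naming.
ds["parent"] = (("samples", "parents"), parent)

# 6. With sgkit installed, steps 5 and 6 would presumably look like this
#    (signatures assumed, not verified here):
#        import sgkit as sg
#        ds = sg.parent_indices(ds, missing=".")
#        ds = sg.pedigree_kinship(ds)
```

In this toy pedigree S3 has parents at indices 0 and 1 (S1 and S2), and S4 has parents at indices 0 and 2 (S1 and S3), which is the kind of small case the docs could use to explain the indexing.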

timothymillar, Jan 30 '23 21:01