Document reading tabular pedigree formats into sgkit
We don't currently have any IO functionality for pedigree formats. These are usually tabular, but the layout varies considerably between sources. We should document how to read in some generic examples and add them to an sgkit-style dataset.
Basic workflow:
- Read tabular format as pandas dataframe
- Assign sample identifiers to the `sample_id` variable
- Assign parental columns to the `parent_id` variable
- Optionally set coords for the `parents` dim (`['Father', 'Mother']`, `['Sire', 'Dam']`, etc.)
- Use `parent_indices` to generate the `parents` array and explain the 0-based indexing etc.
- Do something interesting like calculating kinship.
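
As a starting point for the indexing explanation, here is a dependency-free sketch of what the identifier-to-index step produces. It is not sgkit's implementation: it uses the standard library instead of pandas so the logic is visible, and the table contents, the column names (`id`, `sire`, `dam`), the `.` missing marker, and the use of `-1` for an unknown parent are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical pedigree table; column names and the "." marker are assumptions.
PEDIGREE_TSV = """\
id\tsire\tdam
S1\t.\t.
S2\t.\t.
S3\tS1\tS2
S4\tS1\tS3
"""

MISSING = "."  # marker for an unknown parent in the input file (assumed)

rows = list(csv.DictReader(io.StringIO(PEDIGREE_TSV), delimiter="\t"))

# Counterpart of the sample_id variable: one identifier per sample.
sample_id = [row["id"] for row in rows]

# Counterpart of the parent_id variable, with shape (samples, parents).
parents_coord = ["Sire", "Dam"]
parent_id = [[row["sire"], row["dam"]] for row in rows]

# Map each identifier to its 0-based position along the samples dimension.
position = {sid: i for i, sid in enumerate(sample_id)}

# Replace each parent identifier with that parent's position,
# using -1 (an assumed sentinel) where the parent is unknown.
parent_index = [
    [position[p] if p != MISSING else -1 for p in pair]
    for pair in parent_id
]

for sid, pair in zip(sample_id, parent_index):
    print(sid, dict(zip(parents_coord, pair)))
```

Here `S4` maps to `[0, 2]` because its sire `S1` is the first sample (index 0) and its dam `S3` is the third (index 2). The founders `S1` and `S2` have no recorded parents, so both of their entries are `-1`.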