sgkit
Need help calculating PBS across the genome in windows
Hi all,
I have a VCF with three populations and used sg.pbs to calculate PBS. I got the PBS values, but I want a four-column
TSV file with the columns CHROM, BIN_START, BIN_END and PBS. Can you tell me how to get it from the res object?
Below is my code:
```python
import sgkit as sg
import xarray as xr

ds = sg.load_dataset("All.filtered.maf0.05.miss0.2.biallelic.zarr")
# new_store_samples (defined elsewhere) is expected to hold one cohort index per sample
ds["sample_cohort"] = xr.DataArray(new_store_samples, dims="samples")
cohort_names = [f"Group{i}" for i in range(1, 4)]
ds = ds.assign_coords({"cohorts_0": cohort_names, "cohorts_1": cohort_names, "cohorts_2": cohort_names})
# 100 kb windows, sliding by 50 kb
ds = sg.window_by_position(ds, size=100000, step=50000)
res = sg.pbs(ds)["stat_pbs"].sel(cohorts_0="Group1", cohorts_1="Group2", cohorts_2="Group3").values
```
Looking forward to your reply. Thanks, all!
Hi @ChenDepp, you can export a dataset (e.g. res) as a CSV file by first converting it to a pandas DataFrame. You might need to subset the variables before doing that; see https://pystatgen.github.io/sgkit/latest/how_do_i.html#subset-the-variables. Hope that helps!
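To make the export concrete, here is a minimal sketch of assembling the four columns and writing them as TSV. It uses only the standard library and toy values so it runs on its own. The helper names `pbs_window_rows` and `write_tsv` are hypothetical and not part of sgkit. It assumes the windowed dataset exposes `window_contig`, `window_start` and `window_stop`, that the start/stop values are variant indices with an exclusive stop, and that windows without variants can simply be skipped.

```python
import csv
import io


def pbs_window_rows(contig_names, window_contig, window_start, window_stop,
                    variant_position, pbs):
    """Build (CHROM, BIN_START, BIN_END, PBS) rows, one per non-empty window.

    Assumption: window_start / window_stop are variant indices (stop is
    exclusive), so the coordinates reported here are the positions of the
    first and last variant in each window, not the nominal bin edges.
    """
    rows = []
    for contig, start, stop, value in zip(window_contig, window_start,
                                          window_stop, pbs):
        if stop <= start:  # window contains no variants, so skip it
            continue
        rows.append((
            contig_names[int(contig)],
            int(variant_position[int(start)]),
            int(variant_position[int(stop) - 1]),
            float(value),
        ))
    return rows


def write_tsv(rows, handle):
    """Write a header line followed by tab-separated rows."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["CHROM", "BIN_START", "BIN_END", "PBS"])
    writer.writerows(rows)


# Toy inputs standing in for the dataset's arrays (made-up values).
contig_names = ["chr1", "chr2"]
variant_position = [100, 250, 900, 50, 700]
window_contig = [0, 0, 1]
window_start = [0, 2, 3]
window_stop = [2, 3, 5]
pbs = [0.12, -0.03, 0.4]

rows = pbs_window_rows(contig_names, window_contig, window_start,
                       window_stop, variant_position, pbs)
buffer = io.StringIO()
write_tsv(rows, buffer)
tsv_text = buffer.getvalue()
```

With the dataset from the question, the inputs would presumably be `ds["window_contig"].values`, `ds["window_start"].values`, `ds["window_stop"].values`, `ds["variant_position"].values` and the `res` array. Where the contig names live depends on the sgkit version, so check your dataset before relying on a particular variable or attribute. To write a file, pass an open file handle instead of the `StringIO` buffer.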