
Docs: how do I *update* a dataset on file?

Open jeromekelleher opened this issue 2 years ago • 3 comments

I can't find any documentation on how to do something like:

import sgkit as sg

ds = sg.load_dataset(ds_path)
ds = sg.count_variant_alleles(ds)
# Don't overwrite the whole thing, just write out the new variables
# so I don't have to recompute them
sg.save_dataset(ds, ds_path)

We have a "how do I save" entry, but that doesn't help here.

jeromekelleher, Sep 05 '23 15:09

#347 has some discussion and code for this. We didn't quite agree on an API, so I'd be interested to see what you think works in your situation.

tomwhite, Sep 05 '23 15:09

Nice, I'll try that out and report back.

jeromekelleher, Sep 05 '23 15:09

I've been using this pattern:

ds.update({"new_array_name": new_array})
sgkit.save_dataset(ds.drop_vars(set(ds.data_vars) - {"new_array_name"}), ds_dir, mode="a")

benjeffery, Sep 05 '23 22:09
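
For anyone landing here later, the pattern above can be wrapped in a small helper. This is only a sketch, not an sgkit API: the name `save_new_variables` and the `save` parameter are made up for illustration, and the save function is passed in so the helper itself has no hard dependency. It assumes `ds` behaves like an `xarray.Dataset` (it needs `.data_vars` and `.drop_vars`) and that the save function forwards `mode="a"` the way `sgkit.save_dataset` does in the snippet above.

```python
def save_new_variables(ds, store, names, save):
    """Append only the variables in `names` to an existing store.

    ds    : xarray.Dataset-like object (needs .data_vars and .drop_vars)
    store : path or store of the dataset already on disk
    names : names of the newly computed variables to write
    save  : callable such as sgkit.save_dataset (hypothetical parameter)
    """
    names = set(names)
    missing = names - set(ds.data_vars)
    if missing:
        raise KeyError(f"not in dataset: {sorted(missing, key=str)}")
    # Drop every data variable except the new ones, so the arrays
    # already on disk are not written out again.
    subset = ds.drop_vars(list(set(ds.data_vars) - names))
    # mode="a" appends to the existing store instead of replacing it.
    save(subset, store, mode="a")
    return subset


# Example usage (assumes sgkit is installed and ds_path is an existing store;
# "variant_allele_count" is assumed to be the variable count_variant_alleles adds):
#   import sgkit as sg
#   ds = sg.load_dataset(ds_path)
#   ds = sg.count_variant_alleles(ds)
#   save_new_variables(ds, ds_path, ["variant_allele_count"], sg.save_dataset)
```

Note that `drop_vars` only removes data variables here, so any coordinates on `ds` are still part of what gets written in append mode.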