Assign numpy structured array to metadata

Open benjeffery opened this issue 5 years ago • 7 comments

No work has been done on this yet. There are some easy wins, like grouping together calls to pack and unpack where possible. Maybe after #511.

benjeffery avatar May 13 '20 13:05 benjeffery

Use struct.pack_into and struct.iter_unpack. Use a separate code path for schemas that have known formats (i.e. no arrays).
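As a rough sketch of what that could look like for a fixed-format schema (the record layout `<dii` and the helper names `encode_rows` and `decode_rows` are made up for illustration and are not part of tskit):

```python
import struct

# Hypothetical fixed-size record: one double and two 32-bit ints, little-endian.
# A real layout would be derived from the metadata schema.
RECORD = struct.Struct("<dii")


def encode_rows(rows):
    """Pack all rows into one preallocated buffer instead of one bytes object per row."""
    buf = bytearray(RECORD.size * len(rows))
    for i, row in enumerate(rows):
        RECORD.pack_into(buf, i * RECORD.size, *row)
    return bytes(buf)


def decode_rows(buf):
    """Decode every record in a single pass; buf length must be a multiple of RECORD.size."""
    return list(RECORD.iter_unpack(buf))


rows = [(0.5, 1, 2), (1.25, 3, 4), (-2.0, 5, 6)]
encoded = encode_rows(rows)
decoded = decode_rows(encoded)
```

This only works when every row has the same size, which is why schemas containing arrays would need to stay on the generic path.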

benjeffery avatar May 15 '20 13:05 benjeffery

I'm feeling some pain from this one. Consider the following output, generated by running a fwdpy11 sim:

Start sim at = 17:44:11
Burn in done at = 18:20:54
Start adaptation to new environment at = 18:20:54
Done at = 18:21:00
Done dumping native file format at at = 18:21:00 starting tskit export...
Done dumping to tskit at = 18:36:21

The simulation is done and written to the fwdpy11 native format in about 35 minutes. There is metadata for 9e5 individuals, which causes the writing to a trees file to take over 15 minutes.

When creating the tskit.TableCollection, I am using add_row (as opposed to set_columns), and the metadata schema is here.

molpopgen avatar May 29 '21 18:05 molpopgen

Thanks for the info @molpopgen - can you remind us about this in a couple of weeks when Ben is back please?

jeromekelleher avatar Jun 01 '21 11:06 jeromekelleher

Reminder!

molpopgen avatar Jun 08 '21 17:06 molpopgen

Thanks @molpopgen I've pencilled this in for the next release, but depending on complexity it may slip.

benjeffery avatar Jun 08 '21 21:06 benjeffery

We now have fast decoding with numpy structured arrays - I assume it is possible to do the reverse and support assigning a structured array to metadata, rather than trying to optimise the generic struct encoder.
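A minimal sketch of the idea, assuming a fixed-size schema that maps directly onto a numpy dtype (the field names are hypothetical, and this shows only the byte layout, not an existing tskit API for the assignment itself):

```python
import numpy as np

# Hypothetical fixed-size metadata record; field names are illustrative only.
dtype = np.dtype([("fitness", "<f8"), ("deme", "<i4"), ("sex", "<i4")])

records = np.zeros(4, dtype=dtype)
records["fitness"] = [1.0, 0.5, 0.25, 2.0]
records["deme"] = [0, 0, 1, 1]
records["sex"] = [0, 1, 0, 1]

# Flatten the structured array into one byte column plus per-row offsets,
# i.e. the ragged-column layout (data array and offset array).
metadata = np.frombuffer(records.tobytes(), dtype=np.int8)
metadata_offset = (
    np.arange(len(records) + 1, dtype=np.int64) * dtype.itemsize
).astype(np.uint64)

# Decoding is the reverse: reinterpret the same bytes with the same dtype.
roundtrip = np.frombuffer(metadata.tobytes(), dtype=dtype)
```

Arrays in this shape could presumably be handed to set_columns as metadata and metadata_offset, so that no per-row struct packing happens in Python. The offset dtype that is expected may depend on the tskit version.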

benjeffery avatar Jun 06 '25 10:06 benjeffery

That's a great idea - a much better approach!

jeromekelleher avatar Jun 06 '25 11:06 jeromekelleher