Assign numpy structured array to metadata
No work has been done on this yet. There are some easy wins, like grouping together calls to pack and unpack where possible. Maybe after #511.
Use struct.pack_into and struct.iter_unpack
Use a separate code path for schemas that have known formats (i.e. no arrays).
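As a rough illustration of the two points above, here is a minimal sketch of batching fixed-format records with struct.pack_into and struct.iter_unpack. The record layout ("<dii") and the helper names encode_rows and decode_rows are hypothetical, chosen only for the example; this is not tskit's actual codec.

```python
import struct

# Hypothetical fixed-size record: one double and two 32-bit ints,
# little-endian with no padding. A schema without arrays maps to a
# single format string like this one.
RECORD = struct.Struct("<dii")


def encode_rows(rows):
    """Pack all rows into one preallocated buffer using pack_into."""
    buf = bytearray(RECORD.size * len(rows))
    for i, row in enumerate(rows):
        RECORD.pack_into(buf, i * RECORD.size, *row)
    return bytes(buf)


def decode_rows(buf):
    """Decode every record in a single pass using iter_unpack."""
    return list(RECORD.iter_unpack(buf))


rows = [(0.5, 1, 2), (1.25, 3, 4), (-2.0, 5, 6)]
encoded = encode_rows(rows)
decoded = decode_rows(encoded)
```

Precompiling the format once with struct.Struct avoids re-parsing the format string for every row, and writing into one buffer avoids allocating a bytes object per record.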
I'm feeling some pain from this one. Consider the following timings, generated from running a fwdpy11 sim:
Start sim at = 17:44:11
Burn in done at = 18:20:54
Start adaptation to new environment at = 18:20:54
Done at = 18:21:00
Done dumping native file format at at = 18:21:00 starting tskit export...
Done dumping to tskit at = 18:36:21
The simulation is done and written to the fwdpy11 native format in about 37 minutes. There is metadata for 9e5 individuals, and writing it to a trees file takes over 15 minutes.
When creating the tskit.TableCollection, I am using add_row (as opposed to set_columns), and the metadata schema is here.
Thanks for the info @molpopgen - can you remind us about this in a couple of weeks when Ben is back please?
Reminder!
Thanks @molpopgen, I've pencilled this in for the next release, but depending on complexity it may slip.
We now have fast decoding with numpy structured arrays - I assume it is possible to do the reverse and support assigning a structured array to metadata, rather than trying to optimise the generic struct encoder.
That's a great idea - a much better approach!
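For reference, a minimal sketch of what the reverse direction might look like, assuming every row has the same fixed-size layout. The dtype fields and the helper name structured_to_ragged are hypothetical, and the dtype is assumed to match the byte layout of the struct schema exactly (same byte order, no padding). The two returned arrays have the shape of a ragged metadata column (flat bytes plus offsets); whether they can be handed directly to set_columns is an assumption here, not something confirmed in this thread.

```python
import numpy as np

# Hypothetical metadata layout; field names are illustrative only.
# Explicit little-endian types and a packed (unaligned) dtype are
# assumed to mirror the schema's struct format.
dtype = np.dtype([("g", "<f8"), ("e", "<f8"), ("sex", "<i4")])


def structured_to_ragged(records):
    """Turn a structured array into flat metadata bytes plus row offsets."""
    # tobytes() copies the records out in C order, one row after another.
    metadata = np.frombuffer(records.tobytes(), dtype=np.int8)
    # Every row has the same size, so offsets are multiples of itemsize.
    itemsize = np.uint64(records.dtype.itemsize)
    offsets = np.arange(len(records) + 1, dtype=np.uint64) * itemsize
    return metadata, offsets


records = np.array(
    [(0.5, 1.0, 0), (1.5, -2.0, 1), (0.0, 0.25, 1)], dtype=dtype
)
metadata, offsets = structured_to_ragged(records)

# Round trip: reinterpreting the bytes with the same dtype recovers the rows.
decoded = np.frombuffer(metadata.tobytes(), dtype=dtype)
```

This replaces one encoder call per row with a single bulk copy, which is where the time goes when there are 9e5 rows.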