
Runtime error saying incomplete pickle support

nickywright opened this issue 2 months ago · 7 comments

Hi,

I'm trying to run code with a fresh installation of GPlately via conda (installed last week). This notebook worked fine in GPlately 1 (with an older version of Python etc.; no clue now which one, sorry).

Running

M08_rotation_model, M08_topology_features, M08_static_polygons = M08_download.get_plate_reconstruction_files()
M08_coastlines, M08_continents, M08_COBs = M08_download.get_topology_geometries()

Muller08_model = gplately.PlateReconstruction(M08_rotation_model, M08_topology_features, M08_static_polygons)
raster = gplately.Raster(data=etopo_file)
raster.plate_reconstruction = Muller08_model

reconstructed_raster = raster.reconstruct(55, threads=4, partitioning_features=M08_static_polygons, anchor_plate_id=0)

This results in a RuntimeError at the last line: RuntimeError: Incomplete pickle support (__getstate_manages_dict__ not set)

Removing the threads=4 makes no difference.

Using: Python 3.13.2, gplately 2.0.0, pygplates 1.0.0

Any ideas on how to fix this please?

nickywright · Oct 22 '25 00:10

I just tried the following on conda (with the same versions as you) and it seemed to work fine, although mine is a different example (and I'm on Windows).

Can you also try the following to see if the error persists?

import gplately
from plate_model_manager import PlateModelManager, PresentDayRasterManager

pm_manager = PlateModelManager()
muller2019_model = pm_manager.get_model("Muller2019", data_dir="plate-model-repo")
rotation_model = muller2019_model.get_rotation_model()
topology_features = muller2019_model.get_topologies()
static_polygons = muller2019_model.get_static_polygons()

coastlines = muller2019_model.get_layer('Coastlines')
continents = muller2019_model.get_layer('ContinentalPolygons')
COBs = muller2019_model.get_layer('COBs')

model = gplately.PlateReconstruction(rotation_model, topology_features, static_polygons)

time = 0
muller_2019_age_grid = gplately.Raster(
    data=muller2019_model.get_raster("AgeGrids", time),
    plate_reconstruction=model,
    extent=[-180, 180, -90, 90],
)

reconstructed_raster = muller_2019_age_grid.reconstruct(
    55, threads=4, partitioning_features=static_polygons, anchor_plate_id=0)

jcannon-gplates · Oct 22 '25 01:10

Ok, that works!

I've just modified it and managed to get this to work! Thank you so much @jcannon-gplates!

muller2008_model = pm_manager.get_model("Muller2008", data_dir="plate-model-repo")
rotation_model = muller2008_model.get_rotation_model()
topology_features = None
static_polygons = muller2008_model.get_static_polygons()

model = gplately.PlateReconstruction(rotation_model, topology_features, static_polygons)

raster = gplately.Raster(data=etopo_file, plate_reconstruction=model)

reconstructed_raster = raster.reconstruct(
    55, threads=4, partitioning_features=static_polygons, anchor_plate_id=0)

So it must be something related to downloading plate models via the DataServer; I got the same error when I just substituted the DataServer plate model download into the raster setup part.

Also: topology_features = muller2008_model.get_topologies() results in an error because there are no topologies associated with the model. That's fine, but the equivalent call with the DataServer returned None (i.e. it didn't error). Could we change this for all the cases where the files don't exist (perhaps printing that they don't exist, if that's the reason for the error)?

nickywright · Oct 22 '25 01:10

Great! And thanks for confirming that the problem happens with DataServer (and not PlateModelManager). That helped a lot!

I too get the same error when I change my Muller2019 example to use DataServer.

The issue is not with DataServer itself but instead with pyGPlates. It only shows up with DataServer because it returns pygplates.RotationModel and pygplates.FeatureCollection objects, whereas PlateModelManager returns filenames.

And then presumably it is the pickling of these pyGPlates objects that is causing the error here (when reconstructing the raster): https://github.com/GPlates/gplately/blob/220d13fdd5015f7ef77b439007eb42074deb0ccf/gplately/grids.py#L2178

I'll take a deeper look. It's odd that pyGPlates is (presumably) emitting that error, because, as described here, it doesn't need to set __getstate_manages_dict__ given that it doesn't define __getstate__.

jcannon-gplates · Oct 22 '25 02:10

In any case, for others encountering this issue, the workaround for now is to use PlateModelManager instead of DataServer.

jcannon-gplates · Oct 22 '25 02:10

Can we change this somehow please for all the cases when the files don't exist (I guess can print out that they don't exist if needed if that's why we have the error?).

There is a parameter "return_none_if_not_exist" which makes the function return None instead of throwing an exception.

https://gplates.github.io/plate-model-manager/latest/api.html#plate_model_manager.PlateModel.get_topologies

M.C. edited to update the doc URL.
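For example, a usage sketch based on the linked documentation (a fragment continuing the earlier example; not run here):

```
# Per the linked docs: return None instead of raising when the model
# (e.g. Muller2008) has no topology files.
topology_features = muller2008_model.get_topologies(return_none_if_not_exist=True)
if topology_features is None:
    print("This model has no topologies")
```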

michaelchin · Oct 22 '25 23:10

It turns out this happens when an attribute is added to a pyGPlates object before it is pickled (e.g. with copy.deepcopy).
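A rough pure-Python analogue (not pyGPlates itself, all names hypothetical) shows why that situation is dangerous: a class whose pickle state ignores the instance __dict__ silently drops attributes added after construction. The Boost.Python check behind the __getstate_manages_dict__ message exists to refuse to guess in exactly this case, raising an error instead of losing data.

```python
import copy

class ExtensionLike:
    # Stand-in for an extension type whose __getstate__/__setstate__
    # serialise only the internal state, ignoring the instance __dict__.
    def __init__(self, value):
        self.value = value

    def __getstate__(self):
        return self.value            # extra attributes are not included

    def __setstate__(self, state):
        self.value = state

obj = ExtensionLike(42)
obj.extra = "added before pickling"  # the situation described above
clone = copy.deepcopy(obj)           # deepcopy goes through the pickle protocol
print(clone.value)                   # 42
print(hasattr(clone, "extra"))       # False: the attribute was silently lost
```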

In this case it's happening in DataServer here: https://github.com/GPlates/gplately/blob/1a7389373ff579f673bf048a342657a9c1f5f844/gplately/download.py#L1147

#367 works around this for pyGPlates 1.0. In other words, GPlately master now works (with DataServer).

And pyGPlates 1.1 will include GPlates/GPlates@eff4c8b which fixes this in pyGPlates.

jcannon-gplates · Oct 27 '25 09:10

I'm leaving this issue open as a reminder to remove the workaround in #367 when pyGPlates 1.1 becomes our minimum requirement.

jcannon-gplates · Oct 27 '25 09:10

Hi all,

Hope everyone is well!

I also encountered this issue when I installed the latest GPlately via conda. I was pleased to see this GitHub issue because now my code is working again - phew! But it would be good to make a point release to address this issue so people who install via conda don't get upset when their parallel routines fail.

One issue that seems to have come up in the 2.0 release is that pickling / unpickling objects is much slower. I think this is because the data within rotation models, static polygons, topologies, etc. is being pickled rather than just the filenames. Here is a little test:

DataServer

gdownload = gplately.DataServer("Muller2022")
rotation_model, topology_features, static_polygons = gdownload.get_plate_reconstruction_files()
coastlines, continents, COBs = gdownload.get_topology_geometries()

model = gplately.PlateReconstruction(rotation_model, topology_features, static_polygons)

This takes 9.4 seconds to execute.

Direct file paths

plate_model_dir = "/Users/ben/Library/CloudStorage/SynologyDrive-Sync/USyd/GPlates/SampleData/Cao_et_al_2024_v2_4/"
rotation_model = [plate_model_dir+'1800_1000_rotfile.rot',
                  plate_model_dir+'1000_0_rotfile.rot']

topology_features = [plate_model_dir+'250-0_plate_boundaries.gpml',
                     plate_model_dir+'410-250_plate_boundaries.gpml',
                     plate_model_dir+'1800-1000_plate_boundaries.gpml',
                     plate_model_dir+'1000-410-Convergence.gpml',
                     plate_model_dir+'1000-410-Divergence.gpml',
                     plate_model_dir+'1000-410-Topologies.gpml',
                     plate_model_dir+'1000-410-Transforms.gpml',
                     plate_model_dir+'TopologyBuildingBlocks.gpml',
                     ]
static_polygons = plate_model_dir + 'StaticPolygons/Global_EarthByte_GPlates_PresentDay_StaticPlatePolygons.shp'
COBs = plate_model_dir + 'COBfile_1800_0.gpml'
continents = plate_model_dir + 'shapes_continents.gpmlz'
coastlines = plate_model_dir + 'shapes_coasts.gpmlz'

model = gplately.PlateReconstruction(rotation_model, topology_features, static_polygons)

This takes 2.3 seconds to execute.

This is not ideal in parallel environments, because a lot of computation time would be taken up by rebuilding Python objects.
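The size gap is easy to see with a toy stdlib sketch (stand-in data, not real pygplates objects): pickling a short list of filenames is tiny compared with pickling the parsed data itself.

```python
import pickle

# What travels when only paths are pickled:
filenames = ["1800_1000_rotfile.rot", "1000_0_rotfile.rot"]

# Stand-in for parsed rotation sequences (plate id, time, angle):
parsed_data = [(i, i * 0.001, i * 0.002) for i in range(200_000)]

small = len(pickle.dumps(filenames))
large = len(pickle.dumps(parsed_data))
print(small, large)  # the data payload is orders of magnitude larger
```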

Potential solutions

One way around this is to rebuild these objects from their filenames where possible. Here is my thinking:

  1. Keep track of paths + filenames of rotation models, topologies, etc. For example, every pygplates.RotationModel would have a filenames attribute that is a list containing all the .rot files that went into the rotation_model. This is similar to what we had earlier implemented for pickle support in the PlateReconstruction object (gplately v1.x)
  2. Work out if objects have been modified or not. If someone has tinkered with their rotation model then set modified=1, otherwise modified=0. One could use setters/getters or a context manager to catch any modifications to rotation models at the pygplates level.

If modified=0 then the filenames are pickled; if modified=1 then the data inside are pickled. I think this would catch all use cases. For instance, if two unmodified rotation models get concatenated, then a new rotation model is returned with both filename lists concatenated and modified=0. The same goes for gpml files.
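A minimal pure-Python sketch of that idea (hypothetical names, not the pygplates API; copy.deepcopy stands in for pickling across processes, since it uses the same protocol): pickle only the filenames while the object is unmodified, and fall back to the full data once it has been touched.

```python
import copy

class TrackedCollection:
    # Hypothetical sketch of the filenames-vs-data pickling proposal.
    def __init__(self, filenames):
        self.filenames = list(filenames)
        # Stand-in for parsing the files on disk:
        self.data = {f: f"<features loaded from {f}>" for f in filenames}
        self.modified = False

    def edit(self, filename, new_value):
        self.data[filename] = new_value
        self.modified = True

    def __getstate__(self):
        if not self.modified:
            return {"filenames": self.filenames}  # cheap path: just the paths
        return self.__dict__                      # fallback: full data travels

    def __setstate__(self, state):
        if "data" in state:
            self.__dict__.update(state)
        else:
            self.__init__(state["filenames"])     # reload from files

fc = TrackedCollection(["rotations.rot"])
clone = copy.deepcopy(fc)                   # unmodified: rebuilt from filenames
fc.edit("rotations.rot", "<edited features>")
clone2 = copy.deepcopy(fc)                  # modified: full data travels
```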

What do you think?

Overkill option

If one wanted to get very fancy, then only the modifications to the original rotation model (loaded from the .rot file) could be pickled, rather than all the data. This could be quite tricky though, especially in the case of gpml files, since it's much harder to track changes to shapes than to numbers in an array.

brmather · Nov 25 '25 04:11

Thanks @brmather, some good ideas in there.

But it would be good to make a point release to address this issue so people who install via conda don't get upset when their parallel routines fail.

Yes, we plan to do a minor release early in the new year (it was planned for this year, but it's a bit rushed now). It'll also include upgrades to the seafloor gridding. There'll be an associated minor release for pyGPlates containing a needed bug fix; i.e., GPlately 2.1 and pyGPlates 1.1.

This takes 2.3 seconds to execute:

That would be due to the loading/parsing of the gpml/rot files (when rebuilding the pygplates.RotationModel and pygplates.FeatureCollection objects, on the parallel CPUs, from pickled filenames). So there's no pickling of pyGPlates objects here - only the filenames.

This takes 9.4 seconds to execute:

That would be due to pickling of the pygplates.RotationModel and pygplates.FeatureCollection objects themselves, that DataServer returns (unlike PlateModelManager which returns filenames). And when pyGPlates pickles those two types of objects it doesn't load the data from files (as you surmised) - it just serialises the internal data and transports that (and then unserialises on the other end).

One issue that seems to have come up in the 2.0 release is that pickling / unpickling objects is much slower. I think this is because the data within rotation models, static polygons, topologies, etc. is being pickled rather than just the filenames.

Yes, that was the cause of the slowness. This was fixed somewhat in #352, by essentially pickling the filenames when available. So that's in 2.0. And it's why using PlateModelManager avoids the slowness - because, unlike DataServer, it returns filenames instead of pygplates objects. So if you replace DataServer with PlateModelManager in your example above then it should run faster.

To better support DataServer I'm planning to optimise the pickling in pyGPlates, and have that in the next pyGPlates release. If I can get the pickling performance to be comparable to the actual loading of pygplates.RotationModel and pygplates.FeatureCollection from gpml/rot files (as in the 2.3 second example above) then it'll be fast enough.

If the performance isn't fast enough then I'll try some of your ideas about tracking filenames directly in the pygplates objects (so that filenames can be pickled where possible) - eg, when DataServer creates a pygplates.FeatureCollection from a filename (obtained from the PlateModelManager it uses internally) then pyGPlates can record that filename inside the pygplates.FeatureCollection, for subsequent pickling.

If we can pickle the pygplates objects themselves then that would be best since it should work in all scenarios. Because it doesn't require the original file(s) to be present. For example, maybe a pygplates.RotationModel was created from a temporary file that no longer exists, and so pickling the filename would not work. Another example is a file getting modified between when it was first loaded into a pygplates object (eg, in a GPlately PlateReconstruction) and when that object is pickled for parallel processing. You'd expect the unmodified pygplates object to be transported/pickled (because you loaded it from the unmodified rotation file). This might be where your modified=1 example comes into play, although I think you're referring to modifications to the actual pygplates objects there, rather than modifications to the files they were loaded from.

But yeah, until the next release, it's best to use PlateModelManager instead of DataServer for parallel work.

jcannon-gplates · Nov 25 '25 10:11