trackintel
read_from_postgis functions fail for chunksize != None
The error can be reproduced by setting the chunksize argument in any of the test_read tests in test_postgis, e.g., here.
The problem seems to be that gpd.GeoDataFrame.from_postgis returns a generator instead of a GeoDataFrame when chunksize is set. The documentation of gpd.GeoDataFrame.from_postgis says that one should use gpd.read_postgis instead; maybe switching to that already fixes the problem.
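To illustrate the failure mode without a database: with chunksize set, the read returns a generator of frames rather than one frame, so any DataFrame method call on the result (like the .rename in the traceback below) blows up. This is a minimal sketch using plain pandas DataFrames as stand-ins for the chunks that gpd.read_postgis would yield; fake_chunked_read is a made-up helper, not part of geopandas or trackintel.

```python
import pandas as pd

def fake_chunked_read(n_rows, chunksize):
    """Mimic read_postgis(..., chunksize=...) by yielding DataFrames in batches."""
    for start in range(0, n_rows, chunksize):
        yield pd.DataFrame({"user_id": range(start, min(start + chunksize, n_rows))})

chunks = fake_chunked_read(10, chunksize=4)

# A generator has no DataFrame methods, hence the AttributeError in the traceback.
assert not hasattr(chunks, "rename")

# Consuming the generator into one DataFrame restores the expected behaviour,
# at the cost of materialising the whole dataset in memory anyway.
df = pd.concat(chunks, ignore_index=True)
assert len(df) == 10
```

Note that the pd.concat workaround defeats the purpose of chunking for reads, which is exactly the tension discussed below.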
```
gdf = <generator object _read_postgis.<locals>.<genexpr> at 0x00000210EBC83EB0>
set_names = {'finished_at': 'finished_at', 'started_at': 'started_at', 'user_id': 'user_id'}
geom_col = None, crs = None, tz_cols = ['started_at', 'finished_at'], tz = None

    def _trackintel_model(gdf, set_names=None, geom_col=None, crs=None, tz_cols=None, tz=None):
        """Help function to assure the trackintel model on a GeoDataFrame.

        Parameters
        ----------
        gdf : GeoDataFrame
            Input GeoDataFrame
        set_names : dict, optional
            Renaming dictionary for the columns of the GeoDataFrame.
        set_geometry : str, optional
            Set geometry of GeoDataFrame.
        crs : pyproj.crs or str, optional
            Set coordinate reference system. The value can be anything accepted
            by pyproj.CRS.from_user_input(), such as an authority string
            (eg "EPSG:4326") or a WKT string.
        tz_cols : list, optional
            List of timezone aware datetime columns.
        tz : str, optional
            pytz compatible timezone string. If None UTC will be assumed

        Returns
        -------
        gdf : GeoDataFrame
            The input GeoDataFrame transformed to match the trackintel format.
        """
        if set_names is not None:
>           gdf = gdf.rename(columns=set_names)
E           AttributeError: 'generator' object has no attribute 'rename'

trackintel\io\from_geopandas.py:399: AttributeError
```
Uff :D To be honest I am not quite sure if we can fix this one. Most of our functions depend on having the whole dataset in memory for groupby, sorting, and the like. Unless we consume the iterator into one big DataFrame the problem persists, but then the chunksize parameter is not that useful.
What would your use case be?
I would rather add a more useful error message.
Hm... I see your point. I am currently working with a large dataset that barely fits into my memory. There seems to be some overhead related to reading/writing to/from PostGIS which increases memory consumption enough that reading/writing operations fail in this case. This overhead seems to be lower with chunksize != None, meaning that I can send/read the data without it failing.
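For the write direction at least, batched inserts work today: pandas' to_sql accepts a chunksize that bounds how many rows are sent per INSERT statement, which is the overhead-reduction effect described above. A small sketch, using an in-memory SQLite database as a stand-in for PostGIS (the table name positionfixes is just illustrative):

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"user_id": range(10), "x": range(10)})
con = sqlite3.connect(":memory:")

# chunksize controls how many rows are inserted per batch, bounding the
# per-statement memory overhead of the write path.
df.to_sql("positionfixes", con, index=False, chunksize=4)

n = pd.read_sql("SELECT COUNT(*) AS n FROM positionfixes", con)["n"].iloc[0]
assert int(n) == 10
```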
I am not sure how we would change it, to be honest, so it might be best to simply add a better error message or a check stating that the chunksize argument is not supported at the moment. At least until we have a proper big-data strategy for trackintel ;-)
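Such a guard could be as simple as failing fast before the query runs, instead of surfacing an AttributeError deep inside _trackintel_model. A hedged sketch; the function name and signature here are hypothetical and would need to match the actual trackintel readers:

```python
def read_positionfixes_postgis(sql, con, chunksize=None, **kwargs):
    """Hypothetical guard sketch: reject chunksize with a clear message."""
    if chunksize is not None:
        raise ValueError(
            "chunksize is not supported yet: trackintel needs the full "
            "dataset in memory for sorting and groupby operations."
        )
    # ... the actual gpd.read_postgis call and trackintel model checks go here

# The guard turns the opaque AttributeError into an explicit, documented error.
err = None
try:
    read_positionfixes_postgis("SELECT * FROM positionfixes", con=None, chunksize=100)
except ValueError as e:
    err = e
assert "chunksize" in str(err)
```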