arcgis-python-api

spatial.to_featureclass is very slow in writing to disk

Open DavidAnderson-USFS opened this issue 3 years ago • 8 comments

Describe the bug I am exporting a SEDF to a local feature class. I have tried both a mobile geodatabase and a file geodatabase. The SEDF is roughly 2 million rows. The save takes many hours, on the order of 6 to 7 hours. For reference, just saving the dataframe as CSV takes a few minutes, and about 5 or 6 minutes if the SHAPE field exports as WKT.

To Reproduce Steps to reproduce the behavior:

veg_eru_df[['id2','stm_state','R3ERUCODE','Lifeform','Dominance_type','Quad_mean_diam_tr_gr_1','Tree_canopy_cov','Canopy_layering','Tree_dominance','Shrub_cover','Shrub_dominance','Herb_cover','Herb_dominance','is_invasive','SHAPE']].spatial.to_featureclass(fc_name,overwrite=True,sanitize_columns=False)

error:

takes a very long time, on the order of many hours to perform this command on a dataframe with a million plus rows.

Screenshots If applicable, add screenshots to help explain your problem.

Expected behavior A process that takes a few minutes.

Platform (please complete the following information):

  • OS: Windows 10
  • Browser Chrome
  • Python API Version 2.0.0

Additional context Add any other context about the problem here, attachments etc.

DavidAnderson-USFS avatar Feb 24 '22 22:02 DavidAnderson-USFS

@DavidAnderson-USFS can you provide more information? Type of geometry, reproducible code, etc.?

achapkowski avatar Feb 25 '22 12:02 achapkowski

The dataframe contains polygon features. It was created thusly:

veg_by_eru = r"memory\veg_by_eru"
arcpy.analysis.Intersect([veg_layer,eru_layer],veg_by_eru)
veg_eru_df = pd.DataFrame.spatial.from_featureclass(veg_by_eru)

There is a bit of manipulation of the dataframe, which is then saved off using the code shown above.
A straightforward process.

This warning message does pop up:

C:\Users\davidanderson\AppData\Local\ESRI\conda\envs\arcgispro-py3_api20\lib\site-packages\pandas\core\indexing.py:1720: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self.setitem_single_column(loc, value, pi)

I did try doing the export on the entire data frame. The save time did not noticeably change.

DavidAnderson-USFS avatar Feb 25 '22 15:02 DavidAnderson-USFS

I believe I have found the problem. I had a hunch it might be an index issue, so I looked at the triggers on the tables created by the to_featureclass function. This is the trigger causing the problem:

CREATE TRIGGER st_insert_trigger_model_area_ChihuahuanDeserts_Shape AFTER INSERT ON model_area_ChihuahuanDeserts FOR EACH ROW BEGIN SELECT InsertIndexEntry('st_spindex__model_area_ChihuahuanDeserts_Shape', NEW.Shape, NEW._ROWID_, 2); END;

It is firing for every row inserted. Updating the spatial index hundreds of thousands to millions of times is going to have a performance impact. For bulk attribute insertions, the best practice is to turn the index off until all the data is added, then build the index just once on the new data. I suggest a similar approach: insert the data, build the spatial index, then add the trigger.
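The ordering suggested above (insert the data, build the index, then add the trigger) can be sketched with plain SQLite. This is only an illustration under stated assumptions: the table names `features` and `features_idx`, the trigger name, and the use of `length(shape_wkt)` as the indexed value are all hypothetical stand-ins. It does not use Esri's `InsertIndexEntry` function or the real mobile geodatabase schema, and it is not how to_featureclass is implemented.

```python
import sqlite3


def bulk_load_then_index(rows):
    """Load rows first, build the index once, then add the per-row trigger."""
    con = sqlite3.connect(":memory:")

    # Hypothetical stand-ins: 'features' plays the feature class table,
    # 'features_idx' plays the spatial index table.
    con.execute(
        "CREATE TABLE features (id INTEGER PRIMARY KEY, code TEXT, shape_wkt TEXT)"
    )
    con.execute("CREATE TABLE features_idx (feature_id INTEGER, shape_len INTEGER)")

    # 1. Bulk insert in a single transaction while no trigger exists yet.
    with con:
        con.executemany(
            "INSERT INTO features (code, shape_wkt) VALUES (?, ?)", rows
        )

    # 2. Populate and index the lookup table once, over all rows.
    with con:
        con.execute(
            "INSERT INTO features_idx "
            "SELECT id, length(shape_wkt) FROM features"
        )
        con.execute("CREATE INDEX features_idx_len ON features_idx (shape_len)")

    # 3. Only now add the trigger that keeps the index current for later inserts.
    con.execute(
        """
        CREATE TRIGGER features_insert_trigger AFTER INSERT ON features
        FOR EACH ROW BEGIN
            INSERT INTO features_idx (feature_id, shape_len)
            VALUES (NEW.id, length(NEW.shape_wkt));
        END
        """
    )
    con.commit()
    return con


if __name__ == "__main__":
    con = bulk_load_then_index([("A", "POINT (0 0)"), ("B", "POINT (1 1)")])
    print(con.execute("SELECT COUNT(*) FROM features_idx").fetchone()[0])
```

Rows inserted after step 3 still reach the index through the trigger, so the table behaves the same once loading is finished. Only the bulk load skips the per-row index work.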

DavidAnderson-USFS avatar Mar 10 '22 22:03 DavidAnderson-USFS

So this is happening on an Enterprise database?

achapkowski avatar Mar 11 '22 10:03 achapkowski

No. I am using the new mobile geodatabase format. So, a SQLite database. I'd assume the same database structure, triggers and all, is being used in an enterprise database. I can't verify that, as I don't have DBA-level access to my organization's enterprise GIS databases.

DavidAnderson-USFS avatar Mar 11 '22 15:03 DavidAnderson-USFS

You would probably experience the same issue with a GeoPackage as well; there is a similar trigger.

achapkowski avatar Apr 08 '22 12:04 achapkowski

Just wondering if there was any movement on this issue. I am doing a similar process using the 1.9.1 version. Same issue, still very slow write to mobile geodatabase.
I dug into the code a bit. The fileops.py file indicates that the entire to_featureclass is a wrapper around the arcpy.da.InsertCursor. That is row-by-row processing. Slow. Perhaps the Pandas SQLite to_sql functionality could be used for mobile geodatabases for quicker writes? Or the arcpy.da.NumPyArrayToFeatureClass functionality. That went quite quickly, as I recall.

DavidAnderson-USFS avatar Aug 03 '22 20:08 DavidAnderson-USFS

@DavidAnderson-USFS we are looking at this currently but we have no updates at the moment.

achapkowski avatar Aug 04 '22 09:08 achapkowski

If you can, I would try using arcpy. Turn off your spatial index:

arcpy.management.RemoveSpatialIndex(fc_path)
# do something here (with arcpy.da.Editor(workspace) or any other cursor)
arcpy.management.AddSpatialIndex(fc_path)

One of my clients has a script that handles car telemetry, 2 million or more points every week, and that approach is working well. The script does many things, but overwriting a feature class is quite fast.

hildermesmedeiros avatar Nov 07 '22 17:11 hildermesmedeiros

@hildermesmedeiros Thanks for the suggestion. It is not directly applicable to the use case in this ticket. Your example refers to an already existing feature class stored on disk somewhere. The case here is creating a new feature class.

DavidAnderson-USFS avatar Jan 19 '23 19:01 DavidAnderson-USFS

@DavidAnderson-USFS the arcpy team did some work on this, and we have made some minor tweaks. At the moment, we can't take it any further. Please update to 2.2.0 when it's released.

achapkowski avatar Jun 22 '23 12:06 achapkowski