azureml-sdk-for-r
MemoryError when loading large number of datasets
Describe the bug
My training script loads multiple datasets into memory with the `create_tabular_dataset_from_parquet_files()` function, then calls `rm(dataset); gc()` to release each dataset. A MemoryError is still thrown even after the datasets are removed and garbage collection is invoked.
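A minimal sketch of the loading/release pattern described above, using the azuremlsdk R package (the datastore paths and the training step are hypothetical placeholders, not from the actual run):

```r
library(azuremlsdk)

ws <- load_workspace_from_config()
datastore <- get_default_datastore(ws)

# Hypothetical list of parquet folders on the datastore; replace with real paths
paths <- c("data/part1", "data/part2", "data/part3")

for (p in paths) {
  # Create a tabular dataset from the parquet files at this path
  dataset <- create_tabular_dataset_from_parquet_files(
    path = data_path(datastore, p)
  )
  df <- load_dataset_into_data_frame(dataset)

  # ... train on df (placeholder) ...

  # Attempt to release the dataset and force garbage collection,
  # as described in the report; the MemoryError is still raised
  rm(dataset, df)
  gc()
}
```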
driver_log:
Error in py_call_impl(callable, dots$args, dots$keywords) :
  MemoryError: Engine process terminated. This is most likely due to system running out of memory. Please retry with increased memory. |session_id=ba9874d8-f32a-4568-b75c-650f6747ef4e
Detailed traceback:
  File "/azureml-envs/azureml_505e9e90fca3d01d6df21acd71c8c832/lib/python3.6/site-packages/azureml/data/_loggerfactory.py", line 126, in wrapper
    return func(*args, **kwargs)
  File "/azureml-envs/azureml_505e9e90fca3d01d6df21acd71c8c832/lib/python3.6/site-packages/azureml/data/dataset_factory.py", line 122, in from_parquet_files
    partition_format)
  File "/azureml-envs/azureml_505e9e90fca3d01d6df21acd71c8c832/lib/python3.6/site-packages/azureml/data/dataset_factory.py", line 134, in _from_parquet_files
    validate or _is_inference_required(set_column_types))
  File "/azureml-envs/azureml_505e9e90fca3d01d6df21acd71c8c832/lib/python3.6/site-packages/azureml/data/dataset_factory.py", line 768, in _transform_and_validate
    'Make sure the path is accessible and contains
Calls: source ... create_tabular_dataset_from_parquet_files -> <Anonymous> -> py_call_impl
Expected behavior
`rm(); gc()` should release the memory associated with an in-memory dataset so that the MemoryError does not occur.