azureml-sdk-for-r
Unable to Download Files from File Dataset - Error: C stack usage 160275245380 is too close to the limit
Describe the bug
Unable to download files to compute instance using download_from_file_dataset
> library(azuremlsdk)
> ws <- load_workspace_from_config()
> ojdata <- get_dataset_by_name(name = "diabetesfiles", workspace = ws)
> download_from_file_dataset(ojdata, target_path = 'sampleData', overwrite=T)
Error: C stack usage 160275245380 is too close to the limit
> download_from_file_dataset(ojdata, target_path = 'sampleData', overwrite=T)
I have the same error, but in my case the datastore is an Azure SQL DB. I am creating a tabular dataset from that datastore and trying to read it into a data frame. Interestingly, this works fine in Python, and I think the R package is calling the same Python code. Right now, I am using a workaround where I download the files locally and read them into R data frames.
ds <- get_dataset_by_name(ws, name="traindata-test")
ds$to_csv_files()$download("./data/traindata")
df <- read_csv("./data/traindata/part-00000")
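If the download produces more than one part file, the read step above can be generalized. This is only a minimal sketch in base R: `read_part_files` is a hypothetical helper (not part of azuremlsdk), and it assumes every part file is a CSV with the same header row. The demo writes two made-up part files to a temporary directory so it runs without a workspace.

```r
# Hypothetical helper: read every part file in a directory into one data frame.
# Assumption: each file is a CSV and all files share the same header row.
read_part_files <- function(dir_path, pattern = "^part-") {
  files <- sort(list.files(dir_path, pattern = pattern, full.names = TRUE))
  if (length(files) == 0) {
    stop("No part files found in ", dir_path)
  }
  parts <- lapply(files, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, parts)
}

# Self-contained demo with two made-up part files in a temporary directory.
demo_dir <- file.path(tempdir(), "traindata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(0.5, 1.5)),
          file.path(demo_dir, "part-00000"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(2.5, 3.5)),
          file.path(demo_dir, "part-00001"), row.names = FALSE)

df <- read_part_files(demo_dir)
print(df)
```

In the real workaround, `demo_dir` would be replaced by the download target, e.g. "./data/traindata".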
I am trying to understand why I run into a C stack error when I convert a TabularDataset to an R data frame using load_dataset_into_data_frame().
I have used load_dataset_into_data_frame() where the source data is in a SQL DB and the dataset is a TabularDataset; it works fine up to 10k rows and about 200 columns. I could imagine this being problematic with larger datasets and potentially hitting C stack errors.
I am encountering the same error when trying to load a dataset on a compute cluster.
When run on an ML compute instance via RStudio, the code below runs fine.
When executed as part of a pipeline in an RScriptStep on an ML compute cluster with the same VM SKU as the RStudio compute instance, it throws the C stack error:
my_data= load_dataset_into_data_frame(my_dataset)
Also: attempting to use $to_csv_files()$download(...) as mentioned by @nikhilp0 is not working for me. It caused the same 'C stack' error when run on a compute instance in RStudio.
For anyone else blocked by the same issue, I was able to work around it by downloading files directly from the Datastore and not using a Dataset at all.
Instead of
my_dataset = get_dataset_by_name(aml_workspace, my_dataset_name)
my_data = load_dataset_into_data_frame(my_dataset) # 'C stack' error thrown here when running on compute cluster
I used
input_datastore = get_datastore(aml_workspace, "input_data")
download_from_datastore(datastore=input_datastore, "./input_data", overwrite=TRUE)
my_data = read.csv("./input_data/my_data.csv")
Also seeing the same error when attempting to use the get_model() method.
I'm now seeing the same error after deploying a previously working R script and RScriptStep into a new workspace. Same code and cluster VM SKU, but the new workspace consistently throws the 'C stack' error.
Blocked by the same issue when mounting a file dataset. I printed the Cstack info and it seems OK. I am unable to download from the datastore (workaround above) or from the dataset at this point. Is there another workaround? This is blocking a client CI/CD pipeline.
print(Cstack_info())
download_from_datastore(datastore='x', path='y', prefix='z', overwrite=TRUE)
output:
size current direction eval_depth
7969177 88448 1 11
Error: C stack usage 870311906868 is too close to the limit
Execution halted
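A quick check of the numbers above shows why the Cstack info looks fine while the call still fails. The sketch below only does arithmetic on the values printed in this comment. One possible reading, which is an assumption and not something confirmed in this thread, is that the stack check itself is misfiring (R's documentation notes that stack checking can fail when R code is entered from a thread other than the main one) and that this is not genuine deep recursion.

```r
# Values copied from the output above.
stack_size     <- 7969177        # 'size' reported by Cstack_info(), in bytes
stack_current  <- 88448          # 'current' usage reported just before the call
reported_usage <- 870311906868   # usage quoted in the error message

# Fraction of the stack actually in use before the call.
current_fraction <- stack_current / stack_size

# How many times larger than the whole stack the reported usage is.
overshoot_ratio <- reported_usage / stack_size

cat(sprintf("in use before call: %.2f%% of the stack\n", 100 * current_fraction))
cat(sprintf("reported usage is about %.0f times the stack size\n", overshoot_ratio))
```

A reported usage that is many thousands of times the total stack size cannot be a real measurement of this stack, which is consistent with Cstack_info() itself looking healthy.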
I also get the same C stack error when trying to get some data from the workspace Datastore using a simple SQL query:
qry_str <- "SELECT * FROM ws_sql_view"
dataset_obj <- ws %>%
  get_datastore("isf_db") %>%
  reticulate::tuple(qry_str) %>%
  python_sdk$core$dataset$Dataset$Tabular$from_sql_query()
The code fails on the last line, where we call the Python module directly from R to get the dataset object (i.e., python_sdk$core$dataset$Dataset$Tabular$from_sql_query()):
Error: C stack usage 403877116004 is too close to the limit
Execution halted
Any news for a possible solution for this issue?
For anyone else blocked by this issue, I discovered that the difference between the two workspaces mentioned above was that one used a datastore that accessed blob storage via an account key and the other used a datastore that accessed blob storage with a SAS token.
Attempting to use a datastore with a SAS token from the R SDK triggered the 'C stack' error. Using the datastore with an account key did not.
Tabular dataset support in the R SDK and RScriptStep is experimental, and we will not be triaging issues for them at this time. Please do not take a dependency on them.
We will follow up on the file datasets issue with a recommended approach.