Memory leak when calling calculate_features?
Thank you for the easy-to-use package. I suspect there is a memory leak when calling calculate_features. I have a for loop that iterates ~2000 times, each time calling all_features <- calculate_features ( data = tempdaily , id_var = "ID" , time_var = "nday" , values_var = "log_val" , feature_set = c( "catch22" ) )
Basically, I am extracting the time series features in a rolling window. The tempdaily data frame is ~3000 IDs * 250 observations, so the resulting all_features data frame is fairly small. The memory usage shown in RStudio and the Windows Resource Monitor grows gradually and eventually exceeds 100GB (I have a machine with 128GB of RAM), which crashes the R session. I checked object.size for every object in my environment and they total no more than 4GB, so I guess the extra RAM usage comes from a memory leak in the calculate_features function? I have also tried adding gc() at the end of each loop iteration and it did not help. Can you look into that? Thanks.
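One way to check whether the growth is inside R's own heap is to log what gc() reports after every iteration and compare it with the figure in Resource Monitor. The following is a minimal base-R sketch: the helper name r_heap_mb and the rnorm() stand-in workload are assumptions made for illustration, not part of the package or of the loop described here.

```r
# Hypothetical helper: memory (in MB) held by R-managed objects after a
# garbage collection. Column 2 of the matrix returned by gc() is "used (Mb)".
r_heap_mb <- function() sum(gc()[, 2])

heap_log <- numeric(0)
for (i in 1:5) {
  x <- rnorm(1e5)              # stand-in for the real per-iteration work
  rm(x)
  heap_log[i] <- r_heap_mb()   # record the R heap size after cleanup
}
print(round(heap_log, 1))
```

If these numbers stay flat while the operating system reports a growing process, the memory is most likely being allocated outside the R heap (for example by compiled code), which would also explain why object.size and gc() do not account for it.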
Hi @windwine what version of Rcatch22 are you using? There was a noted memory leak in a previous version.
Got it, I am using Rcatch22_0.2.1. Maybe I should go with the dev version? Thank you so much for your efforts in giving us such an easy-to-use package.
Hmm, that’s the correct version. calculate_features uses dplyr verbs, so I don’t believe it’s the cause of the memory leak, which is why I thought it might be Rcatch22. What does the code for your for loop look like (if you are able to share)?
Sure. I have also tried modifying your feat_cal function to call Rcatch22 directly, and I saw the same leak. In another trial using only "feasts" I did not observe any memory leak. I guess it is coming from Rcatch22, and I will also look at the functions in the Rcatch22 package to see if I can spot something.
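To separate Rcatch22 from everything else in the loop, a reproduction along these lines might help. It is only a sketch: it assumes Rcatch22 is installed and that catch22_all() accepts a plain numeric vector, and the series is simulated rather than taken from my data. The name n_calls is just an illustrative setting.

```r
# Number of repeated calls; raise this if the growth is too slow to notice.
n_calls <- 1000

# Assumption: Rcatch22 exports catch22_all(), which takes a numeric vector.
if (requireNamespace("Rcatch22", quietly = TRUE)) {
  set.seed(1)
  x <- cumsum(rnorm(250))              # one synthetic series, 250 observations
  for (i in seq_len(n_calls)) {
    res <- Rcatch22::catch22_all(x)    # previous result is overwritten each pass
  }
  rm(res)
  invisible(gc())                      # release whatever R itself still holds
}
```

Watching the R process in Resource Monitor while this runs should show whether memory keeps growing even though nothing is retained on the R side.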
call the feature_cal function
for (i in (start):(obs))
{
# select the stocks that have a reported price and a history > 260 days on this rebalancing day
library(tidyverse)
library(PerformanceAnalytics)
library(data.table)
library(tsfeatures)
library(theft)
####### some data filter operation here to get the tempdaily data
tempdaily<-tempdaily %>%
group_by(ID) %>%
arrange(Date) %>%
mutate(C=cumprod(1+ret),log_C=log(C),nday=row_number()) %>%
ungroup() %>%
select(ID,Date,log_C,nday)
all_features <- calculate_features (
data = tempdaily ,
id_var = "ID" ,
time_var = "nday" ,
values_var = "log_C" ,
# feature_set = c( "feasts" )
feature_set = c( "catch22" )
)
feats=all_features[[1]]
feats<-feats %>%
rename(ID=id) %>%
select(-method)
feats_w=feats %>%
pivot_wider(names_from = names,values_from = values, values_fn = mean)
feats_w<-feats_w %>%
mutate(Date=actiondates[i-1])
final[[i]]=feats_w
rm(all_features)
rm(feats)
rm(feats_w)
rm(tempdaily)
rm(tempdata)
gc()
}