
Memory leak when calling calculate_features?

Open windwine opened this issue 2 years ago • 4 comments

Thank you for the easy-to-use package. I suspect there is a memory leak when calling calculate_features. I have a for loop that iterates ~2000 times, each iteration calling all_features <- calculate_features(data = tempdaily, id_var = "ID", time_var = "nday", values_var = "log_val", feature_set = c("catch22"))

Basically, I am extracting the time series features in a rolling window. The tempdaily data frame is ~3000 IDs * 250 observations, so the resulting all_features data frame is fairly small. I can see the memory usage in RStudio and the Windows Resource Monitor grow gradually and eventually exceed 100GB (I have a 128GB RAM machine), which crashes the R session. I checked object.size for every object in my environment and they total no more than 4GB, so I guess the extra RAM usage is coming from a memory leak in the calculate_features function? I have also tried adding gc() at the end of each loop iteration and it did not help. Can you check on that? Thanks.

windwine, May 10 '23 19:05
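
For anyone trying to reproduce the comparison described above (total object.size versus what the operating system reports), here is a minimal sketch of how the R-side numbers could be logged on each iteration. The helper names `env_size_mb` and `r_heap_mb` are hypothetical and not part of theft or Rcatch22, and the `rnorm()` call is only a stand-in for the real per-window work.

```r
# Hypothetical helpers (not part of theft or Rcatch22) for tracking R-level memory.

# Total size in MB of all objects in an environment, via object.size()
env_size_mb <- function(env = globalenv()) {
  sizes <- vapply(
    ls(envir = env),
    function(nm) as.numeric(object.size(get(nm, envir = env))),
    numeric(1)
  )
  sum(sizes) / 1024^2
}

# MB currently used by the R heap according to gc()
# (column 2 of the gc() matrix is the "used" size in Mb for Ncells and Vcells)
r_heap_mb <- function() sum(gc()[, 2])

# Log both numbers for a few iterations of a stand-in workload
work_env <- new.env()
rows <- lapply(1:3, function(i) {
  assign("tmp", rnorm(1e5), envir = work_env)  # stand-in for the real per-window work
  data.frame(iter = i,
             objects_mb = env_size_mb(work_env),
             heap_mb = r_heap_mb())
})
mem_log <- do.call(rbind, rows)
print(mem_log)
```

gc() only reports memory managed by R itself. If both columns stay roughly flat across iterations while the process memory shown by the operating system keeps growing, the growth is probably happening outside the R heap, for example in compiled code called by a package.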

Hi @windwine, what version of Rcatch22 are you using? There was a known memory leak in a previous version.

hendersontrent, May 11 '23 01:05

Got it, and I am using Rcatch22_0.2.1. Maybe I should go with the dev version? Thank you so much for your efforts in giving us such an easy-to-use package.

windwine, May 11 '23 13:05

Hmm, that’s the correct version. calculate_features uses dplyr verbs, so I don’t believe it’s the cause of the memory leak, which is why I thought it might be Rcatch22. What does the code for your for loop look like (if you are able to share)?

hendersontrent, May 11 '23 13:05

Sure. I have also tried directly modifying your feat_cal function to call Rcatch22, and I saw the same leak. In another trial using only "feasts", I did not observe any memory leak. I guess it is coming from Rcatch22, and I will also look at the functions in the Rcatch22 package to see if I can spot something.

# call the feature_cal function

library(tidyverse)
library(PerformanceAnalytics)
library(data.table)
library(tsfeatures)
library(theft)

for (i in (start):(obs))
{
# select the stocks that have a reported price and a history > 260 days on this rebalancing day

####### some data filter operation here to get the tempdaily data

tempdaily<-tempdaily %>%
  group_by(ID) %>%
  arrange(Date) %>%
  mutate(C=cumprod(1+ret),log_C=log(C),nday=row_number()) %>%
  ungroup() %>%
  select(ID,Date,log_C,nday)


all_features <- calculate_features (
  data = tempdaily ,
  id_var = "ID" ,
  time_var = "nday" ,
  values_var = "log_C" ,
  # feature_set = c( "feasts" )
  feature_set = c( "catch22" )
)

feats=all_features[[1]]
feats<-feats %>%
  rename(ID=id) %>%
  select(-method)

feats_w=feats %>%
  pivot_wider(names_from = names,values_from = values, values_fn = mean)

feats_w<-feats_w %>%
  mutate(Date=actiondates[i-1])


final[[i]]=feats_w

rm(all_features)
rm(feats)
rm(feats_w)
rm(tempdaily)
rm(tempdata)
gc()

}

windwine, May 11 '23 13:05