vcr icon indicating copy to clipboard operation
vcr copied to clipboard

Slow loading of test fixtures with large bodies

Open amoeba opened this issue 7 years ago • 3 comments

This issue is related to #51. Over there, @sckott patched issues with vcr and dependent packages relating to large response bodies. After those fixes, I noticed that my fixtures appear to take 3-5 seconds to load which seems slow. I turned off my network connection and ran the tests to verify the tests weren't just making a network request connection and the tests run and still take just as long without a network connection.

This error can be reproduced by checking out https://github.com/amoeba/dataonecrul/commit/05f52cf0e8be84af9e82f264115c1a2c9797f15a and running devtools::test().

Here's what I see in my Test pane:

==> devtools::test()

Loading dataonecrul
Loading required package: testthat
Loading required namespace: jsonlite
Testing dataonecrul
✔ | OK F W S | Context
✔ |  2       | getObject [3.3 s]
✔ |  1       | SystemMetadata

══ Results ════════════════════════════════════════════════════════════════════════════════════════════════
Duration: 3.4 s

OK:       3
Failed:   0
Warnings: 0
Skipped:  0

The fixture for getObject is ~ 10MiB in size. I simple yaml::yaml.load_file("tests/fixtures/vcr_cassettes/get_system_metadata.yml") is nearly instantaneous so maybe it's not directly a YAML parsing issue.

Session Info
> devtools::session_info()
Session info ----------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.0 (2018-04-23)
 system   x86_64, darwin15.6.0        
 ui       RStudio (1.1.447)           
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/Juneau              
 date     2018-04-30                  

Packages --------------------------------------------------------------------------------------------------
 package    * version    date       source                            
 base       * 3.5.0      2018-04-24 local                             
 base64enc    0.1-3      2015-07-28 CRAN (R 3.5.0)                    
 compiler     3.5.0      2018-04-24 local                             
 curl         3.2        2018-03-28 CRAN (R 3.5.0)                    
 datasets   * 3.5.0      2018-04-24 local                             
 devtools     1.13.5     2018-02-18 CRAN (R 3.5.0)                    
 digest       0.6.15     2018-01-28 CRAN (R 3.5.0)                    
 git2r        0.21.0     2018-01-04 CRAN (R 3.5.0)                    
 graphics   * 3.5.0      2018-04-24 local                             
 grDevices  * 3.5.0      2018-04-24 local                             
 httr         1.3.1      2017-08-20 CRAN (R 3.5.0)                    
 jsonlite     1.5        2017-06-01 CRAN (R 3.5.0)                    
 knitr        1.20       2018-02-20 CRAN (R 3.5.0)                    
 lazyeval     0.2.1      2017-10-29 CRAN (R 3.5.0)                    
 magrittr     1.5        2014-11-22 CRAN (R 3.5.0)                    
 memoise      1.1.0      2017-04-21 CRAN (R 3.5.0)                    
 methods    * 3.5.0      2018-04-24 local                             
 R6           2.2.2      2017-06-17 CRAN (R 3.5.0)                    
 Rcpp         0.12.16    2018-03-13 CRAN (R 3.5.0)                    
 rstudioapi   0.7        2017-09-07 CRAN (R 3.5.0)                    
 stats      * 3.5.0      2018-04-24 local                             
 tools        3.5.0      2018-04-24 local                             
 triebeard    0.3.0      2016-08-04 cran (@0.3.0)                     
 urltools     1.7.0      2018-01-20 cran (@1.7.0)                     
 utils      * 3.5.0      2018-04-24 local                             
 vcr        * 0.0.8.9534 2018-04-30 Github (ropensci/vcr@ba32bc9)     
 webmockr   * 0.2.2.9119 2018-04-30 Github (ropensci/webmockr@34fae0c)
 withr        2.1.2      2018-03-15 CRAN (R 3.5.0)                    
 yaml         2.1.18     2018-03-08 CRAN (R 3.5.0) 

amoeba avatar May 01 '18 00:05 amoeba

thanks, been having a look. yes, yaml loading is pretty fast, but after loading there's something(s) related to size of that response loaded in - i may not get to this for the first version https://github.com/ropensci/vcr/milestone/1 - since its more a performance issue (as far as we can tell so far 😬)

sckott avatar May 01 '18 06:05 sckott

No worries on when this gets in since I'm just kicking the tires. Looking at a profr trace of devtools::test() I see that the majority of time is taken within calls to what profr labels as self$deserialized_hash which is essentially a call to self$serializer$deserialize_path() which is essentially a call to yaml::yaml.load_file(self$path) (If I'm not mistaken, at least). And yaml::load_file is fast:

> microbenchmark({
+   some_var <- yaml::yaml.load_file("tests/fixtures/vcr_cassettes/get_object.yml")
+ }, times = 10)
Unit: milliseconds
                                                                                    expr      min
 {     some_var <- yaml::yaml.load_file("tests/fixtures/vcr_cassettes/get_object.yml") } 392.7561
       lq    mean   median       uq      max neval
 395.4149 410.628 398.3413 421.9784 458.8146    10

I assume R6 is faster than this so maybe something else is up?

I feel like vcr's functions should be relatively fast. Looking lower down on the trace, I wonder if testthat's use of eval() and related functions is to blame.

amoeba avatar May 01 '18 07:05 amoeba

Note: Test this with json serializer to see if its an issue with yaml pkg itself

sckott avatar Dec 03 '20 17:12 sckott