qcheck icon indicating copy to clipboard operation
qcheck copied to clipboard

memory usage problems

Open Robotechnic opened this issue 1 year ago • 7 comments

Hello, I started using qcheck for a project and it's a great quick checking framework but I have a problem with the memory consumption on some tests which require to manipulate huge data structures or a lot of data:

open QCheck2

let generator = Gen.(
  (0 -- 500) >>= fun n ->
  tup2 (return n) (list_size (return n) (array_size (1 -- 200) int))
)

let test = Test.make ~count:1000 ~name:"test" generator (fun _ ->
  true)

let () = exit @@ QCheck_runner.run_tests [test;test;test]

This is a minimal example and my problem is that it uses 1.1gb of RAM if I run it like that:

let () = ignore @@ QCheck_runner.run_tests [test]
let () = ignore @@ QCheck_runner.run_tests [test]
let () = exit @@ QCheck_runner.run_tests [test]

It takes the same amount of time to run but uses 438Mb of RAM, which is what I would expect from a test. Here the example is dumb but in my case, I'm running 500 tests that run fine because they don't use much RAM but I have 30 tests that uses a lot of RAM and running them all at once consumes more than 10gb of RAM.

Robotechnic avatar Jun 21 '24 08:06 Robotechnic

I forgot to gave some precisions about my config: It's done using ocaml 4.14.2 and check 0.21.3

Robotechnic avatar Jun 21 '24 18:06 Robotechnic

A quick calculation shows that in the worst case, each array could be 500 * 200 * 8B = 800kB of data. On average it might be 250 * 100 * 8B = 200kB. If you run 1000 tests, that means you might consume 200kB * 1000 = 200MB, plus some other overhead from qcheck itself. That's not insignificant, although I'm not sure how it gets to 1.1GB even being allocated (perhaps a lot of closures and temporary values used to produce the values?).

I don't think it's super shocking, 500*200 is fairly big for nested arrays :)

Maybe run Gc.full_major() at the beginning of some tests? Or with a low random probability?

c-cube avatar Jun 21 '24 18:06 c-cube

Just passing by and interested by this. Here are two ideas.

  • Maybe try with QCheck (instead of QCheck2)
  • Maybe you can try writing a custom array generator that recycles the arrays the next time an array of the same size is drawn. In this case you would save the allocation time and only have to pay for the initialization. Of course, this is only useful when you generate enough arrays that collisions are likely.

It would look roughly like that : (I'm unsure how to do the same with QCheck2 since the Gen.t type is not public anymore)

open QCheck

let array_memo_size =
  let cache = Hashtbl.create 1000 in
  fun size gen (st:Random.State.t) ->
  let actual_size = size st in
  match Hashtbl.find_opt cache actual_size with
  | None ->
     let a = Array.init actual_size (fun _ -> gen st) in
     Hashtbl.add cache actual_size a;
     a
  | Some a -> Array.map_inplace (fun _ -> gen st) a; a

let generator = Gen.(
  (0 -- 500) >>= fun n ->
  tup2 (return n) (list_size (return n) (array_memo_size (1 -- 200) int))
)

ghilesZ avatar Jun 21 '24 19:06 ghilesZ

My problem is not really the memory usage of a 1000 tests this in itself is not a problem. My real problem is the fact that if I pass to the QCheck runner a list of tests it uses a lot more memory than if I split the tests and run them separately: If I run the tests like this:

let () = exit @@ QCheck_runner.run_tests [test;test;test]

Here is the output of the time command:

time ./_build/default/bin/main.exe -v
random seed: 454529569
================================================================================
success (ran 3 tests)
./_build/default/bin/main.exe -v   106,41s  user 0,43s system 99% cpu 1:47,35 total
avg shared (code):         0 KB
avg unshared (data/stack): 0 KB
total (sum):               0 KB
max memory:                1151 MB
page faults from disk:     0
other page faults:         19732

And if I run the tests like this:

let () = ignore @@ QCheck_runner.run_tests [test]
let () = ignore @@ QCheck_runner.run_tests [test]
let () = exit @@ QCheck_runner.run_tests [test]

Here is the output of the time command:

time ./_build/default/bin/main.exe -v
random seed: 115565881
================================================================================
success (ran 1 tests)
================================================================================
success (ran 1 tests)
================================================================================
success (ran 1 tests)
./_build/default/bin/main.exe -v   108,05s  user 0,47s system 99% cpu 1:48,92 total
avg shared (code):         0 KB
avg unshared (data/stack): 0 KB
total (sum):               0 KB
max memory:                438 MB
page faults from disk:     0
other page faults:         26705

I think that it is strange because running that shouldn't change how the program behaves from what I understood about QCheck_runner.

Memoizing the array to reuse them is a good idea, but it might not solve my problem completely as it can't be applied to most of my tests.

Robotechnic avatar Jun 21 '24 21:06 Robotechnic

Another test, that shows that the memory consumption grows linearly with the value of the count argument passed to run_tests :

open QCheck2

let count = int_of_string Sys.argv.(1)
let generator = Gen.(array_size (return 2000) int)
let test = Test.make ~count ~name:"test" generator (fun _ -> true)
let () = exit @@ QCheck_runner.run_tests [ test ]
$ /usr/bin/time --format="max memory: %M kB" _build/default/test.exe 10000
random seed: 124874579
================================================================================
success (ran 1 tests)
max memory: 300392 kB

$ /usr/bin/time --format="max memory: %M kB" _build/default/test.exe 20000
random seed: 390831488
================================================================================
success (ran 1 tests)
max memory: 583168 kB

$ /usr/bin/time --format="max memory: %M kB" _build/default/test.exe 30000
random seed: 69779182
================================================================================
success (ran 1 tests)
max memory: 881676 kB

$ /usr/bin/time --format="max memory: %M kB" _build/default/test.exe 40000
random seed: 155692075
================================================================================
success (ran 1 tests)
max memory: 1159708 kB

Is there a memory leak somewhere in run_tests?

esope avatar Jun 24 '24 09:06 esope

Thank you both for the reports - both with nice reproducible examples! :pray: Based on them I naturally got nerd-sniped and offer a fix in #288. :nerd_face:

jmid avatar Jun 28 '24 10:06 jmid

Excellent! Thank you so much @jmid !

esope avatar Jun 28 '24 12:06 esope

Fixed in #288 which was released with version 0.22

jmid avatar Sep 05 '24 07:09 jmid