webppl icon indicating copy to clipboard operation
webppl copied to clipboard

valueRec and serialize are slow for large, recursive objects

Open dritchie opened this issue 8 years ago • 1 comments

The MAP aggregator calls ad.valueRec on all items that it adds, and the Histogram aggregator calls both ad.valueRec and util.serialize. These transformations make inference unusably slow when the program returns large objects (e.g. procedurally-generated geometric meshes). Addressing this requires general solutions to these two subproblems:

  1. Avoid ad.valueRec when the inference routine doesn't use AD. Or, perhaps even if it does, skipping this step at the user's request? For instance, if the user only wants the MAP, then it's only necessary to perform this transform on the final MAP value, not on every candidate value along the way.
  2. Find a better way to index histograms. I've long felt uncomfortable with the use of potentially giant, serialized strings as histogram keys, and it turns out they can be a big issue. I.e. a particle filter can run very quickly, but if it takes ~1 sec to serialize each final particle value, then it becomes unusably slow.

dritchie avatar Apr 06 '16 23:04 dritchie

I tested valueRec and serialize on a large object and the times taken for each were ~200ms and ~600ms respectively.

valueRec

Given that it's so slow, your suggestion of calling valueRec only when we know we're using ad seems sensible to me. I'll work up a PR for this.

I suppose another approach would be to walk each object to check whether it contains any ad nodes before we commit to copying the whole object. I'd expect this to be fast, but it's probably messy enough that we're better off doing it your way.

When we are using ad, arranging things so that valueRec is only called once when onlyMAP is used seems reasonable. (Assuming we think this will bite someone at some point.) Perhaps we just pass a thunk (which wraps the valueRec call) to the MAP aggregator, rather than the value itself.

serialize

It appears that serialize became especially slow when correct handling of ±Infinity was added. (#162, #256). Without that, the time taken to serialize the same object comes down to ~60ms.

I'm not sure how best to deal with this. One option is just to remove the Infinity handling, possibly bringing it back when we're using something better than stringify + plain objects to build histograms. (With the expectation that users will encode Infinity themselves, should their program return it.) Does anyone have any thoughts about that?

Related issues #179, #108.

null-a avatar Apr 07 '16 14:04 null-a