[Feature request] Numerical/memory efficient MC integration

Open jonas-eschle opened this issue 6 years ago • 22 comments

As already discussed (with @dustinvtran) a while ago: I guess there aren't currently any plans for either

  • numerical integration methods (Simpson, quadrature...)
  • ("memory efficient") (serialized) MC integration methods

If not, is it okay to create them and add to TFP?

Our use-case: we are currently building a fitting library for high-energy physics on top of TensorFlow (zfit), still in its early stages. We need a lot of normalization integrals in there.

On the MC integration: tfp.mc.expectation is already a nice function, but we would like to have a function that:

  1. can work serialized (trading time for memory, so that we can average over "infinitely" many samples);
  2. ideally already handles the scaling right (so "integration" instead of "expectation value").

Point 2 has pretty low priority; point 1 would be a real need though.

jonas-eschle avatar Oct 08 '18 20:10 jonas-eschle

+1e6 (or any large number). I think we would welcome numerical / mcmc integration methods. This has been useful in some of my work (I've written a few hacky implementations for my own code), and I imagine plenty of others would find this useful.

Note that the VectorDiffeomixture code (https://github.com/tensorflow/probability/blob/master/tensorflow_probability/python/distributions/vector_diffeomixture.py) uses Gauss-Hermite quadrature under the hood, so I imagine some of that code could be refactored out and repurposed as well.

Are you also interested in QMC integration methods? Halton Sequences are supported in TFP, but I imagine we could add more quasi-monte carlo sequences (Sobol' for instance).

I guess for serialization, you want a "mini-batch" (to use/abuse ML terminology) version of the Expectation code? I don't foresee that being too difficult given that there are nice formulas for online updating of averages / variances.
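For reference, a rough sketch of the kind of online update formulas meant here (plain NumPy; the helper name is just illustrative, not TFP API):

import numpy as np

def update_mean_var(count, mean, m2, batch):
  # Merge a new batch into running statistics (Chan/Welford-style update).
  batch = np.asarray(batch)
  n_b = batch.size
  mean_b = batch.mean()
  m2_b = ((batch - mean_b) ** 2).sum()
  delta = mean_b - mean
  total = count + n_b
  new_mean = mean + delta * n_b / total
  new_m2 = m2 + m2_b + delta ** 2 * count * n_b / total
  return total, new_mean, new_m2

count, mean, m2 = 0, 0.0, 0.0
for batch in np.split(np.random.rand(100000), 100):
  count, mean, m2 = update_mean_var(count, mean, m2, batch)
variance = m2 / count  # variance over all samples seen so far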

srvasude avatar Oct 08 '18 20:10 srvasude

That sounds good!

There are a lot of possibilities for the numerical integration. Gauss-Hermite sounds good, though it is (AFAIK) mainly suited to integrals over infinite ranges, so I'd propose an additional, more basic one, say based on Simpson's rule. An adaptive algorithm would be nice, though I'm unsure about an efficient way to implement it in TF. While_loop?

Of course, QMC is nice and we already use the implemented Halton sequence. As we integrate over only a few dimensions (up to 6, typically), Halton seems to be the preferred choice (for us).

Yes, mini-batches. "Basically" a

tf.mean(expectation(batch) for batch in batches)

but with serialized execution inside.

What do you think about it?

jonas-eschle avatar Oct 09 '18 10:10 jonas-eschle

FYI,

Internally, we are currently in the process of writing the basic numerical integration methods (quadrature, in particular).

cyrilchim avatar Oct 10 '18 20:10 cyrilchim

Perfect, that would fit our needs and is a good start for (possibly) further algorithms.

Internally: within TFP, but private development? Any hints on the timeline?

We're also happy to comment on the function as soon as a PR is made.

jonas-eschle avatar Oct 11 '18 07:10 jonas-eschle

Note that you can use tf.metrics.mean in conjunction with some variable setup to get what you need. Something like this, for example:

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

batch_size = 100000
indices = np.arange(batch_size, dtype=np.int32)
# Variable holding the Halton sequence indices of the current batch.
seq_indices = tf.get_variable('halton_index',
                              shape=[batch_size],
                              dtype=tf.int32,
                              initializer=tf.constant_initializer(indices))

# Advances the indices to the next chunk of the sequence.
increment_op = seq_indices.assign_add(tf.fill([batch_size], batch_size))

dim = 5
my_func = lambda x: tf.reduce_sum(x * x, axis=1)

halton_sample = tfp.mcmc.sample_halton_sequence(dim,
                                                sequence_indices=seq_indices,
                                                dtype=tf.float64,
                                                randomized=False)

batch_values = my_func(halton_sample)
# tf.metrics.mean keeps a running mean over all batches evaluated so far.
mean, mean_update_op = tf.metrics.mean(batch_values)

global_init = tf.global_variables_initializer()
local_init = tf.local_variables_initializer()  # metrics use local variables

with tf.Session() as sess:
  sess.run((global_init, local_init))
  for i in range(10):
    current_mean = sess.run(mean_update_op)
    print("%d: %f" % (i, current_mean))
    sess.run(increment_op)

As Cyril mentioned, we (i.e. the Google TFP team) are working on adding some quadrature methods. We will start with something simple like Simpson's rule and then move on to more complex ones. Being able to support adaptive methods is definitely one of the design goals.

A first cut at the Simpson's rule should be available in a few weeks. If you have suggestions for prioritization we would be happy to take them into account if possible.

On a side note, are you using the randomized Halton samples or non-randomized ones? The scheme for sequential updating above won't work with the current implementation of the randomized samples (or more precisely, it will appear to work but the results won't be consistent with Owen's randomization method).

saxena-ashish-g avatar Oct 11 '18 12:10 saxena-ashish-g

Thanks for the nice example, I did not think of that. I finally managed to get something working with tf.control_dependencies (without the use of a sess.run). If a batched expectation-value function would be welcome here, I'll implement one and we can see whether it fits the needs and scope of TFP, alright?

Perfect! For priorities: a simple quadrature is absolutely fine; time is more of a factor. So looking forward to it! (And feel free to tag me once a PR is ready and another opinion is welcome.)

We use the randomized ones, but generating a new sample each time. So rather:

for loop:
  samples = halton_sample(...)
  averages.append(tf.reduce_mean(func(samples)))
# only here sess.run
sess.run(tf.reduce_sum(averages))

This should work I guess?

jonas-eschle avatar Oct 11 '18 21:10 jonas-eschle

Thanks for the nice example, I did not think of that. I finally managed to get something working with tf.control_dependencies (without the use of a sess.run).

I am not entirely sure how/why you needed the control_dependencies. If you prefer to do it without the need for session.run then a while loop can do the job here. Here is the same example as above but without needing the update ops to drive the evaluations.

import tensorflow as tf
import tensorflow_probability as tfp

batch_size = 10000
num_batches = 100
dim = 5
dtype = tf.float64

my_func = lambda x: tf.reduce_sum(x * x, axis=1)

def body(batch_num, mean):
  # Draw the next chunk of the (non-randomized) Halton sequence.
  start_idx = batch_num * batch_size
  end_idx = start_idx + batch_size
  indices = tf.range(start_idx, end_idx, dtype=tf.int32)
  halton_sample = tfp.mcmc.sample_halton_sequence(dim,
                                                  sequence_indices=indices,
                                                  dtype=dtype,
                                                  randomized=False)
  func_vals = my_func(halton_sample)
  batch_mean = tf.reduce_mean(func_vals)
  # Online mean update: new_mean = mean + (batch_mean - mean) / (n + 1),
  # where n is the number of batches already folded in.
  weight = 1.0 / (tf.cast(batch_num, dtype) + 1.0)
  return batch_num + 1, mean + weight * (batch_mean - mean)

cond = lambda batch_num, _: batch_num < num_batches

initial_mean = tf.convert_to_tensor(0, dtype=dtype)
_, final_mean = tf.while_loop(cond, body, (0, initial_mean))

We use the randomized ones, but generating a new sample each time. So rather:

for loop:
  samples = halton_sample(...)
  averages.append(tf.reduce_mean(func(samples)))
# only here sess.run
sess.run(tf.reduce_sum(averages))

This is not exactly how the randomized samples are meant to be used. In a sense, the randomization of a low-discrepancy sequence is "minimal". It has to be, so that it preserves the low-discrepancy nature of the original sequence. For example, Owen's randomization (the one implemented in TFP) effectively gives you the same degree of randomness as a single uniform random variable. The loop in your example above is essentially using the same sample in every iteration, with just this one random number changing. A good way to check whether you are making proper use of the low-discrepancy samples is to plot the error of the integral against the number of samples (assuming the true answer is known) and ensure that it decays roughly like 1 / N^2. I suspect you will find that the error is roughly constant or very slowly decreasing with your strategy.
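For illustration, such a convergence check could look roughly like this (TF1-style sketch; the integrand is just a toy function whose exact integral over the unit cube is dim / 3):

import tensorflow as tf
import tensorflow_probability as tfp

dim = 5
my_func = lambda x: tf.reduce_sum(x * x, axis=1)
true_value = dim / 3.0  # exact integral of sum(x^2) over [0, 1]^dim

errors = []
for num_results in [1000, 4000, 16000, 64000]:
  sample = tfp.mcmc.sample_halton_sequence(
      dim, num_results=num_results, dtype=tf.float64, randomized=False)
  estimate = tf.reduce_mean(my_func(sample))
  errors.append(tf.abs(estimate - true_value))

with tf.Session() as sess:
  # The errors should shrink much faster than the 1 / sqrt(N) of plain MC.
  print(sess.run(errors))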

As I mentioned in the previous post, the randomization as it is implemented at the moment is not going to be consistent from one call to the next (i.e. randomized Halton over [0 .. N+M] != randomized Halton over [0 .. N] concatenated with randomized Halton over [N+1 .. N+M]). This is something we plan to fix in short order, after which you should be able to do either the while-loop version or the variables version with randomization on.

saxena-ashish-g avatar Oct 12 '18 10:10 saxena-ashish-g

Thanks for the example, this works nicely. At one point it did not work for me (huge memory consumption), but a fix in between probably resolved that.

Ah, got it! Yes, that would be useful. But currently, we do not strongly depend on the randomization, so that's fine, but thanks for pointing it out!

Keep me up to date on the numerical integration, and feel free to ask for any help needed.

jonas-eschle avatar Oct 15 '18 16:10 jonas-eschle

@saxena-ashish-g I was just wondering, what is the current status of this? Anything already implemented?

jonas-eschle avatar Nov 30 '18 17:11 jonas-eschle

@cyrilchim are there any news on the numerical integration methods?

jonas-eschle avatar Jan 09 '19 22:01 jonas-eschle

@mayou36 Sorry for the delay. We have a pending change for serialized Clenshaw-Curtis and Simpson methods. Will update the thread once the change is submitted.

cyrilchim avatar Jan 10 '19 17:01 cyrilchim

@cyrilchim just re-asking about the status of this?

jonas-eschle avatar Aug 14 '19 19:08 jonas-eschle

Hey, I am not working on TFP anymore.

Anyone?

cyrilchim avatar Aug 16 '19 15:08 cyrilchim

Hi all, any updates or WIP code laying around somewhere? @srvasude maybe?

jonas-eschle avatar Sep 22 '19 11:09 jonas-eschle

Any updates here?

I'm also in need of this, calculating the entropy of a Gaussian mixture model and a logistic mixture model.
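For reference, the kind of Monte Carlo estimate meant here can be written directly with TFP distributions (a toy mixture, purely illustrative):

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# Two-component 1-D Gaussian mixture; its entropy has no closed form.
gmm = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=[0.3, 0.7]),
    components_distribution=tfd.Normal(loc=[-1.0, 2.0], scale=[0.5, 1.5]))

num_samples = 100000
samples = gmm.sample(num_samples, seed=42)
entropy_estimate = -tf.reduce_mean(gmm.log_prob(samples))  # H ~ -E[log p(x)]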

EvenStrangest avatar Dec 05 '19 15:12 EvenStrangest

There is now support for Sobol sequences in https://www.tensorflow.org/api_docs/python/tf/math/sobol_sample
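A minimal QMC integration sketch using it could look like this (TF 2.x eager mode; the integrand is just a test function):

import tensorflow as tf

dim = 5
num_results = 2 ** 16
sample = tf.math.sobol_sample(dim, num_results, dtype=tf.float64)  # points in [0, 1)^dim
estimate = tf.reduce_mean(tf.reduce_sum(sample * sample, axis=1))
print(estimate.numpy())  # exact value of the integral is dim / 3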

brianwa84 avatar Feb 03 '20 20:02 brianwa84

Great, that already helps, thanks a lot!

What about numerical integration, quadratures, etc.? Do you know of any plans? I've seen some code lying around.

jonas-eschle avatar Feb 04 '20 10:02 jonas-eschle

Are there further updates here regarding numerical integration methods? We will have someone working on this and are glad to contribute it, but we really want to avoid duplicate work in case this has already been done. @brianwa84 do you know the status of this?

jonas-eschle avatar Feb 26 '20 15:02 jonas-eschle

I don't think we're currently investing many cycles in numerical integration, so if you want to contribute something, it won't be duplicated work. We are looking at "memory efficient MCMC" i.e. tweaking the MCMC API to support streaming reductions like mean/expectation or covariance.

brianwa84 avatar Feb 26 '20 15:02 brianwa84

Okay, good to hear. I'll keep you up to date here then.

jonas-eschle avatar Feb 26 '20 15:02 jonas-eschle

What would you expect as an API for an integration function?

integrate(func: Callable, {precision, other options}) for 1 dim?
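To make that concrete, one possible shape (sketched here with a simple composite Simpson rule in TF 2.x; the signature and names are purely illustrative, not an agreed-upon API):

import math
import tensorflow as tf

def integrate(func, lower, upper, num_intervals=128, dtype=tf.float64):
  """Composite Simpson's rule for a scalar function on [lower, upper]."""
  # num_intervals must be even for Simpson's rule.
  x = tf.linspace(tf.cast(lower, dtype), tf.cast(upper, dtype), num_intervals + 1)
  y = func(x)
  h = (upper - lower) / num_intervals
  # Simpson weights: 1, 4, 2, 4, 2, ..., 2, 4, 1.
  weights = tf.concat(
      [[1.0],
       tf.tile([4.0, 2.0], [num_intervals // 2 - 1]),
       [4.0, 1.0]], axis=0)
  weights = tf.cast(weights, dtype)
  return h / 3.0 * tf.reduce_sum(weights * y)

# Example: integral of sin(x) over [0, pi] is exactly 2.
print(integrate(tf.sin, 0.0, math.pi).numpy())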

jonas-eschle avatar May 07 '20 19:05 jonas-eschle

There's a JAX implementation of tfp.mcmc.sample_halton_sequence: tfp.substrates.jax.mcmc.sample_halton_sequence.

Is there an analogous JAX implementation of tf.math.sobol_sample? I don't see one under tfp.substrates.jax.math.

carlosgmartin avatar Apr 09 '23 22:04 carlosgmartin