Bug in tf_agents.bandits.policies.linalg.conjugate_gradient?

td20002 opened this issue 2 years ago

Hello,

I tried using conjugate_gradient from tf_agents.bandits.policies.linalg with different batch sizes but the same example: b_mat consists of batch_size identical columns. For each batch_size, conjugate_gradient returns a different result. This looks incorrect. Since the columns of b_mat are identical, every column of the result should be identical too, and the result should not change with batch_size. This affects the predicted rewards in _predict_mean_reward_and_variance.
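The expectation above can be written out column by column. Here $A$ stands for a_mat, $B$ for b_mat, $b_j$ and $x_j$ for the $j$-th columns of $B$ and of the result $X$, and $k$ for batch_size. This assumes $A$ is symmetric positive definite, which conjugate gradient requires anyway:

$$
A X = B \iff A x_j = b_j \quad (j = 1, \dots, k),
\qquad
b_1 = \dots = b_k = b \;\Longrightarrow\; x_j = A^{-1} b \quad \text{for all } j.
$$

So every column of the exact solution is $A^{-1} b$, independent of $k$ (up to floating-point rounding).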

I then replaced the conjugate_gradient call with tf.matmul(tf.linalg.inv(a_mat), b_mat) and got the expected result: the result matrix is the same for every batch_size.
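To make the expected behavior concrete without depending on TensorFlow, here is a framework-free sketch. It is a stand-in, not the tf_agents code: a textbook conjugate gradient applied to each column of b_mat separately, plus an explicit-inverse solve playing the role of tf.matmul(tf.linalg.inv(a_mat), b_mat). The helper names (cg_solve, cg_matrix, inverse, repeated) and the 3x3 matrix are hypothetical and chosen only for illustration.

```python
# Hypothetical, framework-free sketch: textbook conjugate gradient solved
# column by column, compared with an explicit-inverse solve (the analogue of
# tf.matmul(tf.linalg.inv(a_mat), b_mat)). This is NOT the tf_agents code.


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def matvec(a, v):
    return [dot(row, v) for row in a]


def columns(m):
    """Columns of a row-major matrix (list of rows)."""
    return [list(col) for col in zip(*m)]


def cg_solve(a, b, tol=1e-12):
    """Solve a x = b for one right-hand side; a must be symmetric positive definite."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - a @ x, with x = 0
    p = list(r)
    rs_old = dot(r, r)
    for _ in range(len(b)):  # at most n steps in exact arithmetic
        if rs_old ** 0.5 <= tol:
            break
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def cg_matrix(a, b_mat):
    """Column j of the result depends only on column j of b_mat."""
    solved = [cg_solve(a, col) for col in columns(b_mat)]
    return [list(row) for row in zip(*solved)]


def inverse(a):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda k: abs(aug[k][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def matmul(a, b):
    return [[dot(row, col) for col in columns(b)] for row in a]


def repeated(b, batch_size):
    """Hypothetical b_mat: batch_size identical columns, each equal to b."""
    return [[bi] * batch_size for bi in b]


def max_abs_diff(u, v):
    return max(abs(x - y) for x, y in zip(u, v))


A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]  # symmetric positive definite
b = [1.0, 2.0, 3.0]
reference = cg_solve(A, b)

for batch_size in (1, 2, 5, 8):
    x_cg = cg_matrix(A, repeated(b, batch_size))
    x_inv = matmul(inverse(A), repeated(b, batch_size))
    # Every column should match the single-column solution, whatever batch_size is.
    for col_cg, col_inv in zip(columns(x_cg), columns(x_inv)):
        assert max_abs_diff(col_cg, reference) < 1e-9
        assert max_abs_diff(col_inv, reference) < 1e-9
```

Because each column is solved independently here, the result cannot depend on batch_size. A batched implementation can lose that property if it couples columns, for example through a convergence test or step size computed over the whole matrix instead of per column; whether that is what happens in tf_agents is not established here and would need checking against its source.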

Can you check whether this is a bug? If it is, why didn't you simply use tf.matmul(tf.linalg.inv(a_mat), b_mat) instead of reinventing the wheel here?

Thanks.

Jul 27 '23 01:07 td20002