QuantEcon.py
DiscreteDP: Issues
Here's a list of issues for discussion regarding the `DiscreteDP` class.
- This is the major issue: there is an intrinsic limitation of the current approach of `DiscreteDP`, namely that one has to prepare the arrays `R` and `Q` (and `s_indices` and `a_indices` with the "state-action pairs formulation") in advance. If the problem is large, memory can run out while creating those arrays (before they are even passed to `DiscreteDP`). For example, I tried to set up the problem given in Comparison-Programming-Languages-Economics, where there are 17,820 x 5 = 89,100 states and 17,820 actions, so there are 89,100 x 17,820 ~ 1.6 x 10^9 state-action pairs. But I haven't been able to create `s_indices` (size 1.6 x 10^9), `a_indices` (size 1.6 x 10^9), `R` (size 1.6 x 10^9), and `Q` (3 nonzero entries for each of the 1.6 x 10^9 pairs) without filling up the memory. One (and only?) resolution would be to have `DiscreteDP` accept functions (callables) for `R` and `Q`.
- With the "state-action pairs formulation", `DiscreteDP` should work without the `s_indices` and `a_indices` arrays when all the actions are available at all states (otherwise they are necessary).
- Would we want a finite-horizon algorithm?
- The current implementation does not accept `beta=1` because of convergence/zero-division issues. EDIT: Done by #244.
- Are there other important pieces of information to store in `DPSolveResult`?
- The method `operator_iteration` may be replaced with `compute_fixed_point`, although the latter does not exactly match here. See also #40 and #41.
- Input checking is not exhaustive.
  - It is not checked that the entries of `Q` are nonnegative and sum to one for each state-action pair.
  - For the "state-action pairs formulation", it is not checked that there are no duplicate action indices for the same state. (For this, it may be better to write custom code to sort `s_indices` and `a_indices` (when the inputs are not sorted), which is currently delegated to `scipy.sparse.csr_matrix`.)

  If we check everything, it may be better to supply an option to skip the checks, for the case where the user is sure that the inputs are correct and creates instances many times in a loop. Or maybe we should collect the checking procedures in a method and assume that the user calls it manually?
- `__repr__` and `__str__` are missing. What do we want to print?
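To make the memory pressure in the first item concrete, here is a back-of-envelope sketch (array sizes are taken from the example above; the `int64`/`float64` dtypes are an assumption):

```python
# Rough memory estimate for the dense state-action-pair arrays.
n_states = 17_820 * 5           # 89,100 states
n_actions = 17_820
n_pairs = n_states * n_actions  # ~1.6e9 state-action pairs

bytes_per_entry = 8             # int64 for indices, float64 for rewards
gb = 1024 ** 3
s_indices_gb = n_pairs * bytes_per_entry / gb
R_gb = n_pairs * bytes_per_entry / gb

print(f"s_indices alone: {s_indices_gb:.0f} GB")
print(f"R alone:         {R_gb:.0f} GB")
```

So each of `s_indices`, `a_indices`, and `R` needs on the order of 12 GB before `Q` is even touched, which is why creation fails on an ordinary machine.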
Regarding the possibility of passing callables to the `DiscreteDP` class, it seems that jitted functions can call jitted functions, which should be all that's necessary. There's an example in the Numba documentation where this happens (`create_fractal` calls `mandel`).
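As a minimal illustration of that pattern (the function names here are made up for the sketch, not taken from the Numba docs), one jitted function calling another:

```python
# Sketch: a jitted function invoked from inside another jitted function.
try:
    from numba import njit
except ImportError:          # fall back to plain Python if numba is absent
    njit = lambda f: f

@njit
def reward(s, a):
    # hypothetical reward function R(s, a), peaked at a == s
    return -(s - a) ** 2

@njit
def best_action(s, n_actions):
    # reward() is called from compiled code here
    best, best_val = 0, reward(s, 0)
    for a in range(1, n_actions):
        val = reward(s, a)
        if val > best_val:
            best, best_val = a, val
    return best

print(best_action(3, 10))  # prints 3
```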
It's an interesting issue. When the problem is of small or medium size, it should be faster to compute all the arrays and store them in memory, so for the problems I've worked on the code seems very quick. But for large problems we have a memory issue.
@mmcky When that server at SMU is up and running how much RAM will we have? It might be fun to run this same problem @oyamad references without changing the code to see how fast it runs.
@jstac If we want to run this on a machine with enough RAM I can put it up on the HPC resources at NYU. There we can request as much memory as we want.
Let me know if you would like to try this.
Regarding the points in the original post (disclaimer: I haven't read the current code closely enough to comment on most of the points above):
- I think accepting callables is a great idea.
Thanks for the comments. We'll have a fancy machine at SMU within about two weeks that will have 256GB of RAM, so we were planning to wait for that. (Not that I want to stop you if you're curious.)
Regarding passing callables, there's a lot of stuff I don't know concerning how this would work. Do we jit the callables as they come in? Or ask the user to numba-fy them? Perhaps the former, but then we have to warn the user if they can't be jit compiled. So what do we tell them?
Can internal jitted functions work with jitted functions passed in as arguments? Will they know the return types?
(Clearly this will all be smoother in Julia.)
I think using callables would be a good idea as well -- unfortunately, the last time I tried, passing jitted functions as arguments did not work. I believe this is on numba's radar though (https://github.com/numba/numba/issues/821).
@jstac
At the end of issue #159 I think I have a partial solution showing how jitted functions can be passed as arguments to other jitted functions. I am not entirely happy with the proposed solution, but haven't had time to improve upon it yet...
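One commonly suggested workaround (a sketch of the general idea, not the specific solution from #159) is to close over the jitted callable in a factory function, so that it is fixed at compile time rather than passed as a run-time argument:

```python
# Sketch: bake a jitted callable into another jitted function via a closure.
try:
    from numba import njit
except ImportError:          # fall back to plain Python if numba is absent
    njit = lambda f: f

def make_total_best_reward(reward):
    # `reward` is captured in the closure, so the inner function can be
    # compiled against it; no function-as-argument support is needed.
    @njit
    def total_best_reward(n_states, n_actions):
        total = 0.0
        for s in range(n_states):
            best = reward(s, 0)
            for a in range(1, n_actions):
                val = reward(s, a)
                if val > best:
                    best = val
            total += best
        return total
    return total_best_reward

@njit
def r(s, a):
    # hypothetical reward function
    return float(s * a)

step = make_total_best_reward(r)
print(step(3, 4))  # sum over s of max_a s*a = 0 + 3 + 6 = 9.0
```

The obvious cost is that a new compilation is triggered for each distinct callable passed to the factory.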
Related to point 3: should we allow `DiscreteDP` to accept `beta=1` while raising `NotImplementedError` if `evaluate_policy`, `policy_iteration`, or `modified_policy_iteration` is called?
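A minimal sketch of what that guard could look like (the class and method names below are illustrative, not the actual `DiscreteDP` implementation):

```python
# Sketch: accept beta == 1 at construction, but refuse the methods
# whose algorithms rely on beta < 1.
class DiscreteDPSketch:
    def __init__(self, beta):
        if not 0 <= beta <= 1:
            raise ValueError("beta must be in [0, 1]")
        self.beta = beta

    def _require_discounting(self, name):
        if self.beta == 1:
            raise NotImplementedError(f"{name} is not defined for beta == 1")

    def evaluate_policy(self, sigma):
        self._require_discounting("evaluate_policy")
        ...  # would solve (I - beta * Q_sigma) v = R_sigma

ddp = DiscreteDPSketch(beta=1)
try:
    ddp.evaluate_policy(None)
except NotImplementedError as e:
    print(e)  # evaluate_policy is not defined for beta == 1
```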
Re 3. Another issue: As an example of finite horizon DP with discounting, I tried the option pricing example from Miranda and Fackler, Section 7.6.4: http://nbviewer.ipython.org/github/oyamad/mdp/blob/master/ddp_ex_MF_7_6_4.ipynb
The figure for the "Put Option Optimal Exercise Boundary" (the right panel at the bottom) looks different from the corresponding figure in Miranda and Fackler (Figure 7.4(b), p.180). I wanted to check the output of the original code from the CompEcon Toolbox, but I do not have Matlab on any of my computers.
Could anybody who has CompEcon Toolbox with Matlab (and time) please compare the outputs?
Re 1: Is it a feasible option to store the arrays on disk? Would Blaze help?
I missed the beginning of this discussion and haven't looked closely at the code, but regarding point 1:

1. I would be very curious to see what can be done with Blaze.
2. Do you really need to compute all state-action pairs at once? Since the iteration ultimately involves a reduction w.r.t. the actions, I would hazard that it could work if you do it chunk by chunk. If so, it would be quite suboptimal to compute all the chunks and store them on the disk.
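The chunk-by-chunk reduction could be sketched like this (all names are illustrative; `r_func` stands in for a vectorized reward callable):

```python
import numpy as np

def max_over_actions(r_func, n_states, n_actions, chunk=1000):
    # Reduce max_a r(s, a) one block of actions at a time, never
    # materializing the full (n_states, n_actions) array.
    s = np.arange(n_states)[:, None]
    best = np.full(n_states, -np.inf)
    for a0 in range(0, n_actions, chunk):
        a = np.arange(a0, min(a0 + chunk, n_actions))[None, :]
        best = np.maximum(best, r_func(s, a).max(axis=1))
    return best

# toy reward with its peak at a == s
r = lambda s, a: -((s - a) ** 2).astype(float)
print(max_over_actions(r, 5, 5000, chunk=7)[:3])  # [0. 0. 0.]
```

Peak memory is `n_states * chunk` entries instead of `n_states * n_actions`, at the cost of recomputing rewards on every sweep.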
This is an interesting discussion. The key issue is how advantageous it is to compute transition probabilities and rewards up front. My prior is that it is very advantageous, given that we need to repeatedly iterate using these values or solve linear systems using them. See, for example, this routine.
In the end I think it's best if we offer two flavors. One is where all arrays are computed up front. This is fast but memory intensive. Another is where all quantities are computed on the fly, and routines like the one discussed above are replaced by iterative methods (the T_sigma operator is also a contraction mapping).
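For the second flavor, policy evaluation could look like the following sketch: iterate the contraction `T_sigma v = R_sigma + beta * Q_sigma v` to a fixed point instead of solving the linear system directly (names are illustrative):

```python
import numpy as np

def evaluate_policy_iterative(R_sigma, Q_sigma, beta, tol=1e-10, max_iter=10_000):
    # Iterate the contraction T_sigma; converges for beta < 1 since
    # Q_sigma is a stochastic matrix.
    v = np.zeros_like(R_sigma)
    for _ in range(max_iter):
        v_new = R_sigma + beta * Q_sigma @ v
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

# 2-state toy example, checked against the direct solve
R = np.array([1.0, 2.0])
Q = np.array([[0.5, 0.5], [0.5, 0.5]])
v = evaluate_policy_iterative(R, Q, beta=0.9)
print(np.allclose(v, np.linalg.solve(np.eye(2) - 0.9 * Q, R)))  # True
```

In the on-the-fly flavor, the matrix-vector product `Q_sigma @ v` would itself be computed chunk by chunk from the callables rather than from a stored array.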
I shall add that dealing with chunks in an efficient way is precisely what `blaze` is supposed to be able to do. Another option to consider is `xarray`+`dask`. Actually, getting some experience with `xarray` is probably a good investment, as it seems to be developed at a very fast pace.
@jstac I can confirm that the computational advantage to upfront computation of the rewards matrix is huge (orders of magnitude). I have own codes that use a slightly different approach to pure discretized value fn iteration than the one you lot are using (my implementation is in matlab with gpu). Based on this experience I would suggest that the idea of offering the current implementation as the default behaviour, but also allowing a 'lowmemory' option which does not pre-compute the reward matrix is a good combination.
@robertdkirkby Great. We're going to run some horse races soon and post the results somewhere. Probably here.