QuantEcon.py
DiscreteDP: Issues
Here's a list of issues for discussion regarding the `DiscreteDP` class.
- This is the major issue: there is an intrinsic limitation of the current approach of `DiscreteDP`, namely that one has to prepare the arrays `R` and `Q` (and `s_indices` and `a_indices` with the "state-action pairs formulation") in advance. If the problem is large, memory can run out while creating those arrays (before they are even passed to `DiscreteDP`). For example, I tried to set up the problem given in Comparison-Programming-Languages-Economics, where there are 17,820 x 5 = 89,100 states and 17,820 actions, so there are 89,100 x 17,820 ~ 1.6 x 10^9 state-action pairs. But I haven't been able to create `s_indices` (size 1.6 x 10^9), `a_indices` (size 1.6 x 10^9), `R` (size 1.6 x 10^9), and `Q` (3 nonzero entries for each of the 1.6 x 10^9 pairs) without filling up the memory. One (and only?) resolution would be to have `DiscreteDP` accept functions (callables) for `R` and `Q`.
- With the "state-action pairs formulation", `DiscreteDP` should work without the `s_indices` and `a_indices` arrays when all the actions are available at all states (otherwise they are necessary).
- Would we want a finite-horizon algorithm?
- The current implementation does not accept `beta=1` because of convergence/zero-division issues. EDIT: Done by #244.
- Are there other important pieces of information to store in `DPSolveResult`?
- The method `operator_iteration` may be replaced with `compute_fixed_point`, although the latter does not exactly match here. See also #40 and #41.
- Input checking is not exhaustive.
  - It is not checked that the entries of `Q` are nonnegative and sum to one for each state-action pair.
  - For the "state-action pairs formulation", it is not checked that there are no duplicate action indices for the same state. (For this, it may be better to write custom code to sort `s_indices` and `a_indices` (when the inputs are not sorted), which is currently delegated to `scipy.sparse.csr_matrix`.)

  If we check everything, it may be better to supply an option to skip the checks, for the case where the user is sure that the inputs are correct and creates instances many times in a loop. Or maybe we should collect the checking procedures in a method and assume that the user calls it manually?
- `__repr__` and `__str__` are missing. What do we want to print?
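To make the memory pressure in the first item concrete, here is a back-of-envelope sketch (array sizes are taken from the example above; the `int64`/`float64` dtypes are an assumption):

```python
# Rough memory estimate for the dense state-action-pair arrays.
n_states = 17_820 * 5           # 89,100 states
n_actions = 17_820
n_pairs = n_states * n_actions  # ~1.6e9 state-action pairs

bytes_per_entry = 8             # int64 for indices, float64 for rewards
gb = 1024 ** 3
s_indices_gb = n_pairs * bytes_per_entry / gb
R_gb = n_pairs * bytes_per_entry / gb

print(f"s_indices alone: {s_indices_gb:.0f} GB")
print(f"R alone:         {R_gb:.0f} GB")
```

So each of `s_indices`, `a_indices`, and `R` needs on the order of 12 GB before `Q` is even touched, which is why creation fails on an ordinary machine.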
Regarding the possibility of passing callables to the `DiscreteDP` class, it seems that jitted functions can call jitted functions, which should be all that's necessary. There's an example in the Numba documentation where this happens (`create_fractal` calls `mandel`).
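As a minimal illustration of that pattern (the function names here are made up for the sketch, not taken from the Numba docs), one jitted function calling another:

```python
# Sketch: a jitted function invoked from inside another jitted function.
try:
    from numba import njit
except ImportError:          # fall back to plain Python if numba is absent
    njit = lambda f: f

@njit
def reward(s, a):
    # hypothetical reward function R(s, a), peaked at a == s
    return -(s - a) ** 2

@njit
def best_action(s, n_actions):
    # reward() is called from compiled code here
    best, best_val = 0, reward(s, 0)
    for a in range(1, n_actions):
        val = reward(s, a)
        if val > best_val:
            best, best_val = a, val
    return best

print(best_action(3, 10))  # prints 3
```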
It's an interesting issue. When the problem is of small or medium size, it should be faster to compute all the arrays and store them in memory, so for the problems I've worked on the code seems very quick. But for large problems we have a memory issue.
@mmcky When that server at SMU is up and running how much RAM will we have? It might be fun to run this same problem @oyamad references without changing the code to see how fast it runs.
@jstac If we want to run this on a machine with enough RAM I can put it up on the HPC resources at NYU. There we can request as much memory as we want.
Let me know if you would like to try this.
Regarding the points in the original post (disclaimer: I haven't read the current code closely enough to comment on most of the points above):
- I think accepting callables is a great idea.
Thanks for the comments. We'll have a fancy machine at SMU within about two weeks that will have 256GB of RAM, so we were planning to wait for that. (Not that I want to stop you if you're curious.)
Regarding passing callables, there's a lot of stuff I don't know concerning how this would work. Do we jit the callables as they come in? Or ask the user to numba-fy them? Perhaps the former, but then we have to warn the user if they can't be jit compiled. So what do we tell them?
Can internal jitted functions work with jitted functions passed in as arguments? Will they know the return types?
(Clearly this will all be smoother in Julia.)
I think using callables would be a good idea as well -- unfortunately, the last time I tried, passing jitted functions as arguments did not work. I believe this is on numba's radar though (https://github.com/numba/numba/issues/821).
@jstac
At the end of issue #159 I think I have a partial solution showing how jitted functions can be passed as arguments to other jitted functions. I am not entirely happy with the proposed solution, but haven't had time to improve upon it yet...
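One commonly suggested workaround (a sketch of the general idea, not the specific solution from #159) is to close over the jitted callable in a factory function, so that it is fixed at compile time rather than passed as a run-time argument:

```python
# Sketch: bake a jitted callable into another jitted function via a closure.
try:
    from numba import njit
except ImportError:          # fall back to plain Python if numba is absent
    njit = lambda f: f

def make_total_best_reward(reward):
    # `reward` is captured in the closure, so the inner function can be
    # compiled against it; no function-as-argument support is needed.
    @njit
    def total_best_reward(n_states, n_actions):
        total = 0.0
        for s in range(n_states):
            best = reward(s, 0)
            for a in range(1, n_actions):
                val = reward(s, a)
                if val > best:
                    best = val
            total += best
        return total
    return total_best_reward

@njit
def r(s, a):
    # hypothetical reward function
    return float(s * a)

step = make_total_best_reward(r)
print(step(3, 4))  # sum over s of max_a s*a = 0 + 3 + 6 = 9.0
```

The obvious cost is that a new compilation is triggered for each distinct callable passed to the factory.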
Related to point 3: should we allow `DiscreteDP` to accept `beta=1` while raising `NotImplementedError` if `evaluate_policy`, `policy_iteration`, or `modified_policy_iteration` is called?
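A minimal sketch of what that guard could look like (the class and method names below are illustrative, not the actual `DiscreteDP` implementation):

```python
# Sketch: accept beta == 1 at construction, but refuse the methods
# whose algorithms rely on beta < 1.
class DiscreteDPSketch:
    def __init__(self, beta):
        if not 0 <= beta <= 1:
            raise ValueError("beta must be in [0, 1]")
        self.beta = beta

    def _require_discounting(self, name):
        if self.beta == 1:
            raise NotImplementedError(f"{name} is not defined for beta == 1")

    def evaluate_policy(self, sigma):
        self._require_discounting("evaluate_policy")
        ...  # would solve (I - beta * Q_sigma) v = R_sigma

ddp = DiscreteDPSketch(beta=1)
try:
    ddp.evaluate_policy(None)
except NotImplementedError as e:
    print(e)  # evaluate_policy is not defined for beta == 1
```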
Re 3. Another issue: As an example of finite horizon DP with discounting, I tried the option pricing example from Miranda and Fackler, Section 7.6.4: http://nbviewer.ipython.org/github/oyamad/mdp/blob/master/ddp_ex_MF_7_6_4.ipynb
The figure for the "Put Option Optimal Exercise Boundary" (the right panel at the bottom) looks different from the corresponding figure in Miranda and Fackler (Figure 7.4(b), p.180). I wanted to check the output of the original code from the CompEcon Toolbox, but I do not have Matlab on any of my computers.
Could anybody who has CompEcon Toolbox with Matlab (and time) please compare the outputs?
Re 1: Is it a feasible option to store the arrays on disk? Would Blaze help?
I missed the beginning of this discussion and haven't looked closely at the code, but regarding point 1:

1. I would be very curious to see what can be done with Blaze.
2. Do you really need to compute all state-action pairs at once? Since the iteration ultimately involves a reduction w.r.t. the actions, I would hazard that it could work if you do it chunk by chunk. If so, it would be quite suboptimal to compute all the chunks and store them on the disk.
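The chunk-by-chunk reduction could be sketched like this (all names are illustrative; `r_func` stands in for a vectorized reward callable):

```python
import numpy as np

def max_over_actions(r_func, n_states, n_actions, chunk=1000):
    # Reduce max_a r(s, a) one block of actions at a time, never
    # materializing the full (n_states, n_actions) array.
    s = np.arange(n_states)[:, None]
    best = np.full(n_states, -np.inf)
    for a0 in range(0, n_actions, chunk):
        a = np.arange(a0, min(a0 + chunk, n_actions))[None, :]
        best = np.maximum(best, r_func(s, a).max(axis=1))
    return best

# toy reward with its peak at a == s
r = lambda s, a: -((s - a) ** 2).astype(float)
print(max_over_actions(r, 5, 5000, chunk=7)[:3])  # [0. 0. 0.]
```

Peak memory is `n_states * chunk` entries instead of `n_states * n_actions`, at the cost of recomputing rewards on every sweep.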
This is an interesting discussion. The key issue is how advantageous it is to compute transition probabilities and rewards up front. My prior is that it is very advantageous, given that we need to repeatedly iterate using these values or solve linear systems using them. See, for example, this routine.
In the end I think it's best if we offer two flavors. One is where all arrays are computed up front. This is fast but memory intensive. Another is where all quantities are computed on the fly, and routines like the one discussed above are replaced by iterative methods (the T_sigma operator is also a contraction mapping).
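For the second flavor, policy evaluation could look like the following sketch: iterate the contraction `T_sigma v = R_sigma + beta * Q_sigma v` to a fixed point instead of solving the linear system directly (names are illustrative):

```python
import numpy as np

def evaluate_policy_iterative(R_sigma, Q_sigma, beta, tol=1e-10, max_iter=10_000):
    # Iterate the contraction T_sigma; converges for beta < 1 since
    # Q_sigma is a stochastic matrix.
    v = np.zeros_like(R_sigma)
    for _ in range(max_iter):
        v_new = R_sigma + beta * Q_sigma @ v
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

# 2-state toy example, checked against the direct solve
R = np.array([1.0, 2.0])
Q = np.array([[0.5, 0.5], [0.5, 0.5]])
v = evaluate_policy_iterative(R, Q, beta=0.9)
print(np.allclose(v, np.linalg.solve(np.eye(2) - 0.9 * Q, R)))  # True
```

In the on-the-fly flavor, the matrix-vector product `Q_sigma @ v` would itself be computed chunk by chunk from the callables rather than from a stored array.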
I shall add that dealing with chunks in an efficient way is precisely what `blaze` is supposed to be able to do. Another option to consider is `xarray`+`dask`. Actually, getting some experience with `xarray` is probably a good investment, as it seems to be developed at a very fast pace.
@jstac I can confirm that the computational advantage to upfront computation of the rewards matrix is huge (orders of magnitude). I have own codes that use a slightly different approach to pure discretized value fn iteration than the one you lot are using (my implementation is in matlab with gpu). Based on this experience I would suggest that the idea of offering the current implementation as the default behaviour, but also allowing a 'lowmemory' option which does not pre-compute the reward matrix is a good combination.
@robertdkirkby Great. We're going to run some horse races soon and post the results somewhere. Probably here.