baedan

25 comments by baedan

another datapoint here (nightly build `1.13.0.dev20220704`, i7-7920HQ / Radeon Pro 560): running the two matrix multiplication tasks [above](https://github.com/pytorch/pytorch/issues/77799#issuecomment-1168102637):

```
cpu 4.141497317003086
mps 102.30748211298487
```

```
cpu 0.7337763879913837
mps 55.41895890497835
```
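
for reference, a rough sketch of the kind of timing harness that produces numbers like these (the matrix size and iteration count are guesses on my part, not necessarily what the linked tasks use):

```python
import time

import torch


def bench(device: str, n: int = 4000, iters: int = 10) -> float:
    # hypothetical harness: size and iteration count are assumptions,
    # not the linked tasks' actual settings
    x = torch.randn(n, n, device=device)
    y = x @ x          # warm-up, so one-time setup cost isn't timed
    float(y.sum())     # sync before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        y = x @ x
    float(y.sum())     # reading a scalar back forces the async mps queue to drain
    return time.perf_counter() - start


print("cpu", bench("cpu"))
if torch.backends.mps.is_available():
    print("mps", bench("mps"))
```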

oh, that's super weird. i would've guessed having it in a `let` block would only make it _more_ likely to work.

note that expected SARSA has the same issue. it's similar to q-learning in that the next action need not be selected before an update using the previous transition. there's no pseudocode in...
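
for concreteness, the expected SARSA update (as in Sutton & Barto) is

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \sum_a \pi(a \mid S_{t+1})\, Q(S_{t+1}, a) - Q(S_t, A_t) \right]$$

the target depends on $S_{t+1}$ only through an expectation over _all_ actions, so the update can happen before $A_{t+1}$ is ever selected, exactly like q-learning.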

finding it pretty difficult to implement TRPO. [Zygote doesn't particularly like higher-order gradients](https://github.com/FluxML/Zygote.jl/issues/1271), and [Flux doesn't particularly like dealing with flat parameters](https://github.com/FluxML/Flux.jl/issues/2026); TRPO needs both. i...
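
to spell out why both come up: TRPO's natural-gradient step solves $Fx = g$ by conjugate gradient over the flattened parameter vector, and $F$ (the Hessian of the average KL between the old and new policies) is only ever applied through Hessian-vector products, i.e. a gradient of a gradient:

$$Fv = \nabla_\theta \left( \nabla_\theta \bar{D}_{\mathrm{KL}}(\theta_{\text{old}} \,\|\, \theta)^{\top} v \right)$$

so higher-order gradients and flat parameters are both load-bearing here.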

> Glad to see you again here! 🤗

i've been giving myself a crash course on deep learning :D mostly done with that now. good to hear about `Approximator`. `dist`...

some more stray thoughts:

1. i don't think this line works when the action space is a cartesian product of discrete spaces, right? in that case, if i remember correctly,...

isn't `TD(λ)` separately defined in `TDλReturnLearner`? the `n` here can just be the `n` as in _n-step TD methods_, no? it's simple enough to change, but i'm not sure how...
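
(by _n-step_ i mean the usual n-step return,

$$G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^{n} V(S_{t+n})$$

which reduces to the one-step TD target at $n = 1$; that's distinct from the λ-weighted mixture of n-step returns that `TDλReturnLearner` computes.)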

i know about `OffPolicy`, but i don't know how it helps here. if i were to estimate the action or state values of a particular policy, both the estimates and the...

hm, i might be missing something here, so i'll ask a clarifying question that should settle it one way or the other: what...

but if i want to evaluate the action values, i have to use a `QBasedPolicy` for it to work, regardless of what policy i actually want to evaluate, no? in...