questions while looking at the implementation of VPG
hello! while going through `vpg.jl` i had some odds-and-ends questions. still pretty new to julia and especially Flux.jl, so please bear with me :D
1. i don't understand the point of having separate `NeuralNetworkApproximator` and `LinearApproximator` types: it seems to me that they are no more than wrappers for an approximator function, which can be of whatever type, as well as an optimizer. conceptually, they do the same things: forward pass, optimize. shouldn't this be abstracted away as a `FunctionApproximator`?
2. related to 1, loss functions are hardcoded here, as opposed to being a parameter of an `Approximator`.
3. should we use something like `Flux.Losses.logitcrossentropy` rather than hand-coding it?
4. `dist` is a weird field. the docstring "distribution function of the action" isn't much help, either. the example of running it in an experiment used `dist = Categorical`, which gave me some clue; but in any case, this seems to me like something that should be abstracted.
5. `action_space` looks weird to me as something you pass to the policy. when sampling an action given an environment with `policy(env)`, we can retrieve `action_space` from the environment. when optimizing a policy, we only need to know if the approximator is continuous or discrete (and it feels like it should be abstracted away anyway?). additionally, this is problematic when the environment is of `FULL_ACTION_SET`.
6. related to 5, is the following good julia style? does the advice here apply? https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/318f0f1d2868adcb96050fbfe5894509287a3d37/src/ReinforcementLearningZoo/src/algorithms/policy_gradient/vpg.jl#L55-L63
7. the two fields `loss` and `baseline_loss` aren't used at all, so i assume they are meant to be exposed for logging purposes. should they be put inside an approximator instead?
Hi @baedan ,
Glad to see you again here! 🤗
> 1. i don't understand the point of having separate `NeuralNetworkApproximator` and `LinearApproximator` types: it seems to me that they are no more than wrappers for an approximator function, which can be of whatever type, as well as an optimizer. conceptually, they do the same things: forward pass, optimize. shouldn't this be abstracted away as a `FunctionApproximator`?
That was for historical reasons. In the master branch, we have only one `Approximator` defined. See https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/master/src/ReinforcementLearningCore/src/policies/learners.jl for more details.
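The core idea of such a unified wrapper is roughly the following (a hypothetical sketch reusing the `FunctionApproximator` name from your question; it is not the actual definition in the linked file):

```julia
using Flux

# Hypothetical unified wrapper: one type covers neural networks, linear
# models, etc., since it only stores a callable model and an optimiser.
struct FunctionApproximator{M,O}
    model::M
    optimiser::O
end

# Forward pass: delegate to the wrapped model.
(app::FunctionApproximator)(x) = app.model(x)

# Generic update step. The loss is supplied by the caller (the policy),
# which is one reason it need not be a field of the approximator.
function optimise!(app::FunctionApproximator, loss, x, y)
    ps = Flux.params(app.model)
    gs = Flux.gradient(ps) do
        loss(app.model(x), y)
    end
    Flux.Optimise.update!(app.optimiser, ps, gs)
end
```

With this shape, `NeuralNetworkApproximator` and `LinearApproximator` would only differ in the `model` they wrap, which is roughly the direction the single `Approximator` on master takes.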
> 2. related to 1, loss functions are hardcoded here, as opposed to being a parameter of an `Approximator`.
In many cases, the choice of loss function is policy specific, so I didn't add it to the `Approximator`.
> 3. should we use something like `Flux.Losses.logitcrossentropy` rather than hand-coding it?
Yes.
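For reference, the built-in version and a hand-coded one agree up to numerical details (the exact hand-coded expression in `vpg.jl` may differ slightly from this sketch):

```julia
using Flux

logits  = randn(Float32, 4, 8)                 # 4 actions, batch of 8
targets = Flux.onehotbatch(rand(1:4, 8), 1:4)  # one-hot action labels

# Hand-coded cross entropy on raw logits (mean over the batch).
hand = -sum(targets .* Flux.logsoftmax(logits)) / size(logits, 2)

# Built-in, numerically stable version.
builtin = Flux.Losses.logitcrossentropy(logits, targets)

hand ≈ builtin   # true
```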
> 4. `dist` is a weird field. the docstring "distribution function of the action" isn't much help, either. the example of running it in an experiment used `dist = Categorical`, which gave me some clue; but in any case, this seems to me like something that should be abstracted.
Here `dist` is used to generate the action distribution based on the output from the model (in the approximator). I'm afraid it is not that easy to turn it into an abstract part, because not all algorithms support both continuous and discrete action spaces.
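To make the role of `dist` a bit more concrete, here is a minimal illustration (the continuous branch is only an assumption of how it could look, not what `vpg.jl` currently does):

```julia
using Flux, Distributions

logits = [2.0, 0.5, -1.0]            # raw model output for 3 discrete actions

# Discrete case: `dist = Categorical` wraps the softmaxed output in a
# sampleable distribution over action indices.
π_discrete = Categorical(softmax(logits))
a_discrete = rand(π_discrete)

# Continuous case: the model outputs distribution parameters instead,
# e.g. a mean and a log standard deviation for a Gaussian.
μ, logσ = 0.3, -0.7
π_continuous = Normal(μ, exp(logσ))
a_continuous = rand(π_continuous)
```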
> 5. `action_space` looks weird to me as something you pass to the policy. when sampling an action given an environment with `policy(env)`, we can retrieve `action_space` from the environment. when optimizing a policy, we only need to know if the approximator is continuous or discrete (and it feels like it should be abstracted away anyway?). additionally, this is problematic when the environment is of `FULL_ACTION_SET`.
Agree. We may remove it later.
> 6. related to 5, is the following good julia style? does the advice here apply? https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/318f0f1d2868adcb96050fbfe5894509287a3d37/src/ReinforcementLearningZoo/src/algorithms/policy_gradient/vpg.jl#L55-L63
Personally, I think the advice doesn't apply here, because we have some extra lines above:
https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/318f0f1d2868adcb96050fbfe5894509287a3d37/src/ReinforcementLearningZoo/src/algorithms/policy_gradient/vpg.jl#L51-L53
This means that with separate method definitions we would need to copy & paste them into each one (or wrap them in a helper function), so we simply use `if`/`else` here.
Also note the last sentence in the advice above:
> It should however be noted that the compiler is quite efficient at optimizing away the dead branches in code written as the `mynorm` example.
I believe it can be optimized here, because the type of `VPGPolicy.action_space` is included as a type parameter of `VPGPolicy`. Anyway, you should stick with the advice in most cases.
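A toy example of the kind of branch the compiler can prune: when the space's type is a parameter of the struct, each concrete instantiation only ever sees one arm of the `if` (the types below are placeholders, not the actual `VPGPolicy`):

```julia
# Placeholder policy carrying its action space's type as a parameter,
# like `VPGPolicy` does.
struct ToyPolicy{S}
    action_space::S
end

# For a concrete `ToyPolicy{Base.OneTo{Int}}` the `isa` check is decided at
# compile time, so the dead branch is pruned and costs nothing at runtime.
function select(p::ToyPolicy, output)
    if p.action_space isa Base.OneTo        # discrete-style space
        argmax(output)
    else                                    # continuous-style space
        first(output)
    end
end

select(ToyPolicy(Base.OneTo(3)), [0.1, 0.7, 0.2])   # == 2
select(ToyPolicy((-1.0, 1.0)), [0.4])               # == 0.4 (tuple as a stand-in continuous space)
```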
> 7. the two fields `loss` and `baseline_loss` aren't used at all, so i assume they are meant to be exposed for logging purposes. should they be put inside an approximator instead?
They are used for logging during training. I understand they are kind of strange to be included in the definition of a policy. This issue will be addressed in https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/issues/725 soon.
> Glad to see you again here! 🤗
i've been giving myself a crash course on deep learning :D mostly done with that now.
good to hear about `Approximator`.
`dist` and the action space type do seem a bit tricky. i do think the snippet in 6 could probably be put into a function: it would take the parameters and a `dist` function as input, dispatch on the action space type, and return a `Distribution` object. (note that currently the semantics of `dist` differ depending on the action space type: for a discrete action space, it is the function applied after the NN output gets softmaxed; for a continuous action space, there is no softmax step.) by default, a discrete action space would apply softmax + `Categorical`; a continuous action space would use something like `Gaussian(θ...)`.
this would tighten up loss calculation as well: https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/318f0f1d2868adcb96050fbfe5894509287a3d37/src/ReinforcementLearningZoo/src/algorithms/policy_gradient/vpg.jl#L131-L144
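a rough sketch of what such a helper could look like (`Discrete`, `Continuous`, and `action_distribution` are placeholder names, not anything that exists in the repo):

```julia
using Flux, Distributions

# Placeholder space types, only to illustrate the dispatch; the real code
# would dispatch on the actual RLBase action-space types.
struct Discrete end
struct Continuous end

# Discrete: softmax the logits, then wrap them with the user-supplied `dist`
# (defaulting to `Categorical`, as in the experiment mentioned above).
action_distribution(::Discrete, logits; dist = Categorical) =
    dist(softmax(vec(logits)))

# Continuous: no softmax step; interpret the output as distribution
# parameters, e.g. mean and log standard deviation of a Gaussian.
action_distribution(::Continuous, output; dist = Normal) =
    dist(output[1], exp(output[2]))

# Sampling and loss could then go through the same call:
d = action_distribution(Discrete(), [2.0, 0.5, -1.0])
a = rand(d)
logpdf(d, a)   # log-probability of the sampled action
```

the loss calculation linked above could then just work with whatever this returns, instead of branching on the action space type again.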
some more stray thoughts:
- i don't think this line works when the action space is a cartesian product of discrete spaces, right? in that case, if i remember correctly, array indexing would just return the first dimension. https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/318f0f1d2868adcb96050fbfe5894509287a3d37/src/ReinforcementLearningZoo/src/algorithms/policy_gradient/vpg.jl#L57
- it's funny, i think you can pass the appropriate `dist` to VPG so that it becomes DQN, depending on the loss function.