[Feature Request]: Oneflow.distributions
Background and motivation
Hi, thanks for your work. When trying to migrate my PyTorch code to OneFlow, I found that there are only a few APIs in oneflow.distributions, so this part is very hard for me to deal with. Could you please add more features to this module, or give me some advice on how to migrate this part? Thanks for your attention :)
API Proposal
For example, distributions in the PyTorch style, such as Normal:
class torch.distributions.normal.Normal(loc, scale, validate_args=None)
pass
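A minimal sketch of the intended usage, mirroring torch.distributions. The commented OneFlow lines show the proposed equivalent and assume a Normal class is added with the same interface (it is not confirmed to exist in oneflow.distributions yet):

import torch
from torch.distributions import Normal

# Existing PyTorch usage that the proposal mirrors.
dist = Normal(loc=torch.zeros(3), scale=torch.ones(3))
action = dist.rsample()           # differentiable sample via the reparameterization trick
log_prob = dist.log_prob(action)  # log density, e.g. for policy-gradient losses

# Proposed OneFlow equivalent (hypothetical, for illustration only):
# import oneflow as flow
# dist = flow.distributions.Normal(loc=flow.zeros(3), scale=flow.ones(3))
# action = dist.rsample()
# log_prob = dist.log_prob(action)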
API Usage
I think this API will be very useful in the area of Reinforcement Learning.
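For instance, a typical continuous-control policy samples actions from a Normal distribution and needs log_prob for the policy-gradient loss. A small sketch with torch.distributions (the network sizes here are arbitrary, for illustration only):

import torch
import torch.nn as nn
from torch.distributions import Normal

# Toy Gaussian policy: maps a state to the mean of a Normal, with a learned log-std.
policy = nn.Linear(4, 2)                       # state dim 4 -> action dim 2 (arbitrary sizes)
log_std = nn.Parameter(torch.zeros(2))

state = torch.randn(1, 4)
mu = policy(state)
dist = Normal(mu, log_std.exp())
action = dist.rsample()                        # differentiable sample (reparameterization trick)
log_prob = dist.log_prob(action).sum(dim=-1)   # needed for policy-gradient / actor-critic losses
loss = -log_prob                               # placeholder loss with advantage = 1.0
loss.sum().backward()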
Alternatives
No response
Risks
No response
Sure, we will look into it. It would also help if you could post a more complete example so that we could introduce it as a regression test.
Thanks for your attention. I hope this code can help you with testing :)
from typing import Dict, Optional

import torch
import torch.nn as nn
from torch.distributions import Independent, Normal
# MLP, fc_block, noise_block and NoiseLinearLayer are assumed to be importable from the DI-engine (ding) package
from ding.torch_utils import MLP, fc_block, noise_block, NoiseLinearLayer


class StochasticDuelingHead(nn.Module):
    """
    Overview:
        The ``Stochastic Dueling Network`` proposed in the ACER paper (arXiv 1611.01224), \
        a dueling network architecture for continuous action spaces. \
        The input is a (:obj:`torch.Tensor`) of shape ``(B, N)`` and the output is a (:obj:`Dict`) containing \
        ``q_value`` and ``v_value``.
    Interfaces:
        ``__init__``, ``forward``.
    """

    def __init__(
            self,
            hidden_size: int,
            action_shape: int,
            layer_num: int = 1,
            a_layer_num: Optional[int] = None,
            v_layer_num: Optional[int] = None,
            activation: Optional[nn.Module] = nn.ReLU(),
            norm_type: Optional[str] = None,
            noise: Optional[bool] = False,
            last_tanh: Optional[bool] = True,
    ) -> None:
        """
        Overview:
            Init the ``StochasticDuelingHead`` layers according to the provided arguments.
        Arguments:
            - hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``StochasticDuelingHead``.
            - action_shape (:obj:`int`): The dimension of the continuous action, usually an integer value.
            - layer_num (:obj:`int`): The default number of layers used in the networks that compute the action \
                and value outputs.
            - a_layer_num (:obj:`int`): The number of layers used in the network to compute the action output. \
                Defaults to ``layer_num``.
            - v_layer_num (:obj:`int`): The number of layers used in the network to compute the value output. \
                Defaults to ``layer_num``.
            - activation (:obj:`nn.Module`): The type of activation function to use in the MLP. \
                Default ``nn.ReLU()``.
            - norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` \
                for more details. Default ``None``.
            - noise (:obj:`bool`): Whether to use ``NoiseLinearLayer`` as ``layer_fn`` in the Q networks' MLP. \
                Default ``False``.
            - last_tanh (:obj:`bool`): If ``True``, apply ``tanh`` to sampled actions. Default ``True``.
        """
        super(StochasticDuelingHead, self).__init__()
        if a_layer_num is None:
            a_layer_num = layer_num
        if v_layer_num is None:
            v_layer_num = layer_num
        layer = NoiseLinearLayer if noise else nn.Linear
        block = noise_block if noise else fc_block
        self.A = nn.Sequential(
            MLP(
                hidden_size + action_shape,
                hidden_size,
                hidden_size,
                a_layer_num,
                layer_fn=layer,
                activation=activation,
                norm_type=norm_type
            ), block(hidden_size, 1)
        )
        self.V = nn.Sequential(
            MLP(
                hidden_size,
                hidden_size,
                hidden_size,
                v_layer_num,
                layer_fn=layer,
                activation=activation,
                norm_type=norm_type
            ), block(hidden_size, 1)
        )
        if last_tanh:
            self.tanh = nn.Tanh()
        else:
            self.tanh = None

    def forward(
            self,
            s: torch.Tensor,
            a: torch.Tensor,
            mu: torch.Tensor,
            sigma: torch.Tensor,
            sample_size: int = 10,
    ) -> Dict[str, torch.Tensor]:
        """
        Overview:
            Run the MLPs of ``StochasticDuelingHead`` on the encoded embedding tensor and return the prediction \
            dictionary.
        Arguments:
            - s (:obj:`torch.Tensor`): Tensor containing the input embedding.
            - a (:obj:`torch.Tensor`): The original continuous behaviour action.
            - mu (:obj:`torch.Tensor`): The ``mu`` gaussian reparameterization output of the actor head at the \
                current timestep.
            - sigma (:obj:`torch.Tensor`): The ``sigma`` gaussian reparameterization output of the actor head at \
                the current timestep.
            - sample_size (:obj:`int`): The number of continuous-action samples used when computing the Q value.
        Returns:
            - outputs (:obj:`Dict`): Dict containing keywords \
                ``q_value`` (:obj:`torch.Tensor`) and ``v_value`` (:obj:`torch.Tensor`).
        Shapes:
            - s: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
            - a: :math:`(B, A)`, where ``A = action_size``.
            - mu: :math:`(B, A)`.
            - sigma: :math:`(B, A)`.
            - q_value: :math:`(B, 1)`.
            - v_value: :math:`(B, 1)`.
        """
        batch_size = s.shape[0]  # batch_size or batch_size * T
        hidden_size = s.shape[1]
        action_size = a.shape[1]
        state_cat_action = torch.cat((s, a), dim=1)  # size (B, hidden_size + action_size)
        a_value = self.A(state_cat_action)  # size (B, 1)
        v_value = self.V(s)  # size (B, 1)
        # size (B, sample_size, hidden_size)
        expand_s = (torch.unsqueeze(s, 1)).expand((batch_size, sample_size, hidden_size))
        # use rsample (reparameterization trick) so gradients can propagate back through the samples
        dist = Independent(Normal(mu, sigma), 1)
        action_sample = dist.rsample(sample_shape=(sample_size, ))
        if self.tanh:
            action_sample = self.tanh(action_sample)
        # (sample_size, B, action_size) -> (B, sample_size, action_size)
        action_sample = action_sample.permute(1, 0, 2)
        # size (B, sample_size, hidden_size + action_size)
        state_cat_action_sample = torch.cat((expand_s, action_sample), dim=-1)
        a_val_sample = self.A(state_cat_action_sample)  # size (B, sample_size, 1)
        q_value = v_value + a_value - a_val_sample.mean(dim=1)  # size (B, 1)
        return {'q_value': q_value, 'v_value': v_value}
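A minimal shape check that could serve as a starting point for the requested regression test (a sketch, assuming DI-engine's ding.torch_utils is available for the MLP and block helpers used above):

# Hypothetical smoke test for StochasticDuelingHead with arbitrary sizes.
head = StochasticDuelingHead(hidden_size=64, action_shape=2)
s = torch.randn(4, 64)
a = torch.randn(4, 2)
mu = torch.randn(4, 2)
sigma = torch.rand(4, 2) + 0.1  # the scale of the Normal must be positive
out = head(s, a, mu, sigma, sample_size=10)
assert out['q_value'].shape == (4, 1)
assert out['v_value'].shape == (4, 1)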
Any new progress on this issue?