
Feature request: parse TensorFlow documentation

Open SamuelMarks opened this issue 3 years ago • 3 comments

TensorFlow is a popular open-source ML framework/ecosystem from Google.

Unfortunately your parser doesn't work well on its docstrings. Here's a link to one of them: https://github.com/tensorflow/tensorflow/blob/9df9d06/tensorflow/python/keras/optimizer_v2/adam.py#L35-L103

Snippet:

  r"""Optimizer that implements the Adam algorithm.
…
  [Kingma et al., 2014](http://arxiv.org/abs/1412.6980),
  the method is "*computationally
  efficient, has little memory requirement, invariant to diagonal rescaling of
  gradients, and is well suited for problems that are large in terms of
  data/parameters*".
  Args:
    learning_rate: A `Tensor`, floating point value, or a schedule that is a
      `tf.keras.optimizers.schedules.LearningRateSchedule`, or a callable
      that takes no arguments and returns the actual value to use, The
      learning rate. Defaults to 0.001.
…
  9.9
  Reference:
    - [Kingma et al., 2014](http://arxiv.org/abs/1412.6980)
    - [Reddi et al., 2018](
        https://openreview.net/pdf?id=ryQu7f-RZ) for `amsgrad`.
…
$ ipython3
Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import tensorflow as tf

In [2]: import inspect

In [3]: import docstring_parser

In [4]: docstring = docstring_parser.parse(inspect.getdoc(tf.keras.optimizers.Adam), style=docstring_parser.Style.google)

In [5]: tuple(map(lambda param: param.arg_name, docstring.params))
Out[5]: 
('learning_rate',
 'beta_1',
 'beta_2',
 'epsilon',
 'amsgrad',
 'name',
 '**kwargs',
 '- [Kingma et al., 2014](http',
 '- [Reddi et al., 2018](\n      https')

The stray entries likely leak in because `Reference:` isn't a section title your Google parser recognizes, so its bullets get lumped in with the preceding chunk and split on the first colon, which lands inside the URLs. One hack, if you don't want to fix your parser, is to post-process the result, dropping anything that fails an `isidentifier` check (Python 2 implementation), e.g. the sketch below.
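A minimal sketch of that hack (the `*`-stripping is my own addition so legitimate `*args`/`**kwargs` entries survive the check):

def clean_params(docstring):
    # Keep only entries whose names are valid Python identifiers.
    # Leading '*'s are stripped first so that *args / **kwargs pass
    # (an assumption; the bare suggestion was just isidentifier).
    return [
        param
        for param in docstring.params
        if param.arg_name.lstrip("*").isidentifier()
    ]

params = clean_params(docstring)  # `docstring` as parsed above
print([param.arg_name for param in params])  # reference bullets dropped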

SamuelMarks avatar Nov 07 '20 11:11 SamuelMarks

It raises an error on `tf.keras.layers.Layer` as well.

The error happens at this line:

https://github.com/tensorflow/tensorflow/blob/85c8b2a817f95a3e979ecd1ed95bff1dc1335cff/tensorflow/python/keras/engine/base_layer.py#L166

$ ipython3
Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import tensorflow as tf

In [2]: import inspect

In [3]: import docstring_parser

In [4]: docstring = docstring_parser.parse(inspect.getdoc(tf.keras.layers.Layer), style=docstring_parser.Style.google)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-27091b0195cd> in <module>
----> 1 docstring = docstring_parser.parse(inspect.getdoc(tf.keras.layers.Layer), style=docstring_parser.Style.google)

~/.pyenv/versions/3.7.2/envs/kerod/lib/python3.7/site-packages/docstring_parser/parser.py in parse(text, style)
     14 
     15     if style != Style.auto:
---> 16         return STYLES[style](text)
     17     rets = []
     18     for parse_ in STYLES.values():

~/.pyenv/versions/3.7.2/envs/kerod/lib/python3.7/site-packages/docstring_parser/google.py in parse(text)
    272     :returns: parsed docstring
    273     """
--> 274     return GoogleParser().parse(text)

~/.pyenv/versions/3.7.2/envs/kerod/lib/python3.7/site-packages/docstring_parser/google.py in parse(self, text)
    262             for j, (start, end) in enumerate(c_splits):
    263                 part = chunk[start:end].strip("\n")
--> 264                 ret.meta.append(self._build_meta(part, title))
    265 
    266         return ret

~/.pyenv/versions/3.7.2/envs/kerod/lib/python3.7/site-packages/docstring_parser/google.py in _build_meta(self, text, title)
    104 
    105         # Split spec and description
--> 106         before, desc = text.split(":", 1)
    107         if desc:
    108             desc = desc[1:] if desc[0] == " " else desc

ValueError: not enough values to unpack (expected 2, got 1)
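
For what it's worth, the crash reproduces without TensorFlow: any entry under a Google-style section header whose line contains no colon reaches the same unpacking in `_build_meta`. A minimal sketch (hypothetical docstring, modeled on the failing line in base_layer.py):

import docstring_parser

# An Args entry without a colon on its line reaches
# GoogleParser._build_meta, where `text.split(":", 1)` yields a
# single element and the tuple unpacking fails.
text = """Summary line.

Args:
  an entry without any colon
"""
docstring_parser.parse(text, style=docstring_parser.Style.google)
# ValueError: not enough values to unpack (expected 2, got 1)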

EmGarr avatar Jan 22 '21 19:01 EmGarr

@EmGarr I built my own parsers for all 3 docstring formats, as well as for classes, functions, and argparse parsers.

$ ipython
Python 3.8.7 (default, Dec 30 2020, 22:35:32) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from doctrans import emit, parse

In [2]: from doctrans.source_transformer import to_code

In [3]: import tensorflow as tf

In [4]: parse.class_(tf.keras.layers.Layer)
Out[4]: 
{'name': 'Layer',
 'doc': "This is the class from which all layers inherit.\n\nA layer is a callable object that takes as input one or more tensors and\nthat outputs one or more tensors. It involves *computation*, defined\nin the `call()` method, and a *state* (weight variables), defined\neither in the constructor `__init__()` or in the `build()` method.\n\nUsers will just instantiate a layer and then treat it as a callable.\n\n\nAttributes:\n  name: The name of the layer (string).\n  dtype: The dtype of the layer's weights.\n  variable_dtype: Alias of `dtype`.\n  compute_dtype: The dtype of the layer's computations. Layers automatically\n    cast inputs to this dtype which causes the computations and output to also\n    be in this dtype. When mixed precision is used with a\n    `tf.keras.mixed_precision.Policy`, this will be different than\n    `variable_dtype`.\n  dtype_policy: The layer's dtype policy. See the\n    `tf.keras.mixed_precision.Policy` documentation for details.\n  trainable_weights: List of variables to be included in backprop.\n  non_trainable_weights: List of variables that should not be\n    included in backprop.\n  weights: The concatenation of the lists trainable_weights and\n    non_trainable_weights (in this order).\n  trainable: Whether the layer should be trained (boolean), i.e. whether\n    its potentially-trainable weights should be returned as part of\n    `layer.trainable_weights`.\n  input_spec: Optional (list of) `InputSpec` object(s) specifying the\n    constraints on inputs that can be accepted by the layer.\n\nWe recommend that descendants of `Layer` implement the following methods:\n\n* `__init__()`: Defines custom layer attributes, and creates layer state\n  variables that do not depend on input shapes, using `add_weight()`.\n* `build(self, input_shape)`: This method can be used to create weights that\n  depend on the shape(s) of the input(s), using `add_weight()`. `__call__()`\n  will automatically build the layer (if it has not been built yet) by\n  calling `build()`.\n* `call(self, *args, **kwargs)`: Called in `__call__` after making sure\n  `build()` has been called. `call()` performs the logic of applying the\n  layer to the input tensors (which should be passed in as argument).\n  Two reserved keyword arguments you can optionally use in `call()` are:\n    - `training` (boolean, whether the call is in\n      inference mode or training mode)\n    - `mask` (boolean tensor encoding masked timesteps in the input, used\n      in RNN layers)\n* `get_config(self)`: Returns a dictionary containing the configuration used\n  to initialize this layer. If the keys differ from the arguments\n  in `__init__`, then override `from_config(self)` as well.\n  This method is used when saving\n  the layer or a model that contains this layer.\n\nExamples:\n\nHere's a basic example: a layer with two variables, `w` and `b`,\nthat returns `y = w . 
x + b`.\nIt shows how to implement `build()` and `call()`.\nVariables set as attributes of a layer are tracked as weights\nof the layers (in `layer.weights`).\n\n```python\nclass SimpleDense(Layer):\n\n  def __init__(self, units=32):\n      super(SimpleDense, self).__init__()\n      self.units = units\n\n  def build(self, input_shape):  # Create the state of the layer (weights)\n    w_init = tf.random_normal_initializer()\n    self.w = tf.Variable(\n        initial_value=w_init(shape=(input_shape[-1], self.units),\n                             dtype='float32'),\n        trainable=True)\n    b_init = tf.zeros_initializer()\n    self.b = tf.Variable(\n        initial_value=b_init(shape=(self.units,), dtype='float32'),\n        trainable=True)\n\n  def call(self, inputs):  # Defines the computation from inputs to outputs\n      return tf.matmul(inputs, self.w) + self.b\n\n# Instantiates the layer.\nlinear_layer = SimpleDense(4)\n\n# This will also call `build(input_shape)` and create the weights.\ny = linear_layer(tf.ones((2, 2)))\nassert len(linear_layer.weights) == 2\n\n# These weights are trainable, so they're listed in `trainable_weights`:\nassert len(linear_layer.trainable_weights) == 2\n```\n\nNote that the method `add_weight()` offers a shortcut to create weights:\n\n```python\nclass SimpleDense(Layer):\n\n  def __init__(self, units=32):\n      super(SimpleDense, self).__init__()\n      self.units = units\n\n  def build(self, input_shape):\n      self.w = self.add_weight(shape=(input_shape[-1], self.units),\n                               initializer='random_normal',\n                               trainable=True)\n      self.b = self.add_weight(shape=(self.units,),\n                               initializer='random_normal',\n                               trainable=True)\n\n  def call(self, inputs):\n      return tf.matmul(inputs, self.w) + self.b\n```\n\nBesides trainable weights, updated via backpropagation during training,\nlayers can also have non-trainable weights. These weights are meant to\nbe updated manually during `call()`. Here's a example layer that computes\nthe running sum of its inputs:\n\n```python\nclass ComputeSum(Layer):\n\n  def __init__(self, input_dim):\n      super(ComputeSum, self).__init__()\n      # Create a non-trainable weight.\n      self.total = tf.Variable(initial_value=tf.zeros((input_dim,)),\n                               trainable=False)\n\n  def call(self, inputs):\n      self.total.assign_add(tf.reduce_sum(inputs, axis=0))\n      return self.total\n\nmy_sum = ComputeSum(2)\nx = tf.ones((2, 2))\n\ny = my_sum(x)\nprint(y.numpy())  # [2. 2.]\n\ny = my_sum(x)\nprint(y.numpy())  # [4. 4.]\n\nassert my_sum.weights == [my_sum.total]\nassert my_sum.non_trainable_weights == [my_sum.total]\nassert my_sum.trainable_weights == []\n```\n\nFor more information about creating layers, see the guide\n[Writing custom layers and models with Keras](\n  https://www.tensorflow.org/guide/keras/custom_layers_and_models)",
 'params': OrderedDict([('trainable',
               {'doc': "Boolean, whether the layer's variables should be trainable."}),
              ('name', {'doc': 'String name of the layer.'}),
              ('dtype',
               {'doc': "The dtype of the layer's computations and weights. Can also be a `tf.keras.mixed_precision.Policy`, which allows the computation and weight dtype to differ. Default of `None` means to use `tf.keras.mixed_precision.global_policy()`, which is a float32 policy unless set to different value."}),
              ('dynamic',
               {'doc': 'Set this to `True` if your layer should only be run eagerly, and should not be used to generate a static computation graph. This would be the case for a Tree-RNN or a recursive network, for example, or generally for any layer that manipulates tensors using Python control flow. If `False`, we assume that the layer can safely be used to generate a static computation graph.'}),
              ('_TF_MODULE_IGNORED_PROPERTIES',
               {'default': "```frozenset(itertools.chain(('_obj_reference_counts_dict',), module.Module.\n    _TF_MODULE_IGNORED_PROPERTIES))```"}),
              ('_must_restore_from_config',
               {'default': False, 'typ': 'bool'})]),
 'returns': None,
 '_internal': {'body': ['<omitted for brevity>'],
               'from_name': 'Layer',
               'from_type': 'cls'}}

https://github.com/SamuelMarks/doctrans

SamuelMarks avatar Jan 23 '21 00:01 SamuelMarks

This is probably fixed in 0.8. In any case, seeing how the OP created their own project to parse these docstrings, I don't think this issue remains relevant.
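
For anyone landing here, a quick way to check is the sketch below (results depend on the installed docstring_parser and TensorFlow versions; `Style.google` matches the releases used in this thread, while newer releases spell it `Style.GOOGLE`):

import inspect

import docstring_parser
import tensorflow as tf

# Re-run both failing cases from this thread against the installed version.
for obj in (tf.keras.optimizers.Adam, tf.keras.layers.Layer):
    try:
        doc = docstring_parser.parse(
            inspect.getdoc(obj), style=docstring_parser.Style.google
        )
        print(obj.__name__, [p.arg_name for p in doc.params])
    except ValueError as exc:
        print(obj.__name__, "still fails:", exc)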

rr- avatar May 26 '21 15:05 rr-