crawlee-python

API reference rendering is broken

Open vdusek opened this issue 7 months ago • 0 comments

Goal state

  • Correctly render docstrings in Docusaurus.
  • We use Google-style docstrings.
  • Types should be rendered from type annotations (not docstrings).

Arguments of (public) functions & methods

  • Currently they are not rendered.
    async def run(
        self,
        requests: Sequence[str | Request] | None = None,
        *,
        purge_request_queue: bool = True,
    ) -> FinalStatistics:
        """Run the crawler until all requests are processed.

        Args:
            requests: The requests to be enqueued before the crawler starts.
            purge_request_queue: If this is `True` and the crawler is not being run for the first time, the default
                request queue will be purged.
        """

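A rough sketch of the desired behavior, using only the standard library: types come from the annotations and descriptions come from the Google-style `Args:` section. The helpers `parse_args_section` and `describe_parameters` are hypothetical names, and the module-level `run` function is a simplified stand-in for the method above.

```python
import inspect
import re
from typing import Optional, Sequence, get_type_hints


def parse_args_section(doc: str) -> dict[str, str]:
    """Collect `name: description` pairs from a Google-style `Args:` section."""
    descriptions: dict[str, str] = {}
    current = None
    in_args = False
    for line in doc.splitlines():
        if not in_args:
            in_args = line.strip() == 'Args:'
            continue
        if not line.startswith(' '):
            break  # A blank or unindented line ends the section.
        match = re.match(r' {4}(\w+): (.*)', line)
        if match:
            current = match.group(1)
            descriptions[current] = match.group(2).strip()
        elif current is not None:
            # Deeper indentation continues the previous description.
            descriptions[current] += ' ' + line.strip()
    return descriptions


def describe_parameters(func):
    """Return (name, annotated type, description) for each parameter."""
    hints = get_type_hints(func)
    docs = parse_args_section(inspect.getdoc(func) or '')
    return [
        (name, hints.get(name), docs.get(name, ''))
        for name in inspect.signature(func).parameters
        if name not in ('self', 'cls')
    ]


# Simplified stand-in for `BasicCrawler.run`.
async def run(requests: Optional[Sequence[str]] = None, *, purge_request_queue: bool = True) -> None:
    """Run the crawler until all requests are processed.

    Args:
        requests: The requests to be enqueued before the crawler starts.
        purge_request_queue: If this is `True` and the crawler is not being run for the first time,
            the default request queue will be purged.
    """


for row in describe_parameters(run):
    print(row)
```
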
Object attributes

  • It should render an object's public attributes and properties in the same way (properties are methods decorated with @property), probably under "Attributes".
class A:
    """Blahblah

    Attributes:
        b: document b here
    """

    def __init__(self, b: int) -> None:
        self.b = b

    @property
    def c(self) -> int:
        """Document c here."""
        return self.b + 1
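
One possible way for the renderer to find properties so they can be listed next to the documented attributes, sketched with `inspect` only. The helper name `public_properties` is hypothetical; instance attributes set in `__init__` are not visible this way and would still come from the `Attributes:` section.

```python
import inspect
from typing import get_type_hints


class A:
    """Example class.

    Attributes:
        b: document b here
    """

    def __init__(self, b: int) -> None:
        self.b = b

    @property
    def c(self) -> int:
        """Document c here."""
        return self.b + 1


def public_properties(cls):
    """Return (name, return type, docstring) for each public property of `cls`."""
    rows = []
    for name, member in inspect.getmembers(cls, lambda m: isinstance(m, property)):
        if name.startswith('_'):
            continue
        # The type comes from the getter's return annotation, not from the docstring.
        return_type = get_type_hints(member.fget).get('return')
        rows.append((name, return_type, inspect.getdoc(member)))
    return rows


print(public_properties(A))
```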

Class attributes

  • Render public class attributes, probably under "Class attributes".
class A:
    """Blah blah.

    Class Attributes:
        c: Document c here?
    """

    c = 1
    """Or document c here?"""

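If the second variant is chosen, note that a string literal placed after an assignment is not stored anywhere at runtime, so the renderer would have to read it from the source. A minimal sketch with `ast`, where `attribute_docstrings` is a hypothetical helper:

```python
import ast
import textwrap

SOURCE = textwrap.dedent('''
    class A:
        """Blah blah."""

        c = 1
        """Document c here."""

        d: int = 2
''')


def attribute_docstrings(source: str, class_name: str) -> dict[str, str]:
    """Map class attribute names to the string literal directly following their assignment."""
    docs: dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.ClassDef) and node.name == class_name):
            continue
        # Look at each statement together with the one that follows it.
        for stmt, following in zip(node.body, node.body[1:]):
            if isinstance(stmt, ast.Assign) and len(stmt.targets) == 1:
                target = stmt.targets[0]
            elif isinstance(stmt, ast.AnnAssign):
                target = stmt.target
            else:
                continue
            is_doc = (
                isinstance(following, ast.Expr)
                and isinstance(following.value, ast.Constant)
                and isinstance(following.value.value, str)
            )
            if isinstance(target, ast.Name) and is_doc:
                docs[target.id] = following.value.value
    return docs


print(attribute_docstrings(SOURCE, 'A'))
```
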
Properties of Pydantic models, data classes, and maybe typed dicts

  • How should we document them so that they render correctly (Args vs. Attributes)?
  • For example:
from datetime import timedelta
from typing import Annotated

from pydantic import BaseModel, Field


class Configuration(BaseModel):
    """Configuration of the Crawler.

    Attributes:
        internal_timeout: timeout for internal operations such as marking a request as processed
        verbose_log: allows verbose logging
    """

    internal_timeout: Annotated[timedelta | None, Field(alias='crawlee_internal_timeout')] = None
    verbose_log: Annotated[bool, Field(alias='crawlee_verbose_log')] = False
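
Whichever section name is chosen, the rendered type should probably be the annotation without the `Annotated[...]` metadata. A small sketch of that, using a standard-library dataclass as a stand-in for the Pydantic model (the string metadata replaces `Field(alias=...)`, and `field_rows` is a hypothetical helper):

```python
from dataclasses import dataclass, fields
from datetime import timedelta
from typing import Annotated, Optional, get_type_hints


@dataclass
class Configuration:
    """Configuration of the crawler.

    Attributes:
        internal_timeout: Timeout for internal operations.
        verbose_log: Allows verbose logging.
    """

    # The string metadata stands in for Pydantic's `Field(alias=...)`.
    internal_timeout: Annotated[Optional[timedelta], 'alias=crawlee_internal_timeout'] = None
    verbose_log: Annotated[bool, 'alias=crawlee_verbose_log'] = False


def field_rows(cls):
    """Return (name, type without Annotated metadata, default) for each field."""
    # By default `get_type_hints` strips the `Annotated[...]` wrapper.
    hints = get_type_hints(cls)
    return [(f.name, hints[f.name], f.default) for f in fields(cls)]


for row in field_rows(Configuration):
    print(row)
```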

Inherit docstrings from base classes

For example:

class MemoryStorageClient(BaseStorageClient):
    # Class docstring is inherited from BaseStorageClient.

    @override
    def dataset(self, id: str) -> DatasetClient:
        # Method docstring is inherited from `BaseStorageClient.dataset`.
        return DatasetClient(
            memory_storage_client=self,
            id=id,
        )
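
A sketch of how the lookup could work: walk the MRO and take the first docstring found. The standard library's `inspect.getdoc` performs a similar search. The classes below are simplified stand-ins with made-up docstrings, and `resolve_method_doc` is a hypothetical helper.

```python
import inspect


class BaseStorageClient:
    """Stand-in base class for storage clients."""

    def dataset(self, id: str) -> str:
        """Get a dataset client by ID."""
        raise NotImplementedError


class MemoryStorageClient(BaseStorageClient):
    def dataset(self, id: str) -> str:
        # No docstring here; it should be inherited from the base class.
        return f'dataset-{id}'


def resolve_method_doc(cls: type, name: str):
    """Return the first docstring found for `name` along the MRO of `cls`."""
    for base in cls.__mro__:
        member = vars(base).get(name)
        doc = getattr(member, '__doc__', None) if member is not None else None
        if doc:
            return inspect.cleandoc(doc)
    return None


print(resolve_method_doc(MemoryStorageClient, 'dataset'))
```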

Inline kwargs unpack

  • It should list all arguments, including those passed through the unpacked kwargs (defined as a typed dict).
class HttpCrawler(BasicCrawler):
    def __init__(
        self,
        blah: Iterable[int] = (),
        **kwargs: Unpack[BasicCrawlerOptions],
    ) -> None:
        # The full argument list is `blah` plus every key of `BasicCrawlerOptions`.
        ...
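
A sketch of how the flattening might be done at runtime, assuming Python 3.11+ (or `typing_extensions`) for `Unpack`. The option keys in `BasicCrawlerOptions` are made up for illustration, the crawler class is a simplified stand-in, and `expanded_parameters` is a hypothetical helper.

```python
import inspect
from typing import Iterable, TypedDict, get_args, get_origin, get_type_hints

try:
    from typing import Unpack  # Python 3.11+
except ImportError:
    from typing_extensions import Unpack


class BasicCrawlerOptions(TypedDict, total=False):
    # Made-up keys, for illustration only.
    max_requests: int
    use_session_pool: bool


class HttpCrawler:
    def __init__(self, blah: Iterable[int] = (), **kwargs: Unpack[BasicCrawlerOptions]) -> None:
        self.blah = blah
        self.options = kwargs


def expanded_parameters(func) -> dict:
    """Return {name: type}, replacing `**kwargs: Unpack[TD]` with the keys of `TD`."""
    hints = get_type_hints(func)
    expanded = {}
    for name, param in inspect.signature(func).parameters.items():
        if name == 'self':
            continue
        annotation = hints.get(name)
        if param.kind is inspect.Parameter.VAR_KEYWORD and get_origin(annotation) is Unpack:
            (typed_dict,) = get_args(annotation)
            expanded.update(get_type_hints(typed_dict))
        else:
            expanded[name] = annotation
    return expanded


print(expanded_parameters(HttpCrawler.__init__))
```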

Rendered return type is clickable

vdusek, Jul 18 '24 10:07