crawlee-python
API reference rendering is broken
Goal state
- Correctly render docstrings in Docusaurus.
- We use Google style.
- Types should be rendered from type annotations (not docstrings).
Arguments of (public) functions & methods
- Currently they are not rendered.
async def run(
    self,
    requests: Sequence[str | Request] | None = None,
    *,
    purge_request_queue: bool = True,
) -> FinalStatistics:
    """Run the crawler until all requests are processed.

    Args:
        requests: The requests to be enqueued before the crawler starts.
        purge_request_queue: If this is `True` and the crawler is not being run for the first time, the default
            request queue will be purged.
    """
Object attributes
- It should render an object's public attributes and properties in the same way (properties are methods decorated with `@property`), probably under "Attributes".
class A:
    """Blah blah.

    Attributes:
        b: Document b here.
    """

    def __init__(self, b: int) -> None:
        self.b = b

    @property
    def c(self) -> int:
        """Document c here."""
        return self.b + 1
Class attributes
- Render public class attributes, probably under "Class attributes".
class A:
    """Blah blah.

    Class Attributes:
        c: Document c here?
    """

    c = 1
    """Or document c here?"""
Properties of Pydantic models, data classes, and maybe typed dicts
- How should we document them so that they are rendered correctly (Args vs. Attributes)?
- For example:
from datetime import timedelta
from typing import Annotated

from pydantic import BaseModel, Field

class Configuration(BaseModel):
    """Configuration of the Crawler.

    Attributes:
        internal_timeout: Timeout for internal operations such as marking a request as processed.
        verbose_log: Allows verbose logging.
    """

    internal_timeout: Annotated[timedelta | None, Field(alias='crawlee_internal_timeout')] = None
    verbose_log: Annotated[bool, Field(alias='crawlee_verbose_log')] = False
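Whichever section name is chosen, the field types themselves can come from the annotations. The sketch below swaps the Pydantic model for a standard-library dataclass so that it runs without third-party packages, and uses plain strings in place of `Field(...)` as the `Annotated` metadata. The helper name `field_types` is hypothetical.

```python
from dataclasses import dataclass, fields
from datetime import timedelta
from typing import Annotated, Optional, get_type_hints


@dataclass
class Configuration:
    """Configuration of the crawler (dataclass stand-in for the Pydantic model).

    Attributes:
        internal_timeout: Timeout for internal operations.
        verbose_log: Allows verbose logging.
    """

    # The string metadata stands in for `Field(alias=...)` in the original model.
    internal_timeout: Annotated[Optional[timedelta], "crawlee_internal_timeout"] = None
    verbose_log: Annotated[bool, "crawlee_verbose_log"] = False


def field_types(cls) -> dict[str, object]:
    """Return {field name: type}, with the `Annotated` metadata stripped."""
    # get_type_hints() drops the Annotated wrapper unless include_extras=True is passed.
    hints = get_type_hints(cls)
    return {field.name: hints[field.name] for field in fields(cls)}


print(field_types(Configuration))
```

A renderer would then only need the `Attributes:` (or `Args:`) section for the descriptions.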
Inherit docstrings from base classes
For example:
class MemoryStorageClient(BaseStorageClient):
    # Class docstring is inherited from BaseStorageClient.

    @override
    def dataset(self, id: str) -> DatasetClient:
        # Method docstring is inherited from `BaseStorageClient.dataset`.
        return DatasetClient(
            memory_storage_client=self,
            id=id,
        )
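One way to express the inheritance rule is to walk the method resolution order and take the first docstring found. The sketch below is a minimal illustration. The classes are simplified stand-ins for the ones above and `resolve_docstring` is a hypothetical helper name.

```python
import inspect


class BaseStorageClient:
    """Base class for storage clients."""

    def dataset(self, id: str) -> str:
        """Return a client for the dataset with the given ID."""
        raise NotImplementedError


class MemoryStorageClient(BaseStorageClient):
    # No class docstring and no method docstring here.
    def dataset(self, id: str) -> str:
        return f"dataset:{id}"


def resolve_docstring(cls, member_name=None):
    """Return the first docstring along the MRO for the class or for one of its members."""
    for base in cls.__mro__:
        if base is object:
            continue
        if member_name is None:
            doc = base.__dict__.get("__doc__")
        else:
            member = base.__dict__.get(member_name)
            doc = member.__doc__ if member is not None else None
        if doc:
            return inspect.cleandoc(doc)
    return None


print(resolve_docstring(MemoryStorageClient))
print(resolve_docstring(MemoryStorageClient, "dataset"))
```

The standard library's `inspect.getdoc` performs a similar lookup for classes and methods, so a renderer built on it may get this behavior without extra code.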
Inline kwargs unpack
- It should list all arguments, including those unpacked from `**kwargs` (defined as a typed dict).
class HttpCrawler(BasicCrawler):
    def __init__(
        self,
        blah: Iterable[int] = (),
        **kwargs: Unpack[BasicCrawlerOptions],
    ) -> None:
        # List of all args is `blah` and all from kwargs.
        ...
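The expansion described above can be sketched with `inspect.signature` and the `typing` helpers. The keys of `BasicCrawlerOptions` below are invented for illustration, `HttpCrawler` is a standalone stand-in, and `expanded_parameters` is a hypothetical helper name.

```python
import inspect
from collections.abc import Iterable
from typing import TypedDict, get_args, get_origin, get_type_hints

try:
    from typing import Unpack  # Python 3.11+
except ImportError:
    from typing_extensions import Unpack


class BasicCrawlerOptions(TypedDict, total=False):
    # Illustrative keys only; the real options are not listed in this issue.
    max_requests: int
    verbose: bool


class HttpCrawler:
    def __init__(
        self,
        blah: Iterable[int] = (),
        **kwargs: Unpack[BasicCrawlerOptions],
    ) -> None:
        self.blah = blah
        self.options = kwargs


def expanded_parameters(func) -> dict[str, object]:
    """Return {name: annotation}, replacing `**kwargs: Unpack[T]` with the keys of T."""
    result: dict[str, object] = {}
    for name, param in inspect.signature(func).parameters.items():
        if name == "self":
            continue
        is_unpacked_kwargs = (
            param.kind is inspect.Parameter.VAR_KEYWORD
            and get_origin(param.annotation) is Unpack
        )
        if is_unpacked_kwargs:
            (typed_dict,) = get_args(param.annotation)
            result.update(get_type_hints(typed_dict))
        else:
            result[name] = param.annotation
    return result


print(list(expanded_parameters(HttpCrawler.__init__)))
```

Descriptions for the expanded keys would still need a source, for example the typed dict's own docstring.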