Developer documentation

Open fjetter opened this issue 3 years ago • 5 comments

In an offline discussion about technical debt and code complexity, the valid concern was raised that many of our internal systems are not properly documented.

One example that came up is the current/new state machine (https://github.com/dask/distributed/issues/4413, https://github.com/dask/distributed/pull/5046), which is documented to some extent (https://distributed.dask.org/en/stable/scheduling-state.html and https://distributed.dask.org/en/stable/worker.html#internal-scheduling) but likely not sufficiently for another developer to make educated judgment calls about code changes.

I would like to collect topics, mostly for dask/dask and dask/distributed, where more extensive developer documentation would help either to onboard new developers or to familiarize existing developers with other areas of the code.

cc @jcrist @jrbourbeau @gjoseph92 @ncclementi

  • [x] https://github.com/dask/distributed/issues/5413
  • [ ] https://github.com/dask/distributed/issues/5414
  • [ ] https://github.com/dask/distributed/issues/5415
  • [ ] https://github.com/dask/dask/issues/7755
  • [ ] https://github.com/dask/distributed/issues/5416
  • [ ] https://github.com/dask/distributed/issues/5417

fjetter avatar Oct 12 '21 15:10 fjetter

Thanks for opening this @fjetter!

A few topics that come to mind:

  • Task states and valid state transitions, and how those are handled in the scheduler
  • The worker state machine and how it relates to the above
  • The path from dask collection -> HLG -> low level graph -> scheduler -> tasks (we have some docs on this already, but again probably not enough, or not easily discoverable)
  • Networking in distributed. What talks to what, and in what direction? Are multiple interfaces supported? What are the different comm types? Any security implications?
  • Disk spilling/memory management. When does data move on the worker, and how is this configured?
  • Cythonization in the scheduler. How is this project going, how is it configured and applied, ... (perhaps this is in an active issue?)
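
To make the first topic concrete, here is a minimal sketch of what "valid state transitions" means. The state names are borrowed from the scheduler docs, but the `ALLOWED` table and the helper names (`transition`, `run`, `InvalidTransition`) are hypothetical and simplified for illustration; this is not the table or API that distributed actually uses.

```python
# Toy model of a task state machine.
# ASSUMPTION: this table is an illustrative subset, not the scheduler's real one.
ALLOWED = {
    "released": {"waiting", "forgotten"},
    "waiting": {"processing", "released"},
    "processing": {"memory", "erred", "released"},
    "memory": {"released", "forgotten"},
    "erred": {"released"},
    "forgotten": set(),
}


class InvalidTransition(Exception):
    """Raised when a requested transition is not in the table."""


def transition(state, new_state):
    """Return new_state if the move is allowed, otherwise raise."""
    if new_state not in ALLOWED.get(state, set()):
        raise InvalidTransition(f"{state} -> {new_state}")
    return new_state


def run(path, start="released"):
    """Apply a sequence of transitions and return the final state."""
    state = start
    for new_state in path:
        state = transition(state, new_state)
    return state
```

In this toy model, `run(["waiting", "processing", "memory"])` walks a task from released to memory, while `transition("memory", "processing")` raises `InvalidTransition`. Developer docs for the real state machine would need to spell out the equivalent table and explain why each edge exists.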

jcrist avatar Oct 12 '21 17:10 jcrist

I would add implementing Cluster classes to that list. Maybe custom adaptive classes too.

jacobtomlinson avatar Oct 12 '21 17:10 jacobtomlinson

High level graphs are another area that has been mentioned as needing better developer docs. There is a tracking issue here: https://github.com/dask/dask/issues/7755

GenevieveBuckley avatar Oct 12 '21 23:10 GenevieveBuckley

"Disk spilling/memory management. When does data move on the worker, and how is this configured?"

https://distributed.dask.org/en/stable/worker.html#memory-management

Is this sufficient? Should I create a ticket to restructure/move this?

fjetter avatar Oct 13 '21 09:10 fjetter

I created dedicated issues for the topics you mentioned. We can move the discussion about the individual items to the respective tickets.

Apart from collecting further topics, I am curious how we want to structure these new or already existing sections. While researching the topic in our current docs, I noticed that some of the information asked for here is already partially documented under "Developer Documentation" while other parts are in "Build understanding". This might be a judgement call for individual topics, but if there are general best practices to follow, they can be discussed here as well.

fjetter avatar Oct 13 '21 09:10 fjetter