Tantivy Internal Architecture Documentation

Open kj3moraes opened this issue 1 year ago • 10 comments

This is a request to create full-fledged documentation of Tantivy's algorithms and implementation. It would be a great resource to thoroughly document all of Tantivy's internals and the "flow" in a modern documentation style.

Proposal

We document the "internals" of Tantivy using MkDocs (specifically the Material theme for MkDocs, since it is ubiquitous). The documentation would cover:

  • the classes (core, collector, indexer, etc.) - what they are used for and how they achieve this.
  • the essentials of distributed search
  • relation to Apache Lucene (similarities, differences)

Essentially, take the ARCHITECTURE.md file, flesh it out further, and put it on an easily accessible site (with a good UI).

This is not a step-by-step code walkthrough or detailed documentation of every method. It is aimed at someone who

  • knows Rust well
  • wants to learn about search engines, etc.
  • wants to learn about Tantivy's implementation
  • does not have the time to do an in-depth code read-through.

Plan

I have studied Apache Lucene recently and have asked @fulmicoton if I can work on the internal documentation. My idea is that we can make a branch called internal-docs and set up the documentation there (Material for MkDocs has great GitHub integration, so I'm biased, but we can use whatever everyone collectively decides on).

kj3moraes avatar Jan 12 '24 23:01 kj3moraes

I think the Rust docs would be a good place for this, as they don't get outdated so easily and are the entry point for documentation.

Or do you think there are certain things we cannot do in rustdoc?

Btw, I think core should mostly be dissolved; there's a PR for that: https://github.com/quickwit-oss/tantivy/pull/2259

PSeitz avatar Jan 13 '24 07:01 PSeitz

I was thinking more of a walkthrough of how indexing and searching occur internally. The Rust docs are great when developing and when you need a reference, but if I wanted to understand what exactly happened in the internals of Tantivy, I wouldn't be able to grok that from the docs easily.

It's just a suggestion to have something to learn about the process of distributed search and Tantivy's implementation of it.

kj3moraes avatar Jan 13 '24 23:01 kj3moraes

Yes, but we can have a walkthrough in the Rust docs, or are there features missing?

if I wanted to understand what exactly happened in the internals of Tantivy, I wouldn't be able to grok that from the docs easily.

I think that's an issue in the docs that others also have currently and should be fixed.

PSeitz avatar Jan 16 '24 13:01 PSeitz

Fair enough, we could do it in the Rust docs itself. Do they support

  • flowcharts
  • diagrams

If these are present (or can be added with some patches) then Rust docs sounds good. We would need some segregation to explain which is the internal documentation / walkthrough and which is the API reference.

Could you share some links for these kinds of docs that others have made?

kj3moraes avatar Jan 23 '24 23:01 kj3moraes

@kj3moraes I'm OK with something outside of rustdoc as long as it is in Markdown. The Rust world tends to use mdBook for that.

fulmicoton avatar Jan 24 '24 00:01 fulmicoton

Fair enough, we could do it in the Rust docs itself. Do they support

  • flowcharts
  • diagrams

There's a Mermaid integration that looks promising: https://docs.rs/simple-mermaid/latest/simple_mermaid/

If these are present (or can be added with some patches) then Rust docs sounds good. We would need some segregation to explain which is the internal documentation / walkthrough and which is the API reference.

I don't think this needs much separation. A walk-through is really helpful on the API level. Internals are also helpful to understand how to use an API.

Any documentation outside of CI (which rustdoc is part of) will become obsolete, which is already the case for several Tantivy docs.

PSeitz avatar Jan 24 '24 05:01 PSeitz

Yeah, fair enough, we can get started on it then. Should we make a separate branch for it?

Also

Could you share some links for these kinds of docs that others have made?

for reference

kj3moraes avatar Jan 25 '24 16:01 kj3moraes

Yeah, fair enough, we can get started on it then. Should we make a separate branch for it?

I don't think we need a branch for this.

Also

Could you share some links for these kinds of docs that others have made?

for reference

Are you looking for something specific? The bigger crates have more extensive documentation, e.g. https://docs.rs/tokio/latest/tokio/

PSeitz avatar Jan 26 '24 09:01 PSeitz

Hey @PSeitz, how do you propose we begin?

kj3moraes avatar Feb 08 '24 15:02 kj3moraes

I like the straightforward style of ARCHITECTURE.md and think it would make a good addition to the docs.

It probably makes sense to identify which high-level concepts or structural pieces are missing and add them before going into details.

PRs against the main branch are fine. You can also join our Discord channel if you have questions.

PSeitz avatar Feb 08 '24 16:02 PSeitz