Try out d2lang to visualize the onboarding path

Open lintool opened this issue 6 months ago • 3 comments

Hi @lilyjge can you look at this? https://d2lang.com/

I've heard good things about it. Perhaps we can use it to diagram our onboarding path?

I have two goals here:

  1. I want to learn more about the features of d2lang - "diagrams as code" seems like a great idea... better than my current pptx workflow for diagrams (in Anserini/Pyserini/etc.)
  2. Our onboarding path is getting complicated... perhaps a diagram can help newcomers?

lintool avatar Jun 08 '25 14:06 lintool

Yes, it is great! The syntax is simple and easy to use. Attached is a replica of this diagram that I was able to whip up really quickly. D2 supports icons as well, I just didn't add them. It has pretty cool features like markdown, code snippets, icons/images, SQL tables, sequence diagrams, and grids. It also looks nice with some default colour themes and customizability with CSS.

A diagram could definitely help with the onboarding path. Do you mean the actual retrieval pipeline or the various onboarding docs? I think the bi-encoder architecture is a pretty good overview of the retrieval pipeline already, but a diagram would be useful to lay out how the different onboarding docs relate to each other at a higher level.

Image
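
For anyone following along, here is a minimal sketch of three of the features mentioned above (a markdown label, an SQL table, and a sequence diagram). The shape names, table columns, and messages are made up for illustration and are not taken from the attached diagram.

```d2
# Markdown label: the block string is rendered as formatted text.
notes: |md
  ## Onboarding
  - start with Anserini
  - then Pyserini
|

# SQL table shape; the table and its columns are hypothetical.
documents: {
  shape: sql_table
  docid: int {constraint: primary_key}
  contents: text
}

# Sequence diagram: messages are drawn in the order they are written.
retrieval: {
  shape: sequence_diagram
  user -> searcher: query
  searcher -> user: ranked list
}
```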

lilyjge avatar Jun 08 '25 18:06 lilyjge

Oh nice!

Share code for above diagram?

I was thinking of diagramming our onboarding sequence - i.e., first Anserini, then Pyserini, then RankLLM? And then we are starting to have offshoots... like your vector store stuff and ONNX. There are dependencies we should note...

lintool avatar Jun 08 '25 19:06 lintool

For the earlier diagram:

Documents -> doc_encoded: Doc Encoder
doc_encoded: "[...]\n[...]\n[...]"
q_encoded: "[...]"
Query -> q_encoded: Query Encoder
q_encoded -> Top-k Retrieval <- doc_encoded
Top-k Retrieval -> Ranked List

Something like this for onboarding?

Image

Code:

Anserini: {
    BM25 Baselines for MS MARCO Passage -> Dense Retrieval for MS MARCO Passage
    direction: down
}
Pyserini: {
    BM25 Baselines for MS MARCO Passage -> Conceptual Framework for Retrieval -> BGE-base Baseline for NFCorpus -> A Deeper Dive into Dense and Sparse Representations
    direction: down
}
RankLLM: Reranking with LLMs
Anserini -> Pyserini -> RankLLM

I think it would look nicer if the containers went left to right while the subtasks inside each one stayed top to bottom, but the only layout engine that supports a different direction per container is paid.
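
A rough sketch of what that layout might look like, assuming an engine that honors per-container `direction` (as far as I know that is only TALA; with the default engine the nested `direction` is ignored and everything follows the top-level one). The subtask lists are shortened here.

```d2
# Top level: containers flow left to right.
direction: right

Anserini: {
  # Nested direction; assumed to take effect only under TALA.
  direction: down
  BM25 Baselines for MS MARCO Passage -> Dense Retrieval for MS MARCO Passage
}
Pyserini: {
  direction: down
  BM25 Baselines for MS MARCO Passage -> Conceptual Framework for Retrieval
}
RankLLM: Reranking with LLMs

Anserini -> Pyserini -> RankLLM
```

The engine is picked on the command line with the `--layout` flag, e.g. `d2 --layout tala onboarding.d2 onboarding.svg`, where the file names are placeholders and TALA has to be installed separately.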

lilyjge avatar Jun 09 '25 02:06 lilyjge