torchtune
Update models docs page
Context: Our models page was not informative and things like tokenizers were missing.
Testing: Eyes
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/954
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:white_check_mark: No Failures
As of commit 165df0061c41d5998c99747cdb2975c2a98603fd with merge base 1e56b807a4124bcd46c7e93f82897eac22e758ee:
:green_heart: Looks good so far! There are no failures yet. :green_heart:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 27.05%. Comparing base (a978956) to head (57c991c). Report is 32 commits behind head on main.
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #954       +/-   ##
===========================================
- Coverage   66.39%   27.05%    -39.35%
===========================================
  Files         155      174        +19
  Lines        6484     7363       +879
===========================================
- Hits         4305     1992      -2313
- Misses       2179     5371      +3192
```
:umbrella: View full report in Codecov by Sentry.
Overall this is a huge improvement. I'm not married to the blurbs - we can link to model cards/papers which sufficiently explain the models IMO.
I'd love to see the following things (some of which are here already):
- A top-level note letting users know they can swap any of the models they see here across all of our recipes. I'm open to discussing how much detail we provide on doing so vs. linking to other docs.
- Memory usage of different models - I think this is just a nice QOL add, even if it's just the default memory usage in bf16.
- The explainer on generic builders has been sorely missing. I wouldn't mind a little example showing when to use one (e.g. for tinyllama or something).
- Would it be inappropriate to immortalize a contributor's contribution if they contributed a specific model, by linking to their contribution in the model section?
- I think the tokenizer sections could use a little love, particularly since we've fleshed these out a bit. For example, it's worth specifying that the Llama tokenizer is a SentencePiece tokenizer, not a fast BPE tokenizer.
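On the generic-builders bullet above, a toy sketch of the pattern the docs could explain (names here are illustrative, not the real torchtune API): a generic builder exposes every architecture knob, while model builders are thin named presets over it. Dropping down to the generic builder is exactly what a custom tinyllama-style config would look like.

```python
from dataclasses import dataclass

# Illustrative stand-in for a decoder model; not the real torchtune class.
@dataclass
class TransformerDecoder:
    num_layers: int
    num_heads: int
    embed_dim: int

def transformer_decoder(*, num_layers: int, num_heads: int, embed_dim: int) -> TransformerDecoder:
    """Generic builder: fully parameterized, no preset sizes."""
    return TransformerDecoder(num_layers=num_layers, num_heads=num_heads, embed_dim=embed_dim)

def llama2_7b() -> TransformerDecoder:
    """Model builder: a named preset that just pins one set of hyperparameters."""
    return transformer_decoder(num_layers=32, num_heads=32, embed_dim=4096)

def my_tiny_model() -> TransformerDecoder:
    """When no preset fits (e.g. a tinyllama-style config), use the generic builder directly."""
    return transformer_decoder(num_layers=4, num_heads=4, embed_dim=256)
```

The docs example could follow the same shape: "if a preset model builder matches your checkpoint, use it; otherwise build the same architecture with your own sizes via the generic builder."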
More thoughts maybe to come. Thanks for reviving this.
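In case a concrete shape helps the "swap models across recipes" discussion above: torchtune recipe configs select the model via a builder path, so swapping models is essentially a one-line config change. A hypothetical excerpt (exact keys and paths vary by recipe config):

```yaml
# Hypothetical recipe config excerpt -- swap the model by pointing
# _component_ at a different builder:
model:
  _component_: torchtune.models.llama2.llama2_7b
  # or, for the LoRA variant of the same model:
  # _component_: torchtune.models.llama2.lora_llama2_7b

tokenizer:
  _component_: torchtune.models.llama2.llama2_tokenizer
  path: /tmp/tokenizer.model
```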
Also, maybe this is obvious to everyone, but it might be a good opportunity to add a little noobsplainer™ about what's up with all our lora model and component builders. We can defer to the tutorial for in-depth details here. For example, lora_llama2_7b isn't actually another Llama2 model you can download or find the weights for; it's just the LoRA version of the same model, and here's how to specify it, et cetera. Maybe my section in the memory glossary is a good link here?
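A back-of-envelope sketch of the point such a noobsplainer could make, i.e. why lora_llama2_7b is "the same model, fewer trainable params": LoRA freezes a d x k weight W and trains only a rank-r update B @ A, with B of shape (d, r) and A of shape (r, k). (The 4096 x 4096 projection size below is just a Llama2-7B-scale example, and rank 8 an arbitrary choice.)

```python
def full_trainable_params(d: int, k: int) -> int:
    # Full fine-tuning trains every entry of the d x k weight W.
    return d * k

def lora_trainable_params(d: int, k: int, r: int) -> int:
    # LoRA trains only B (d * r entries) and A (r * k entries); W stays frozen.
    return d * r + r * k

# One 4096 x 4096 attention projection at rank 8:
full = full_trainable_params(4096, 4096)     # 16_777_216
lora = lora_trainable_params(4096, 4096, 8)  # 65_536
print(f"{lora / full:.4%} of the original trainable params")
```

That ratio (well under 1% per projection) is the punchline: same architecture, same frozen base weights, a tiny set of extra trainable parameters.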
Overall I think this is a huge improvement, we just need to ensure the updates are applied consistently across all the different models we support. A few responses to various comments I've seen so far:
(1) I see a few comments about noise/information ratio, especially around which APIs we do and don't document. Imo we shouldn't compromise on documenting our public APIs, even if it means there is extra stuff on our docs pages. Instead, we should structure our docs such that the important stuff is easy to find, and the less pressing stuff is still available. I think separating models out into separate pages will help, as will structuring each individual page from most essential info at the top to least essential info at the bottom.
(2) Re stuff like commands to fine-tune a given model or a given model's memory, I don't think we should put it here. Instead, we should provide this information elsewhere and link to it. These are API docs and so first and foremost their job is to provide everything a user needs to know about interacting with said APIs. So the existing example usage is the right idea imo -- it shows how to use the provided APIs in Python in a minimal and self-contained way. (Though I would think about how we can make the example usage even a bit clearer, the checkpointer portion is unfortunately a bit of a black box in its current format.)
> Would it be inappropriate to immortalize a contributor's contribution if they contributed a specific model, by linking to their contribution in the model section?
This I wouldn't do personally. Unlike tutorials, where we can reference a contributor in a single line at the top, our API docs shouldn't have any extra fluff.