
Option To Generate "Clean" notebooks for docs / tutorial for Google Colab/Paperspace etc

Open hamelsmu opened this issue 3 years ago • 8 comments

  • I've met with Jonathan Whitaker, who runs a very popular course on AI Art with thousands of students. He is using nbdev & fastai for the next iteration of the course, which is to be released in a couple of months; the course is also sponsored by W&B. He has "run in colab" badges on his notebooks, but he wants a version of each notebook that is stripped of all directives, plus the option to add Colab-specific cells or hide certain things for Colab (like the pip install bits, etc).
  • W&B docs also have "run in colab" badges, but they end up repeating themselves because they maintain a slightly different version of the code for Google Colab.

I see that this is quite a common pattern for people wanting to make rich tutorials and such with nbdev. We have to think about the design a bit. Rough sketch:

front_matter:

---
copy_nb_path: ....
--- 

directives:

  • #|copy_exclude: marks cells that should be excluded from the copied notebook.

Open question: should we have a way to exclude markdown? Perhaps this is possible with conditional rendering?
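To make the #|copy_exclude idea concrete, here is a minimal Python sketch of how a "clean" copy could be produced from a notebook's JSON. It is an illustration under assumptions, not nbdev's implementation: the function names are hypothetical, copy_exclude is only the directive name proposed above, and the notebook is assumed to be a plain dict in the usual nbformat layout.

```python
import copy

EXCLUDE = "copy_exclude"  # directive name proposed in this issue (hypothetical)


def _lines(cell):
    # nbformat stores `source` as either one string or a list of strings
    src = cell.get("source", "")
    if isinstance(src, list):
        src = "".join(src)
    return src.splitlines()


def _directive(line):
    # Return the text after "#|" for directive lines, None for ordinary lines
    s = line.strip()
    if not s.startswith("#|"):
        return None
    return s[2:].strip()


def clean_notebook(nb):
    """Return a copy of `nb` without excluded cells and without directive lines."""
    out = copy.deepcopy(nb)
    kept = []
    for cell in out["cells"]:
        if cell.get("cell_type") != "code":
            kept.append(cell)  # markdown and raw cells pass through untouched
            continue
        lines = _lines(cell)
        directives = [d for d in map(_directive, lines) if d is not None]
        if any(d.split(":")[0].strip() == EXCLUDE for d in directives):
            continue  # drop the whole cell from the copied notebook
        cell["source"] = "\n".join(l for l in lines if _directive(l) is None)
        kept.append(cell)
    out["cells"] = kept
    return out
```

Markdown cells are deliberately left alone here, which matches the open question above: excluding markdown would need a separate mechanism.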

cc: @jph00 @seeM

hamelsmu avatar Aug 23 '22 18:08 hamelsmu

Another thread https://twitter.com/charles_irl/status/1563335298213220352?s=21&t=oGbErx4SNkDUiqiI04G7iA

hamelsmu avatar Aug 28 '22 13:08 hamelsmu

@hamelsmu At my org, we use Colab a lot. nbdev2 relies on raw cells if we wish to skip all tests and execution of cells while building docs. I mostly miss the #all_slow tag from nbdev1; it was very handy for testing nbs. As of now, I'm not aware of a way to add raw cells in Google Colab.

p.s. please let me know, if I can help with creation of a Colab tutorial! I'd be more than happy to help

deven367 avatar Sep 08 '22 02:09 deven367

You don't actually have to use raw cells at all, if you'd rather not. I don't use them myself. Instead, I use an alternate format, which is a markdown cell in this format:

# title
> description
- yamlkey1: something
- yamlkey2: other

Please give that a go and let me know if you have any issues.

jph00 avatar Sep 08 '22 02:09 jph00

I checked the nbs in the nbdev repo and couldn't find a comprehensive example using this format. I found that most nbs have this format.

# title
> description 
- order: 1

I'm not sure what I would have to do if I wanted to skip execution during tests and while building docs. Would it be something like this?

# title
> description 
- execute:
- eval: false 

Also, about the raw cells: I got that notion from the "migrating from nbdev1" nb, so this would probably be a good addition to that section.

deven367 avatar Sep 08 '22 02:09 deven367

Good guess! It's actually: - skip_exec: true.

More details on this migration here: https://nbdev.fast.ai/top/migrating.html#update-directive-names
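For readers who want to see how the markdown-cell format maps to key/value metadata, here is a rough Python sketch that parses a "# title / > description / - key: value" cell, including the skip_exec key. This is not nbdev's actual parser: the function name is hypothetical and only true/false values are converted, whereas a real implementation would hand the values to a YAML parser.

```python
def parse_md_frontmatter(source):
    """Parse '# title', '> description' and '- key: value' lines into a dict."""
    meta = {}
    for raw in source.splitlines():
        line = raw.strip()
        if line.startswith("# ") and "title" not in meta:
            meta["title"] = line[2:].strip()
        elif line.startswith(">"):
            meta["description"] = line[1:].strip()
        elif line.startswith("- ") and ":" in line:
            key, value = line[2:].split(":", 1)
            value = value.strip()
            # Only booleans are converted in this sketch; everything else stays a string
            if value.lower() in ("true", "false"):
                value = value.lower() == "true"
            meta[key.strip()] = value
    return meta


cell = "# My tutorial\n> Runs on Colab\n- skip_exec: true\n- order: 1"
meta = parse_md_frontmatter(cell)
```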

jph00 avatar Sep 08 '22 03:09 jph00

p.s. please let me know, if I can help with creation of a Colab tutorial! I'd be more than happy to help

We'd be delighted to see a colab tutorial :D

jph00 avatar Sep 08 '22 03:09 jph00

From the Forums

[two screenshots of forum posts]

hamelsmu avatar Sep 14 '22 04:09 hamelsmu

I'll be working on this soon. I've been chatting with the Quarto folks about this, and they have created a paved path that will make this possible! More to come soon

hamelsmu avatar Sep 14 '22 18:09 hamelsmu

I'll be working on this soon. I've been chatting with the Quarto folks about this, and they have created a paved path that will make this possible! More to come soon

Excellent! Please let me know how I can be helpful - happy to test things, write docs etc.

johnowhitaker avatar Sep 15 '22 04:09 johnowhitaker

Notes For Creating A Google Colab Shortcode

  • To save "rendered" copies of notebooks in addition to the web pages for your docs, modify your _quarto.yml as shown below. By default, files are written with the extension out.ipynb; for example, docs.ipynb is written to _site/docs.out.ipynb. You can change this with the output-ext field in _quarto.yml:
format:
  html:
    theme: cosmo
+  ipynb:
+    output-ext: output.ipynb # this is optional (you will likely leave this out)
  • Alternatively, you don't have to specify this in _quarto.yml at all: you can render notebooks from the CLI with the -M flag. Note that we can leave out about.ipynb in the example below if we like:
quarto render about.ipynb --to ipynb -M output-ext:output.ipynb
  • You can set repo url metadata like this, which we can automatically populate from settings.ini. I think we should do this by default so people can enable the "edit this page" button, which is very useful! We should also set the repo-branch and potentially the repo-subdir options.

  • You can access metadata in a shortcode like this; make sure your entry point takes all three args: function(args, kwargs, meta)

  • You can use PANDOC_STATE.output_file to get the filename. Note that quarto will process the document twice, once to html and once to .ipynb, so you will have to write an if statement to check for that.

  • You can use the QUARTO_PROJECT_DIR env variable to get the root of the Quarto project directory, which will give you a full path (which I'm not sure is helpful yet). You can get this value in lua with os.getenv("QUARTO_PROJECT_DIR"), and yes, it appears to be the same syntax as python :p

I could not find a way to get the path of the current file in a shortcode, which is necessary for constructing the URL for the Colab badge, so I emailed JJ to ask


Design

  1. User specifies copy_nb_dir in settings.ini. If they set this variable, the following front matter gets injected into all notebooks, where nbs_path/path_to_nb/file_name.ipynb is the path to the current notebook relative to the project root:
copy_nb_loc:  nbs_path/path_to_nb/file_name.ipynb
  2. If the user has specified copy_nb_dir, they can install an extension and put the shortcode {{ colab }} on any notebook. The shortcode renders the Colab badge by constructing the right URL, which will be something like this:

...github.com/owner/repo/blob/branch/{copy_nb_loc}

  3. In the future, because we have that front matter added automatically, we can add other kinds of badges if we like.
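As a sketch of step 1 of the design, the injected copy_nb_loc value could be computed along these lines. The function names are hypothetical and nothing here is existing nbdev code; the only assumption is that the notebook lives somewhere under the project root.

```python
from pathlib import PurePosixPath


def copy_nb_loc(nb_file, project_root):
    """Path of `nb_file` relative to `project_root`, using forward slashes for URLs."""
    return PurePosixPath(nb_file).relative_to(PurePosixPath(project_root)).as_posix()


def injected_frontmatter(nb_file, project_root):
    # The single key the design proposes to add to every notebook
    return {"copy_nb_loc": copy_nb_loc(nb_file, project_root)}


example = injected_frontmatter("/home/me/proj/nbs/tutorials/intro.ipynb", "/home/me/proj")
```

relative_to raises ValueError when the notebook is outside the project root, which seems like the right failure mode for a value that ends up in a public URL.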

hamelsmu avatar Sep 19 '22 19:09 hamelsmu

Lua shortcode prototype for colab badges

-- colab.lua
local str = pandoc.utils.stringify
local file = quarto.doc.project_output_file()
local prefix = 'https://colab.research.google.com/github/'
local img = '<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" style="max-width: 100%;">'

function colab(args, kwargs, meta)
    if quarto.doc.isFormat('ipynb') then
        local path = str(meta['colab.gh-repo'])..'/blob/'..str(meta['colab.branch'])..'/'..file
        return pandoc.RawBlock('html', '<a href="'..prefix..path..'" rel="nofollow">'..img..'</a>')
    end
end

Corresponding _quarto.yml fields:

colab:
  gh-repo: hamelsmu/quarto_nbcopy
  branch: main
  exported_dir: colab/

Need to do the following

  • [ ] Need to fix the badge url to point to exported_dir
  • [ ] Copy the *.out.ipynb files into exported_dir
  • [ ] Setup machinery via settings.ini that sets the default _quarto.yml properly if colab: True
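The second to-do item (copying the rendered notebooks into exported_dir) could look roughly like the Python sketch below. The function name is hypothetical, and it assumes the rendered files sit under the site directory with the out.ipynb extension described earlier and that exported_dir is not itself inside that directory.

```python
import shutil
from pathlib import Path


def export_rendered(site_dir, exported_dir, output_ext="out.ipynb"):
    """Copy every rendered `*.{output_ext}` file under `site_dir` into `exported_dir`.

    Subdirectories are preserved so that relative notebook paths stay stable.
    """
    site_dir, exported_dir = Path(site_dir), Path(exported_dir)
    copied = []
    # sorted() materializes the listing before any file is written
    for src in sorted(site_dir.rglob(f"*.{output_ext}")):
        dest = exported_dir / src.relative_to(site_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

A call such as export_rendered("_site", "colab") would then leave the HTML output alone and only mirror the notebooks.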

hamelsmu avatar Sep 20 '22 04:09 hamelsmu

ok new sketch

  ipynb:
    output-ext: colab.ipynb

colab:
  exported-dir: colab/

And the lua

local str = pandoc.utils.stringify
local file = quarto.doc.project_output_file()
local colab_prefix = 'https://colab.research.google.com/github/'
local colab_img = pandoc.Image('', 'https://colab.research.google.com/assets/colab-badge.svg', 'Open in Colab')


---make ending slash consistent
local function slash(s) return string.gsub(str(s), '/$', '')..'/' end

local function branch(meta)
    -- get the target repo's branch, giving precedence to the colab: branch field, then website: repo-branch, then 'main'
    local branch = meta['colab.branch']
    local web_branch = meta['website.repo-branch'] -- value set automatically by nbdev
    if branch == nil then branch = web_branch or 'main' end
    return slash(branch)
end

local function repo(meta)
    -- get the name of the repo, giving precedence to the colab: github-repo field, but defaulting to parsing the website: repo-url field
    local repo = meta['colab.github-repo']
    local web_repo = meta['website.repo-url'] -- value set automatically by nbdev
    if repo == nil and web_repo ~= nil then
        repo = str(web_repo):gsub('https://github.com/', '')
    end
    return slash(repo)
end

local function subdir(meta)
    -- get the directory of the exported notebook
    local nbdir = meta['colab.exported-dir']
    if nbdir == nil then return '' 
    else return slash(nbdir)
    end
end

function colab(args, kwargs, meta)
    -- construct the colab badge
    if quarto.doc.isFormat('html') then
        local path = repo(meta)..'blob/'..branch(meta)..subdir(meta)..file
        return pandoc.Div(pandoc.Link(colab_img, colab_prefix..path))
    end
end
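Since the fallback rules are easy to get wrong, here is a small Python mirror of the URL assembly that can be unit-tested outside Quarto. It is only a sketch: it assumes the metadata arrives as nested dicts (the Lua above uses dotted keys), the function names are hypothetical, and unlike the Lua it raises an error when no repo can be determined.

```python
COLAB_PREFIX = "https://colab.research.google.com/github/"


def _slash(s):
    # Normalize to exactly one trailing slash
    return s.rstrip("/") + "/"


def colab_url(meta, file):
    """Build the Colab URL for `file` from `colab:` and `website:` metadata."""
    colab = meta.get("colab", {})
    website = meta.get("website", {})

    # colab.branch wins, then website.repo-branch, then 'main'
    branch = colab.get("branch") or website.get("repo-branch") or "main"

    # colab.github-repo wins, otherwise parse owner/repo out of website.repo-url
    repo = colab.get("github-repo")
    if repo is None:
        web_repo = website.get("repo-url")
        if web_repo is None:
            raise ValueError("set colab.github-repo or website.repo-url")
        repo = web_repo.replace("https://github.com/", "", 1)

    subdir = colab.get("exported-dir")
    subdir = _slash(subdir) if subdir else ""
    return COLAB_PREFIX + _slash(repo) + "blob/" + _slash(branch) + subdir + file
```

The important case is an explicit colab.branch: it must take precedence over website.repo-branch rather than being replaced by 'main'.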

Here is the repo with this code

hamelsmu avatar Sep 20 '22 17:09 hamelsmu