Adding new documentation to Dash/Zeal

Open spacewander opened this issue 9 years ago • 0 comments

Introducing Dash

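Dash is a paid documentation browser. What is a documentation browser? It packages the HTML sources of some documentation into a docset, the docset indexes the entries in that documentation, and users can then look up any entry from inside the app. Dash does more than browse documentation, of course; if you are curious, the official site will happily pitch you the rest.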

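If you have never used a documentation app like this, I suggest you start now. They make looking things up noticeably faster, and looking things up in documentation is one of the most common tasks in day-to-day programming.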

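If you are not an OS X user, try Zeal, an open-source implementation of Dash that supports Windows and Linux. If Dash's price seems too high, you can also try DevDocs; I will cover DevDocs in the next post.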

How to generate a Dash docset

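A docset has three parts: the HTML files of the source documentation, the static resources needed to display them, and a SQLite table used as the index. Generating a docset boils down to processing the HTML files, extracting entries from them, and writing those entries into SQLite. The details are described at https://kapeli.com/docsets . If the documentation you want to package was produced by a generator such as godoc, its output follows a common format and you can use a tool someone else has already written. Otherwise you have to parse the HTML files yourself. Below I use the OpenResty project as an example to walk through the steps of generating a Dash docset.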

Creating a docset takes the following steps (assuming the docset is named docset_name):

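Create the folder <docset_name>.docset/Contents/Resources/Documents/.
Copy the HTML source files of the documentation into that folder. This includes external resources referenced by the HTML, such as CSS and images; if necessary, rewrite the paths that the HTML uses to reference them.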

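# Excerpt from a larger function: `soup` is the BeautifulSoup tree of the page;
# `metadata` and `Resource` are defined elsewhere in the script.
resources = set()
rewritten_head = '<title>%s</title>\n' % metadata.name
for css in soup.find_all('link', rel='stylesheet'):
    # keep only the file name, so the stylesheet can sit next to the HTML file
    link = css['href'].rpartition('/')[-1]
    resources.add(Resource(filename=link, url=css['href']))
    # point the rewritten <link> at the local copy
    new_css = soup.new_tag('link')
    new_css['rel'] = 'stylesheet'
    new_css['href'] = link
    rewritten_head += str(new_css)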
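
The first two steps can be sketched as a small helper. This is a minimal sketch, not the code from my generator: the function name `create_docset_skeleton` and its parameters are made up for illustration, and it assumes the HTML files and their resources already sit together in one local directory (Python 3.8+ for `dirs_exist_ok`).

```python
import shutil
from pathlib import Path


def create_docset_skeleton(docset_name, html_dir, root='.'):
    """Create <docset_name>.docset/Contents/Resources/Documents/ and fill it.

    `html_dir` is a hypothetical local directory that already holds the HTML
    files together with the css/images they reference.
    """
    documents = (Path(root) / (docset_name + '.docset')
                 / 'Contents' / 'Resources' / 'Documents')
    documents.mkdir(parents=True, exist_ok=True)
    # Copy the whole tree so relative references inside the HTML keep working.
    shutil.copytree(html_dir, documents, dirs_exist_ok=True)
    return documents
```

Calling `create_docset_skeleton('OpenResty', 'html')` would leave the copied files under `OpenResty.docset/Contents/Resources/Documents/`.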

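Create the file Info.plist in the <docset_name>.docset/Contents folder. The official guide recommends starting from its template and editing that.
Create the SQLite table and index in the file <docset_name>.docset/Contents/Resources/docSet.dsidx. The SQL to create the table is CREATE TABLE searchIndex(id INTEGER PRIMARY KEY, name TEXT, type TEXT, path TEXT); and the SQL to create the index is CREATE UNIQUE INDEX anchor ON searchIndex (name, type, path);. Here name is the name of the entry, type is the type of the entry, and path is the path to the corresponding place in the HTML sources. For example, the fopen page in C.docset is represented in this table as follows: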

name	type	path
fopen	Function	./c/io/fopen.html

The values of type that Dash supports are listed at https://kapeli.com/docsets#supportedentrytypes

import sqlite3

def write_sql_schema(fn='OpenResty.docset/Contents/Resources/docSet.dsidx'):
    db = sqlite3.connect(fn)
    cur = db.cursor()
    # start from a clean table on every run
    cur.execute('DROP TABLE IF EXISTS searchIndex;')
    cur.execute('CREATE TABLE searchIndex(id INTEGER PRIMARY KEY, name TEXT, type TEXT, path TEXT);')
    cur.execute('CREATE UNIQUE INDEX anchor ON searchIndex (name, type, path);')
    db.commit()
    db.close()
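
Once the schema exists, filling it is a plain insert. The sketch below is illustrative rather than taken from my generator: `insert_entries` is a made-up helper, and it relies on `INSERT OR IGNORE` so that the unique index silently drops duplicates instead of raising an error.

```python
import sqlite3


def insert_entries(db, entries):
    """Insert (name, type, path) tuples into searchIndex.

    Rows that would violate the unique index on (name, type, path)
    are skipped thanks to INSERT OR IGNORE.
    """
    db.executemany(
        'INSERT OR IGNORE INTO searchIndex(name, type, path) VALUES (?, ?, ?);',
        entries)
    db.commit()


if __name__ == '__main__':
    # In-memory database for the demo; a real docset uses the docSet.dsidx path.
    db = sqlite3.connect(':memory:')
    db.execute('CREATE TABLE searchIndex(id INTEGER PRIMARY KEY, name TEXT, type TEXT, path TEXT);')
    db.execute('CREATE UNIQUE INDEX anchor ON searchIndex (name, type, path);')
    insert_entries(db, [
        ('fopen', 'Function', 'c/io/fopen.html'),
        ('fopen', 'Function', 'c/io/fopen.html'),  # duplicate, ignored
    ])
    print(db.execute('SELECT name, type, path FROM searchIndex').fetchall())
```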

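Now comes the hardest step: you need to write a script that extracts entries from the HTML source files and inserts them into the table created in the previous step.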

# Excerpt from a larger function: `soup`, `metadata`, `Entry` and `get_type`
# are defined elsewhere in the script.
entries = []
readme = soup.find(id='readme')
base_path = '%s.html' % metadata.name

def handle_each_section(section_header, section_type, entry_header, namespace):
    for tag in section_header.next_siblings:
        # not all siblings are tags
        if not hasattr(tag, 'name'):
            continue
        # the next header of the same level ends this section
        if tag.name == section_header.name:
            break
        if tag.name == entry_header:
            api_name = next(tag.stripped_strings)
            tag_anchor = next(tag.children)
            entry_path = base_path + tag_anchor['href']
            entries.append(Entry(
                name=api_name, type=section_type, path=entry_path))
            # insert an anchor to support table of contents
            # and more ...

if metadata.name == 'lua-resty-websocket':
    # each section is a class, and the h4 headers below it are its methods
    for section in metadata.sections:
        section_path = section.replace('.', '')
        entries.append(Entry(
            name=section, type='Class', path=base_path + '#' + section_path))
        section_header = soup.find(
            id=('user-content-' + section_path)).parent
        handle_each_section(section_header, 'Method', 'h4', section)
else:
    for section in metadata.sections:
        section_type = get_type(section)
        section_header = soup.find(id=('user-content-' + section)).parent
        # all entries' header is one level lower than section's header
        entry_header = 'h' + str(int(section_header.name[1]) + 1)
        handle_each_section(
            section_header, section_type, entry_header, metadata.name)

# remove user-content- to enable fragment href
start_from = len('user-content-')
for anchor in soup.find_all('a'):
    if 'id' in anchor.attrs:
        anchor['id'] = anchor['id'][start_from:]
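
The excerpt above depends on BeautifulSoup and on the shape of OpenResty's README pages. To show the core idea in isolation, here is a self-contained sketch that uses only the standard library's `html.parser`. It is a simplified stand-in, not my actual generator: the class `HeadingCollector`, the function `extract_entries`, and the assumption that every entry is an `<h4>` containing an `<a href="#...">` are all illustrative.

```python
from collections import namedtuple
from html.parser import HTMLParser

Entry = namedtuple('Entry', ['name', 'type', 'path'])


class HeadingCollector(HTMLParser):
    """Collect one Entry per heading that contains an <a href="#..."> anchor."""

    def __init__(self, base_path, entry_tag='h4', entry_type='Method'):
        super().__init__()
        self.base_path = base_path
        self.entry_tag = entry_tag
        self.entry_type = entry_type
        self.entries = []
        self._href = None  # href of the first anchor inside the current heading
        self._text = None  # text pieces of the current heading; None outside one

    def handle_starttag(self, tag, attrs):
        if tag == self.entry_tag:
            self._href, self._text = None, []
        elif tag == 'a' and self._text is not None and self._href is None:
            self._href = dict(attrs).get('href')

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self.entry_tag and self._text is not None:
            name = ''.join(self._text).strip()
            # headings without text or without an anchor are skipped
            if name and self._href:
                self.entries.append(
                    Entry(name, self.entry_type, self.base_path + self._href))
            self._text = None


def extract_entries(html, base_path):
    collector = HeadingCollector(base_path)
    collector.feed(html)
    collector.close()
    return collector.entries
```

The resulting `Entry` tuples have the same three fields as the `searchIndex` columns, so they can be written to the table directly.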

All of the following steps are optional:

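Add a table of contents to each page. Dash extracts tags of the form <a name="//apple_ref/cpp/Entry Type/Entry Name" class="dashAnchor"></a> from the HTML source files and uses them to build the table of contents shown in the lower-left corner. Entry Type is the type of the entry, such as the Function mentioned earlier, and Entry Name is the text displayed in the table of contents. Note that the value of Entry Name must be URL-encoded. Then add these two lines to the Info.plist created earlier: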

<key>DashDocSetFamily</key>
<string>dashtoc</string>

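Changes to Info.plist only take effect after the docset is reloaded. Zeal has a similar table-of-contents feature, but it is implemented differently: Zeal takes the path of the current page and queries for other entries whose path starts with it. This implementation has a catch: if an entry's path starts with ./, Zeal will not find it, because the "current page path" does not include ./. (While building the OpenResty docset I could not figure out why the table of contents never showed up, until I read Zeal's source code and found out what was going on.)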
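
The Zeal pitfall is easy to avoid by normalizing paths before they go into the index. The sketch below is a simplified model of the prefix lookup described above, not Zeal's actual code; both function names are made up for illustration.

```python
def normalize_entry_path(path):
    """Drop any leading './' so the stored path can match the page path."""
    while path.startswith('./'):
        path = path[2:]
    return path


def entries_on_page(page_path, entry_paths):
    """Simplified model of the lookup: keep paths that start with the page path."""
    return [p for p in entry_paths if p.startswith(page_path)]
```

With the raw path `./api.html#connect`, `entries_on_page('api.html', ...)` finds nothing; after `normalize_entry_path` the same entry matches.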

# insert an anchor to support table of contents
anchor = soup.new_tag('a')
anchor['name'] = '//apple_ref/cpp/%s/%s' % (section_type, quote(api_name))
anchor['class'] = 'dashAnchor'
tag_anchor.insert_before(anchor)
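
To make the URL-encoding requirement concrete, here is a small sketch that builds the anchor as a string. The helper `dash_anchor` is hypothetical, and passing `safe=''` is my choice for this example rather than what the excerpt above does: it makes `quote` encode `/` as well, so a slash in an entry name cannot be mistaken for a separator.

```python
from urllib.parse import quote


def dash_anchor(entry_type, entry_name):
    """Return the dashAnchor tag for one entry, with the name percent-encoded."""
    # safe='' also encodes '/', which quote() leaves alone by default.
    name = '//apple_ref/cpp/%s/%s' % (entry_type, quote(entry_name, safe=''))
    return '<a name="%s" class="dashAnchor"></a>' % name
```

For a name made only of letters, digits, `_`, `.`, `-` and `~`, the encoded value is identical to the original, so most API names pass through unchanged.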

Add an icon at <docset_name>.docset/icon.png. The icon should preferably be 32x32. You can also provide two icon files: icon.png (16x16) and icon@2x.png (32x32); the latter is used on Retina displays.
Add online redirection support. So that users can open the online documentation, you can add redirection to the docset. There are two ways to do it:
1. Add the following to Info.plist, where `$baseURL` is the entry URL of the online documentation:
```
<key>DashDocSetFallbackURL</key>
<string>$baseURL</string>
```
2. Add a comment with the source URL right after the <html> tag of every HTML file, like this: <html><!-- Online page at $URL -->.
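Enable JavaScript. JavaScript is disabled by default; to enable it, add <key>isJavaScriptEnabled</key><true/> to Info.plist.
Add a home page for the docset. When you click a docset, Dash tries to display a home page. You need to add the path of the home page to Info.plist: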

<key>dashIndexFilePath</key>
<string>$PATH</string>

Here $PATH is the path of the home page relative to <docset_name>.docset/Contents/Resources/Documents/, for example api.jquery.com/index.html.
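
Instead of editing the XML by hand, Info.plist can also be produced with Python's `plistlib`. The following is a sketch under assumptions: `build_info_plist` and its parameters are made up for illustration, and the four base keys are the ones I remember from the official template, so check them against that template before relying on this. The optional keys are exactly the ones discussed above.

```python
import plistlib


def build_info_plist(identifier, name, index_path=None, fallback_url=None,
                     enable_js=False, table_of_contents=False):
    """Return the XML bytes of an Info.plist for a docset."""
    # Base keys as recalled from the official template (an assumption).
    info = {
        'CFBundleIdentifier': identifier,
        'CFBundleName': name,
        'DocSetPlatformFamily': identifier,
        'isDashDocset': True,
    }
    # Optional keys described in the steps above.
    if table_of_contents:
        info['DashDocSetFamily'] = 'dashtoc'
    if enable_js:
        info['isJavaScriptEnabled'] = True
    if index_path:
        info['dashIndexFilePath'] = index_path
    if fallback_url:
        info['DashDocSetFallbackURL'] = fallback_url
    return plistlib.dumps(info)
```

The returned bytes can be written as they are to `<docset_name>.docset/Contents/Info.plist`.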

If you like, you can submit the generated docset to the official Dash platform. The detailed steps and requirements are at https://github.com/Kapeli/Dash-User-Contributions#contribute-a-new-docset

The complete code for generating the OpenResty docset is on GitHub.

Feb 29 '16 10:02 spacewander