user-documentation Examples usability

It would be good if we could:

allow examples in code blocks in .md files
automatically generate the *expect* files

One potential flow would be:

I create a pull request - fredemmott/fix-foo-branch -> hhvm/master
Travis-CI creates a hhvm-travis-bot/pr1234-examples -> fredemmott/fix-foo-branch pull request with the expect files

This would also require auto-generation of standard boilerplate, like the entrypoint function name.

It could either remove the example and add @@ extracted_inline123.hack @@, or it could add guides/foo/bar/.../extracted_inline123.{hack,.hack.hhvm.expect} etc, and we add extracted_inline*.hack to .gitignore

Nov 06 '19 23:11 fredemmott

fbmarkdown provides an AST, so we don't need regexps etc to find the code blocks :)

Nov 06 '19 23:11 fredemmott

I'm looking into this now; I like the idea of extracting the examples from the .md files -- ideally, this wouldn't just be the example code, but everything else related to each example too (whichever of expectf, expectre, hhconfig is needed -- not including regular .expect files as those can be generated automatically), solving #819 at the same time. I'm thinking an example format like this:

Common case

# some-name.hack
echo "Hello, world!\n";

This would be all you need to write for a basic example and it will be run and tested.

During bin/build.php, this extracts some-name.hack, runs it through HHVM and puts the output in some-name.hack.hhvm.expect (it can also generate an empty hhconfig, typechecker-expect or whatever else the test runner requires).

I'm not sure whether some-name.hack should be committed and pushed, but some-name.hack.hhvm.expect definitely should because

it would be very expensive to generate the output for each example during bin/build.php
it allows the author and reviewers to verify the output is as expected
if the output unexpectedly changes, we will catch it, instead of silently generating a new, bad output file

In the rendered output:

the # comment is removed
example output is added
if the example doesn't contain <<__EntryPoint>>, we add the boilerplate automatically
whether or not it contains an <<__EntryPoint>>, in either case we can also automatically add a namespace line (unique for each example, probably?), use namespace for any HSL functions used, \init_docs_autoloader()

Specifying a name for each example is important because:

it allows us to identify the relevant .hhvm.expect file even if the example code changes (an alternative would be to extract each example into a file named e.g. hash of the example code, but that would disassociate the example with its previous .expect file whenever the code changes)
it's not necessary for this name to be in the form of a file name (some-name.hack), but it's a convenient way to specify whether it's .hack or .hackpartial (or possibly .php if we want some legacy examples); also it probably makes it more obvious to anyone reading the code
doesn't really matter if we use # or // for these special comments but it seemed convenient to use # since we don't normally use it for other comments (it could also be anything else that's not a valid Hack comment because these are removed during extraction)
it also allows us to distinguish "real" examples that should be run and tested from other random pieces of example code

// Random example code that won't be extracted.
$foo = await bar();

Uncommon cases

The example block can include other fake "files", e.g. if an example has non-deterministic output and therefore needs an .expectf file and/or needs an .hhconfig flag:

# example2.hack
echo 'Your lucky number is: '.mt_rand(0, 99);
# .expectf
Your lucky number is %d
# .hhconfig
some_required_flag=true

We might also want to support some magic strings to control the output:

# @type-errors
# example-type-error.hack
3 + vec[42]

will generate example-type-error.hack.type-errors instead of example-type-error.hack, runs it throught the type checker and saves the output to example-type-error.hack.typechecker.expect

# @no-auto-output would avoid including the example output in the generated HTML, etc.

We probably don't need @example-start and @example-end anymore since this proposal changes it such that you write exactly the code you want to be visible in the guide, and we generate whatever boilerplate is missing (compared to current version: you write a fully working script and we strip boilerplate).

Does this look reasonable? Hopefully should address both this and #819, and we can also address other minor issues during the conversion (use hack/hackpartial instead of php; make sure each example is in a unique namespace (e.g. XHP examples) -- or maybe unique per guide page?; get rid of require/include in favor of the autoloader; ...)

Sep 09 '20 20:09 jjergus

expectf

This can also be autogenerated - see out2expectf.py in the HHVM tree

# vs //

No strong preference, but we should verify that we can make examples that include #! lines at the top

it also allows us to distinguish "real" examples that should be run and tested from other random pieces of example code

Once real examples are more convenient, we should get must stricter about banning this. Perhaps even 'no code blocks unless sh' or similar - only inline code fragments?

configuration format

GFM (and fbmarkdown) permits more detailed 'info strings' after the leading separator and notes that the first word 'typically' indicates the language; a more conventional way to provide extra data here would be along the lines of:

```hack filename=foo.hack
herpderp();
```

... which GitHub renders as:

herpderp();

Or to go a bit further:

```hack foo.hack no-auto-expect no-auto-output
herpderp();
```

```hhconfig foo.hack
not_rendered_by_default();
```

```expectf foo.hack
%d
```

This would:

be easier to parse
not need removing from output
be easy to check for unrecognized options (e.g. make typos build failures)
probably be a little less user-friendly than comments

Sep 10 '20 20:09 fredemmott

it's not necessary for this name to be in the form of a file name (some-name.hack), but it's a convenient way to specify whether it's .hack or .hackpartial (or possibly .php if we want some legacy examples); also it probably makes it more obvious to anyone reading the code

I think we want the full filename: if an error mentions some-name.hack, github search/grep/find should be able to find the source

Sep 10 '20 20:09 fredemmott

This can also be autogenerated - see out2expectf.py in the HHVM tree

Good idea, it doesn't look like it can handle everything but I like it for handling common cases like file paths (IIRC most of our expectf files are because we expect HHVM to raise an error which contains the path). We should still allow people to override it.

Once real examples are more convenient, we should get must stricter about banning this. Perhaps even 'no code blocks unless sh' or similar - only inline code fragments?

I wouldn't ban it, I think the "not real" code blocks are very useful. As part of explaining any feature, we usually want to show short snippets (usually one line of code, often an expression, not even a statement), before showing some full examples at the end. These would be significantly degraded by any extra boilerplate (echo statements etc.).

We could maybe make the real code blocks default and require explicit opt-out, but I don't see a good way around having to specify a name for each real example (we need a name for the extracted files and to keep track of which files belong to which example even if the example and/or its output changes). With that, it made more sense to me to just say "example with a name -- extract and run, example without a name -- not extracted".

``hack foo.hack no-auto-expect no-auto-output

I like this, would definitely use it for the example filename + boolean flags. I think I'd still prefer to keep the extra files in the same block though, something like

```foo.hack no-auto-expect no-auto-output
herpderp();

#.hhconfig
not_rendered_by_default();

#.expectf
%d

It's a bit shorter and no chance of typos (```hhconfig foooo.hack). We also need to process these all at once (e.g. see which files are overridden and shouldn't be auto-generated) so the implementation will likely be easier if it's all 1 markdown AST node.

Sep 10 '20 20:09 jjergus

I wouldn't ban it, I think the "not real" code blocks are very useful. As part of explaining any feature, we usually want to show short snippets (usually one line of code, often an expression, not even a statement), before showing some full examples at the end. These would be significantly degraded by any extra boilerplate (echo statements etc.).

Perhaps we should add typecheck-only snippet? My biggest objection to raw code blocks is we have no automatic detection when they get outdated.

As an 'escape hatch', perhaps a dedicated, easily-greppable-for-review marker, e.g.:

```untested-hack
foo();
```

I like this, would definitely use it for the example filename + boolean flags. I think I'd still prefer to keep the extra files in the same block though, something like

I'd prefer a marker that we can be confident isn't/won't be valid hack - perhaps:

herpderp()

?><?hhconfig
foo=bar
?><?expectf
%d

Sep 10 '20 20:09 fredemmott

user-documentation user-documentation copied to clipboard

Examples usability

Common case

Uncommon cases

user-documentation
user-documentation copied to clipboard