ConsoleLexer detecting prompt based on characters in the middle of a line
Name of the lexer ConsoleLexer
Code sample
require 'rouge'
console = <<~TXT
$ echo "Hello > World"
"Hello > World"
TXT
lexer = Rouge::Lexers::ConsoleLexer.new
formatter = Rouge::Formatters::HTMLLegacy.new
puts formatter.format(lexer.lex(console))
Additional context
When using the ConsoleLexer, I expect the prompt to be detected based on characters at the start of a line. In the example above, only the `$` at the beginning of the `$ echo "Hello > World"` line should mark that line as a prompt. Instead, the second line, `"Hello > World"`, is also matched as a prompt because of the `>` character in the middle of the line. The code above produces this output:
<div class="highlight">
<pre class="codehilite">
<code>
<span class="gp">$</span><span class="w"> </span><span class="nb">echo</span><span class="s2">"Hello > World"</span>
<!-- this line should not be marked as a prompt, but it is -->
<span class="gp">"Hello ></span><span class="w"> </span>World<span class="s2">"</span>
</code>
</pre>
</div>
I expect this output instead:
<div class="highlight">
<pre class="codehilite">
<code>
<span class="gp">$</span><span class="w"> </span><span class="nb">echo</span><span class="s2">"Hello > World"</span>
<span class="go">"Hello > World"</span>
</code>
</pre>
</div>
It looks like this method is the culprit, and the Regex needs to be modified to detect prompt characters at the beginning of a line only:
# lib/rouge/lexers/console.rb
def prompt_regex
  @prompt_regex ||= begin
    /^#{prompt_prefix_regex}(?:#{end_chars.map(&Regexp.method(:escape)).join('|')})/
  end
end
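To see why the second line is caught, it can help to rebuild the pattern outside Rouge. The sketch below is standalone Ruby and rests on two assumptions not confirmed in this issue: that `prompt_prefix_regex` behaves like a lazy `.*?`, and that `end_chars` holds the defaults `$`, `#`, `>` and `;`. The helper `build_prompt_regex` and the `STRICT_PROMPT` variant are hypothetical names used only for illustration.

```ruby
# Standalone sketch; it does not load Rouge.
# Assumption: the default end characters are $, #, > and ;.
SKETCH_END_CHARS = %w[$ # > ;].freeze

# Hypothetical helper that mirrors the shape of ConsoleLexer#prompt_regex.
# `prefix` is regex source text placed between the line anchor and the end character.
def build_prompt_regex(prefix)
  alternatives = SKETCH_END_CHARS.map(&Regexp.method(:escape)).join('|')
  /^#{prefix}(?:#{alternatives})/
end

# Assumption: the prefix is a lazy "anything", so the end character
# may sit anywhere in the line.
LAZY_PROMPT = build_prompt_regex('.*?')

# Hypothetical stricter variant: an empty prefix, so the end character
# must be the first character of the line.
STRICT_PROMPT = build_prompt_regex('')

['$ echo "Hello > World"', '"Hello > World"'].each do |line|
  # String#[] with a regex returns the matched text, or nil when nothing matches.
  puts "#{line.inspect}: lazy=#{line[LAZY_PROMPT].inspect}, strict=#{line[STRICT_PROMPT].inspect}"
end
```

With the lazy prefix, the output line matches up to and including the `>`, which lines up with the `gp` span in the HTML above. The strict variant avoids that, but it would also stop recognizing prompts such as `user@host:~$`, so it is probably too blunt to serve as the actual fix.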
I was having a similar problem on my Jekyll blog. After a few minutes of hacking at my local copy of the gem source, I came up with a simple solution.
By modifying `prompt_prefix_regex` so that the prefix must be either empty or start with a non-space character (that is, it ignores lines that start with one or more whitespace characters), you can explicitly mark your `.go` lines by indenting them like this:
```
$ echo "Hello > World"
  "Hello > World" some
```
This does result in the `.go` lines being indented in the output as well, but since the whole line will be marked as `.go`, you can fix that visually with a CSS style such as
.go { text-indent: -2ch; } /* Undo a 2-character indentation of output lines in your code fence. */
Or just embrace having all your `.go` output lines indented.
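The indentation idea can be tried out without touching the gem. This is only a sketch of one way the prefix could be written, not the actual patch: the pattern `(?:\S.*?)??` and the constant names are assumptions made for illustration, and the end characters are assumed to be the defaults `$`, `#`, `>` and `;`.

```ruby
# Standalone sketch; it does not load Rouge.
# Assumption: the default end characters are $, #, > and ;.
INDENT_END_CHARS = %w[$ # > ;].freeze
INDENT_ALTERNATIVES = INDENT_END_CHARS.map(&Regexp.method(:escape)).join('|')

# Hypothetical prefix: either empty, or starting with a non-space character.
# The lazy `??` tries the empty prefix first, so a line that begins with an
# end character yields just that character as the prompt.
INDENT_AWARE_PROMPT = /^(?:\S.*?)??(?:#{INDENT_ALTERNATIVES})/

[
  '$ echo "Hello > World"',  # prompt line
  '  "Hello > World"',       # indented output line
  '"Hello > World"'          # unindented output line
].each do |line|
  puts "#{line.inspect} => #{line[INDENT_AWARE_PROMPT].inspect}"
end
```

In this sketch only the indented line escapes prompt detection. An unindented output line that contains `>` is still treated as a prompt, so the approach depends on the author indenting every output line by hand.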
I need some more time to look for edge cases, but I can submit a pull request later this week if there's still interest.
An easier solution might be to set the prompt characters yourself, via the `prompt` option, so that they exclude `>`. As an example using the Markdown plugin:
``` console?prompt=$
$ echo "Hello > World"
"Hello > World"
```
In your example, if you're doing it by hand, you could use `Rouge::Lexers::ConsoleLexer.new(prompt: '$')` instead.
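To get a feel for what narrowing the prompt list changes, here is a rough standalone model of the matching. It does not call Rouge. The helper `prompt_matcher` is hypothetical, and it assumes the option is a comma-separated list of end characters placed after a lazy `.*?` prefix.

```ruby
# Standalone sketch; it does not load Rouge.
# Hypothetical helper: turn a comma-separated prompt value into a matcher.
# Assumption: the text before the end character is matched by a lazy `.*?`.
def prompt_matcher(prompt_option)
  alternatives = prompt_option.split(',').map { |c| Regexp.escape(c) }.join('|')
  /^.*?(?:#{alternatives})/
end

DEFAULT_MATCHER = prompt_matcher('$,#,>,;') # the documented default list
DOLLAR_MATCHER  = prompt_matcher('$')       # the narrowed list suggested above

['$ echo "Hello > World"', '"Hello > World"', 'Total: $5'].each do |line|
  puts "#{line.inspect}: default=#{line[DEFAULT_MATCHER].inspect}, dollar=#{line[DOLLAR_MATCHER].inspect}"
end
```

Under these assumptions, restricting the list to `$` clears the false positive on `>`, but an output line that itself contains `$` (the `Total: $5` case) would still look like a prompt. The option works around the symptom without anchoring detection to the start of the line.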
Here are all the options for the console lexer, as documented in the output of `rougify list` from the console:
console: A generic lexer for shell sessions. Accepts ?lang and ?output lexer options, a ?prompt option, ?comments to enable # comments, and ?error to handle error messages. [aliases: terminal,shell_session,shell-session]
?comments= enable hash-comments at the start of a line - otherwise interpreted as a prompt. (default: false, implied by ?prompt not containing `#`)
?error= comma-separated list of strings that indicate the start of an error message
?lang= the shell language to lex (default: shell)
?output= the output language (default: plaintext?token=Generic.Output)
?prompt= comma-separated list of strings that indicate the end of a prompt. (default:$,#,>,;)