
Lua: let List constructor split string argument on whitespace

Open bpj opened this issue 1 year ago • 2 comments

Describe your proposed improvement and the problem it solves.

A little more DWIMery in the Lua API: let the List constructor, when given a string as its argument, split that string on whitespace, so that, for example, List("foo bar baz") returns the three-element list {"foo", "bar", "baz"}.

This is useful in the (to me at least) common case where you want to create a list of classes, or when you want to loop over a fixed list of strings, e.g. keys of metadata fields to validate.

I'm the first to admit that this is very lazy,[^1] but the class field in a table passed to pandoc.Attr already works like this.
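To make the use case concrete, here is a minimal sketch in plain Lua, so that it runs outside pandoc. The names `words` and `missing_keys` are hypothetical helpers chosen for illustration; neither is part of pandoc's API, and the `meta` table is only a stand-in for real document metadata.

```lua
-- Hypothetical helper: split a string into its whitespace-separated words.
local function words(str)
  local result = {}
  for word in str:gmatch('%S+') do
    result[#result + 1] = word
  end
  return result
end

-- Hypothetical use case: report which of the listed metadata keys are absent.
local function missing_keys(meta, keys)
  local missing = {}
  for _, key in ipairs(words(keys)) do
    if meta[key] == nil then
      missing[#missing + 1] = key
    end
  end
  return missing
end

-- A plain table standing in for a document's metadata.
local meta = { title = 'Example', author = 'A. Writer' }
local missing = missing_keys(meta, 'title author date')
print(table.concat(missing, ', '))  --> date
```

Inside a filter, the table returned by `words` could be passed to `pandoc.List` to get the list methods.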

Describe alternatives you've considered.

pandoc.Attr({class = string}).classes

works so-so if you already have a string in a variable.

Currently I'm using a function built on string:gmatch('%S+'), but complete with the check for whether the argument is already table-ish and the loop around gmatch,[^2] that's quite a bit of boilerplate in almost every filter I write. (I do have my own utilities library, but when I might share the filter with others I always end up copying the functions I use into the filter file!)

[^1]: I also admit that I'm missing Perl's @array = qw/foo bar baz/ operator and @array = $string =~ /\S+/g construct!

[^2]:

```lua
helper.to_list = function(val, pat)
  if 'table' ~= type(val) then
    local str = tostring(val)
    pat = tostring(pat or '%S+')
    val = { }
    for s in str:gmatch(pat) do
      val[#val + 1] = s
    end
  end
  return pandoc.List(val)
end
```

As you can see, this function does a bit more in that it allows a custom pattern, but I'm not asking for that!

bpj, Jun 01 '24 17:06

I wouldn't want to modify pandoc.List, but I'd be open to adding a pandoc.text.split function, e.g., by wrapping Data.Text.splitOn.
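For comparison, a literal-separator split in the style of Data.Text.splitOn behaves differently from a whitespace split: consecutive separators produce empty strings, and an empty separator is an error. The following plain-Lua sketch illustrates those semantics. The name `split_on` and its argument order (separator first, as in splitOn) are assumptions made for this example; no such pandoc function is implied to exist.

```lua
-- Hypothetical sketch of a literal-separator split; not an existing pandoc function.
local function split_on(sep, str)
  assert(sep ~= '', 'separator must not be empty')
  local parts = {}
  local start = 1
  while true do
    -- Plain find (fourth argument true): sep is matched literally, not as a pattern.
    local first, last = str:find(sep, start, true)
    if not first then
      -- No further separator: the rest of the string is the final element.
      parts[#parts + 1] = str:sub(start)
      return parts
    end
    parts[#parts + 1] = str:sub(start, first - 1)
    start = last + 1
  end
end

print(#split_on(' ', 'foo bar baz'))  --> 3
print(#split_on(' ', 'foo  bar'))     --> 3, because the middle element is ''
```

With this behavior, the whitespace-splitting use case would still need the empty strings filtered out, or a pattern-based approach such as gmatch('%S+').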

tarleb, Jun 01 '24 19:06

Alternative: we could also make it easier to turn the gmatch iterator into a list:

pandoc.List.from_iterator(str:gmatch '%S+')

That could also be used with other iterators like io.lines, file:lines, pairs, etc.

local my_keys = pandoc.List.from_iterator(pairs(my_table))

We could probably also overload pandoc.List for extra convenience:

pandoc.List(str:gmatch '%S+')
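As a rough illustration of the idea, collecting an iterator into a list takes only a few lines of plain Lua. The function below is a hypothetical stand-alone sketch, not the proposed pandoc.List.from_iterator itself: it returns a plain table and keeps only the first value produced at each iteration step, which is an assumption about how such a function would treat multi-value iterators like pairs.

```lua
-- Hypothetical sketch: collect the first value produced by each step of a
-- generic-for iterator triplet into a plain table.
local function from_iterator(iter, state, control)
  local items = {}
  for value in iter, state, control do
    items[#items + 1] = value
  end
  return items
end

local word_list = from_iterator(('foo bar baz'):gmatch('%S+'))
print(table.concat(word_list, ' '))  --> foo bar baz

local keys = from_iterator(pairs({ title = 1, author = 2 }))
table.sort(keys)  -- pairs makes no promise about iteration order
print(table.concat(keys, ' '))  --> author title
```

In a filter, the result could be wrapped as `pandoc.List(from_iterator(...))` to gain the list methods.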

tarleb, Jun 02 '24 13:06

Great! 👍 The Penlight List class constructor does something similar by calling its iterator generator on anything which isn't a table. That can actually be nasty, since a string becomes a list of bytes, which is easy to forget. Taking an iterator as argument is much better!

bpj, Sep 21 '24 12:09