tabularray icon indicating copy to clipboard operation
tabularray copied to clipboard

Make tabularray and other packages run faster

Open lvjr opened this issue 2 years ago • 6 comments

There have been some complaints from users that tabularray ran very slow in building long tables. The main reason is that tabularray need to read and write plenty of key-value pairs for making high quality tables, but TeX engines don't provide a native data structure for doing this kind of jobs, which lag behind other modern programming languages a lot.

If TeX engines (pdfTeX, XeTeX, LuaTeX, etc, not including Knuth TeX) could add this new data structure, then tabularray would run much faster. Actually, LaTeX team could easily reimplement property list in expl3 to make other LaTeX3 packages run faster too. Also other packages which use lots of key-value pairs, such as PGF/TikZ, could possibly benefit from this new data structure.

Therefore I believe it is a good idea to add this new data structure to TeX engines. But I don't expect it will happen soon. Although not many people are using tabularray package at present, I still want to open this issue now to know what the users think.

  • If you know well how TeX engines are written, do you think it is doable to add this new data structure?
  • If you are a friend of some TeX engine developer, would you help to persuade him/her to add this new data structure?
  • If you are an ordinary user, would you sponsor TeX engine developers with a little money for adding this new data structure?

lvjr avatar Jul 02 '22 12:07 lvjr

prop确实慢,如果需要的是属性偏移固定的数据结构,可以用intarray,要求不那么高可以把对象和属性拼成一个名字来。 即使引擎直接实现,求值一个属性还是要哈希一次,跟从变量名计算token在eqtb的位置是一样的,也就是省点内存。 话说吕老师考虑在luatex下用lua重新实现吗?

SainoNamkho avatar Jul 02 '22 17:07 SainoNamkho

I doubt there would be much enthusiasm to extend tex in this way. For classic tex (and pdftex) there is a very conservative approach to extensions and in luatex as already hinted by the comment above it would have a very large overlap with Lua tables.

Also I'm surprised that property lists have a major effect here (compared with the time spent doing the typesetting trials) prop lists seq etc are good for handling the initial setup and configuration options, but by the time you get to the trials can't you have compressed everything basically to information in csnames so just limited by the csname hash lookup (which isn't brilliant but isn't that bad in practice) l3 regex isn't fast either.

davidcarlisle avatar Jul 02 '22 18:07 davidcarlisle

@SainoNamkho Actually, intarray is already used in tabularray. For rewriting some code with lua, see issue #250.

lvjr avatar Jul 03 '22 13:07 lvjr

Also I'm surprised that property lists have a major effect here (compared with the time spent doing the typesetting trials) prop lists seq etc are good for handling the initial setup and configuration options, but by the time you get to the trials can't you have compressed everything basically to information in csnames so just limited by the csname hash lookup (which isn't brilliant but isn't that bad in practice) l3 regex isn't fast either.

@davidcarlisle When tabularray is doing one round of typesetting trial, it measures every cell, and need to read and write lots of key-values. Therefore I am not sure how to compare two things when one is part of the other.

If some one asks me how to prove that key-value reading and writing is the main culprit for the slowness, I don't know how to answer it, because I don't know how to compare current data structure with a nonexistent data structure. But I did do a test last year:

  • with only property lists: 36.9s
  • with only token lists: 5.0s
  • with token lists + intarrays: 3.6s

By replacing some token lists with intarrays, the test saved 28% of running time. Just like a lua table, I expect the new data structure could also be used as an array with similar performance of an intarray.

lvjr avatar Jul 03 '22 13:07 lvjr

I noticed that the package has replaced some "spec list" (hash table version) by intarray (array version) but prop lists seems not changed. It might help(?) if all prop lists got changed by intarray. intarray, like a table, hashes a key, token lists hashes the concatenation of key and value. Prop list is somewhat like javascript's Object.entries (but the expansion costs more to build linked lists), the query operation is like

let a = [["key1", "val1"], ["key2", "val2"], ["key3", "val3"]]
function query(prop_list, key) {
    for (let [k, v] in prop_list) {
        if (k === key)
            return v;
    }
    return null;
}
query(a, "key2");

I guess it is designed to avoid unnecessary hash table entry creation.

SainoNamkho avatar Jul 03 '22 14:07 SainoNamkho

@SainoNamkho intarrays in expl3 can only store integers or dimensions.

lvjr avatar Jul 06 '22 03:07 lvjr

Forget it. LuaTeX is the future (see issue #250).

lvjr avatar Nov 30 '22 14:11 lvjr