hylo icon indicating copy to clipboard operation
hylo copied to clipboard

Consider using ordered data structures only

Open kyouko-taiga opened this issue 9 months ago • 3 comments

It is typically desirable for a compiler to produce the same output for a given input across multiple compilation runs. Some even advocate that two runs with the same input should always produce bitwise identical outputs.

We get close in hc because mangling produces predictable names and because the order in which we insert definitions in LLVM modules is based on the order in which the declarations were parsed. So while it's difficult to "predict" which function will appear first in the generated output, at the very least it is stable.

But if the LLVM output is ordered, none of the previous phases' artifacts are. One reason is that we use many sets and dictionaries. I think we should revisit this choice. In particular, I think it would be nice if a compilation run was guaranteed to take the same execution path every time. That's a good property for debugging.

I guess the only way to achieve that is to use ordered data structures everywhere. That will probably add some space and time costs, though, so we could also consider using simple arrays in some places. That's because many of the sets we use are so small that maintaining a hash table is probably more expensive than doing linear searches. Of course we'd need to profile this wild assumption.

kyouko-taiga avatar May 23 '24 17:05 kyouko-taiga