solang Support parallel compilation of all input files

Input files can be compiled in parallel.

Implement this for solang compile
Occurrences of parallel solang compile in our CI jobs are no longer needed

Jun 23 '23 07:06 xermicus

I think there are a few things that are needed.

The FileResolver should be wrapped in a std::sync::RwLock
Optionally, the parse tree should be cached in the FileResolver, so we don't waste cycles reparsing the same file (e.g. files that are imported)
Files should be processed in a thread worker pool fashion
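
A rough sketch of how these three points could fit together, using only the Rust standard library. `ParseTree`, `CachingResolver` and `parse_all` are hypothetical names made up for illustration and are not Solang's actual types or API; the "parser" here just counts whitespace-separated tokens.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Hypothetical stand-in for a parsed source unit (not Solang's real type).
#[derive(Debug)]
pub struct ParseTree {
    pub path: String,
    pub token_count: usize,
}

/// Hypothetical resolver that caches parse trees behind an RwLock.
#[derive(Default)]
pub struct CachingResolver {
    cache: RwLock<HashMap<String, Arc<ParseTree>>>,
}

impl CachingResolver {
    /// Return the cached tree for `path`, "parsing" `source` on a cache miss.
    pub fn parse(&self, path: &str, source: &str) -> Arc<ParseTree> {
        {
            // Fast path: any number of threads may hold the read lock at once.
            let cache = self.cache.read().unwrap();
            if let Some(tree) = cache.get(path) {
                return Arc::clone(tree);
            }
        } // read lock released here, before the write lock is taken

        // Parse outside the lock so that cache misses do not serialize.
        // Two threads racing on the same file may both parse it; the first
        // one to insert wins and the other result is dropped.
        let parsed = Arc::new(ParseTree {
            path: path.to_string(),
            token_count: source.split_whitespace().count(),
        });
        let mut cache = self.cache.write().unwrap();
        Arc::clone(cache.entry(path.to_string()).or_insert(parsed))
    }

    /// Number of distinct files currently cached.
    pub fn cached_files(&self) -> usize {
        self.cache.read().unwrap().len()
    }
}

/// Parse all `(path, source)` pairs with `workers` threads pulling indices
/// from a shared counter. Results come back in input order.
pub fn parse_all(
    resolver: &CachingResolver,
    files: &[(&str, &str)],
    workers: usize,
) -> Vec<Arc<ParseTree>> {
    let next = AtomicUsize::new(0);
    let results = Mutex::new(Vec::new());
    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            s.spawn(|| loop {
                // Claim the next unprocessed file; stop when none are left.
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some((path, source)) = files.get(i) else { break };
                let tree = resolver.parse(path, source);
                results.lock().unwrap().push((i, tree));
            });
        }
    });
    let mut results = results.into_inner().unwrap();
    results.sort_by_key(|(i, _)| *i);
    results.into_iter().map(|(_, tree)| tree).collect()
}
```

With this shape, a file that appears twice (for example once on the command line and once as an import) ends up sharing a single `Arc<ParseTree>`. A real implementation would probably key the cache on the resolved path rather than the raw string.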

Jun 23 '23 12:06 seanyoung

I think this issue is a bit more difficult than it looks. If one file depends on another, they cannot be built in parallel, due to dependency resolution in sema. At least the parser and the lexer can run in parallel.

Jun 26 '23 21:06 LucasSte

I think this issue is a bit more difficult than it looks. If one file depends on another, they cannot be built in parallel, due to dependency resolution in sema. At least the parser and the lexer can run in parallel.

I don't understand what you mean. What do you see as a problem?

Jun 27 '23 07:06 seanyoung

@seanyoung Consider this case:

file A.sol:

contract A { ... }

file B.sol:

contract B is A { ... }

file C.sol:

contract C {
   A other;
   function foo(address addr) external {
        other = new A{address: addr}();
   }
}

I can invoke Solang using solang compile --target Solana A.sol B.sol C.sol.

File B.sol depends on A.sol. The semantic analysis can only happen for B after contract A is fully resolved, even though they might generate different binaries. Parallel compilation for A and B is not possible.

For file C.sol, the contract needs to have contract A resolved. In addition, the Solana account collection in codegen expects the CFG from all contracts to be ready in order to collect accounts for function foo. Again, parallel compilation for C and A is not possible.

The way I see it, we can either enable parallel compilation and let the compiler do repeated work for these cases (e.g. resolve A.sol solely for B.sol in one thread to generate B's binary, while A.sol is building in another thread to generate A's binary), or we need to construct a dependency tree to identify what can be parallelized and use many synchronization mechanisms throughout the code to make this work.
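
To make the second option concrete, here is a minimal sketch of the dependency-tree idea: group the input files into batches, where every file in a batch depends only on files from earlier batches. `parallel_batches` and the import map are hypothetical and not part of Solang. The sketch assumes the import graph is acyclic and reports an error otherwise; dependencies that are not among the inputs are treated as already available.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Group files into batches that could be built concurrently.
/// `imports` maps each input file to the files it depends on (hypothetical
/// input; a real compiler would derive this from the parse trees).
pub fn parallel_batches(
    imports: &BTreeMap<&str, Vec<&str>>,
) -> Result<Vec<Vec<String>>, String> {
    // Files that still have to be scheduled, with their dependencies.
    let mut remaining: BTreeMap<&str, BTreeSet<&str>> = imports
        .iter()
        .map(|(file, deps)| (*file, deps.iter().copied().collect::<BTreeSet<&str>>()))
        .collect();
    let mut batches: Vec<Vec<String>> = Vec::new();

    while !remaining.is_empty() {
        // A file is ready once none of its dependencies is still pending.
        let ready: Vec<&str> = remaining
            .iter()
            .filter(|(_, deps)| deps.iter().all(|d| !remaining.contains_key(d)))
            .map(|(file, _)| *file)
            .collect();
        if ready.is_empty() {
            // Every pending file waits on another pending file.
            return Err("import cycle detected".to_string());
        }
        for file in &ready {
            remaining.remove(file);
        }
        batches.push(ready.into_iter().map(String::from).collect());
    }
    Ok(batches)
}
```

For the three files above, with B.sol and C.sol both depending on A.sol, this yields two batches: A.sol alone, then B.sol and C.sol together. The synchronization between batches is the part this sketch leaves out.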

Jun 27 '23 13:06 LucasSte

File B.sol depends on A.sol. The semantic analysis can only happen for B after contract A is fully resolved, even though they might generate different binaries.

This is not how Solang works and it could never work that way.

Each file on the command line is a new Namespace. When a file is imported, we call sema (recursively) with the existing namespace and then walk the parse tree of the imported file. So, the parse tree for the same file can be used concurrently in different threads.

You are suggesting that when B.sol imports A.sol, then it uses the Namespace of A.sol rather than the parse tree. That would be wrong and will lead to incorrect compilation. Each import needs to go through sema for its own Namespace.

There are global things like user defined types which could have different definitions in different files. When you then import another file, that imported file needs to use the correct global definitions.

So, sema and the following stages can run in parallel. Since we're using an LALR grammar, the parser stage should be pretty fast, so I suspect parallelizing it will make little difference.
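
As a rough illustration of the model described above (one namespace per command-line file, parse trees shared read-only between threads), consider the sketch below. `SourceUnit`, `Namespace`, `resolve` and `sema_parallel` are simplified stand-ins invented for this example; Solang's real types carry far more state.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical parse tree: immutable once built, so it can be shared.
pub struct SourceUnit {
    pub file: String,
    pub contracts: Vec<String>,
    pub imports: Vec<Arc<SourceUnit>>,
}

/// Hypothetical namespace: one per file given on the command line.
#[derive(Debug, Default)]
pub struct Namespace {
    pub contracts: Vec<String>,
}

/// Walk a parse tree and its imports, adding definitions to `ns`.
/// Imports are resolved into the importer's namespace, not their own.
fn resolve(unit: &SourceUnit, ns: &mut Namespace) {
    for import in &unit.imports {
        resolve(import, ns);
    }
    for contract in &unit.contracts {
        if !ns.contracts.contains(contract) {
            ns.contracts.push(contract.clone());
        }
    }
}

/// Run the "sema" step for every input on its own thread.
/// Each thread owns its Namespace; the parse trees are only read.
pub fn sema_parallel(inputs: &[Arc<SourceUnit>]) -> Vec<Namespace> {
    thread::scope(|s| {
        let handles: Vec<_> = inputs
            .iter()
            .map(|unit| {
                s.spawn(move || {
                    let mut ns = Namespace::default();
                    resolve(unit, &mut ns);
                    ns
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Here contract A is resolved twice, once into A.sol's namespace and once into B.sol's, but both threads read the same parse tree, so no locking is needed during the walk.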

Jun 27 '23 13:06 seanyoung

So, sema and the following stages can run in parallel. Since we're using an LALR grammar, the parser stage should be pretty fast, so I suspect parallelizing it will make little difference.

I apologize. I wasn't aware that Solang worked that way. By building both A.sol and B.sol, the compiler is doing repeated work resolving contract A, isn't it? Shouldn't we resolve A only once?

Jun 27 '23 17:06 LucasSte