HVM
Improvements needed, and planned features
Better allocator
Currently, the C allocator just reserves 8 GB of memory when the program starts and, if there are multiple threads, that memory is split between them. So, for example, if there are 8 threads, each one gets 1 GB to work with. A worker can read from other workers' memories (which may happen for shared data, i.e., when they pass through a DP0/DP1 pointer), but only the owner of a slice can allocate memory from it. Each worker also keeps its own freelist.
As such, a huge improvement would be a thread-safe, global, possibly arena-based alloc and free.
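As a rough illustration of the direction, here is a hedged Rust sketch of a global bump arena (Arena and its methods are invented names, not HVM's actual allocator; a real implementation would still want per-thread freelists layered on top to recycle freed nodes):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative sketch: a lock-free bump arena shared by all workers.
// Any thread may allocate; the cursor advances atomically, so slices
// never overlap and no thread "owns" a fixed fraction of the heap.
struct Arena {
    heap: Vec<u64>,    // the shared node memory (Lnk-sized cells)
    next: AtomicUsize, // bump cursor, advanced atomically
}

impl Arena {
    fn new(cells: usize) -> Self {
        Arena { heap: vec![0; cells], next: AtomicUsize::new(0) }
    }
    // Reserve `size` contiguous cells; returns the starting index.
    fn alloc(&self, size: usize) -> Option<usize> {
        let at = self.next.fetch_add(size, Ordering::Relaxed);
        if at + size <= self.heap.len() { Some(at) } else { None }
    }
}

fn main() {
    let arena = Arena::new(1024);
    let a = arena.alloc(2).unwrap();
    let b = arena.alloc(3).unwrap();
    // allocations from the shared arena never overlap
    assert!(a + 2 <= b || b + 3 <= a);
    println!("a={} b={}", a, b);
}
```

The missing half of the problem, of course, is `free`: a bump cursor alone never reclaims memory, which is exactly why the freelist design question matters.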
Better scheduler
Right now, the task scheduler just partitions threads evenly among the normal form of the result. So, for example, if the result is (Pair A B) and there are 8 threads, 4 will be assigned to reduce A and 4 to reduce B. This is obviously very limited: if A reduces quickly, the scheduler won't re-assign its threads to help reduce B, so the program will just use half of the cores from that point on. And so on. This is why algorithms that return lists are often slower: they aren't using the extra threads at all. In the worst case, it will just fall back to being single-threaded.
A huge improvement would be a proper scheduler, that adds potential redexes to a task pool. When I attempted to do that, the cost of synchronization added too much overhead, ruining the performance. Perhaps a heuristic to consider would be to limit the depth of the redexes for which global tasks are emitted; anything below, say, depth=8, would just be reduced by the thread that reaches it. Many improvements can be done, though.
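The depth-cutoff heuristic could be sketched like this (a hedged illustration; TaskPool, offer/steal, and the cutoff value are all invented names, and a real implementation would likely use a lock-free work-stealing deque rather than a Mutex):

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Redexes found above this depth are shared globally; deeper ones are
// reduced locally by the thread that finds them, avoiding sync overhead.
const GLOBAL_DEPTH_CUTOFF: usize = 8;

struct TaskPool {
    tasks: Mutex<VecDeque<u64>>, // u64 stands in for a redex location
}

impl TaskPool {
    // Returns true if the redex was published for other workers.
    fn offer(&self, redex: u64, depth: usize) -> bool {
        if depth < GLOBAL_DEPTH_CUTOFF {
            self.tasks.lock().unwrap().push_back(redex);
            true // an idle worker may steal it
        } else {
            false // caller reduces it locally, no synchronization
        }
    }
    fn steal(&self) -> Option<u64> {
        self.tasks.lock().unwrap().pop_front()
    }
}

fn main() {
    let pool = TaskPool { tasks: Mutex::new(VecDeque::new()) };
    assert!(pool.offer(42, 3));   // shallow redex: shared globally
    assert!(!pool.offer(99, 12)); // deep redex: kept local
    assert_eq!(pool.steal(), Some(42));
}
```

The point of the cutoff is amortization: synchronization cost is paid only on a few coarse-grained tasks near the root, while the vast number of fine-grained redexes deeper in the term stay thread-local.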
I32, I64, U64, F32, F64
Right now, HVM only supports U32. The numeric types above should be supported too. I32 and F32 should be easy to add, since they are unboxed, like U32. I64, U64 and F64 require implementing boxed numbers, since they don't fit inside a 64-bit Lnk (due to the 4-bit pointer), but shouldn't be hard. Discussion on whether we should have unboxed 60-bit variants is valid.
On #81.
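To make the boxing idea concrete, here is a hedged Rust sketch of one possible encoding (the tag values, the shift, and the helper names are all invented for illustration; HVM's real Lnk layout may differ):

```rust
// Illustrative encoding: a 64-bit Lnk spends 4 bits on a tag, leaving
// 60 payload bits. A U32 fits inline; a full U64 does not, so it is
// "boxed": the Lnk stores a heap index pointing at the 64-bit value.
const TAG_SHIFT: u64 = 60;
const TAG_U32: u64 = 0x1; // invented tag numbers
const TAG_BOX: u64 = 0x2;

fn make_u32(val: u32) -> u64 {
    (TAG_U32 << TAG_SHIFT) | val as u64 // value stored inline
}

fn make_boxed_u64(heap: &mut Vec<u64>, val: u64) -> u64 {
    heap.push(val); // the full 64-bit value lives on the heap
    (TAG_BOX << TAG_SHIFT) | (heap.len() as u64 - 1)
}

fn read_boxed_u64(heap: &[u64], lnk: u64) -> u64 {
    heap[(lnk & ((1u64 << TAG_SHIFT) - 1)) as usize]
}

fn main() {
    let mut heap = Vec::new();
    let small = make_u32(7);
    let big = make_boxed_u64(&mut heap, u64::MAX);
    assert_eq!(small >> TAG_SHIFT, TAG_U32);
    assert_eq!(read_boxed_u64(&heap, big), u64::MAX);
}
```

This also shows why an unboxed 60-bit variant is tempting: it keeps the whole value inline and avoids the extra heap access, at the cost of a nonstandard width.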
Improve the left-hand side flattener
On #54.
A nice API to use as a Rust library
Using HVM as a pure Rust lib inside other projects should be very easy, especially considering the interpreter has fairly good performance (only about 50% slower than the single-threaded compiled version). We must think of the right API to expose these features in a Rust-idiomatic, future-proof way. Input from the Rust community is appreciated.
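As a starting point for discussion, here is one entirely hypothetical shape such an API could take (the surface below is invented, with stubbed bodies so the sketch compiles; a real crate would wire these to the interpreter in runtime.rs):

```rust
// Hypothetical, invented API surface for using HVM as a Rust library.
pub struct Runtime {
    code: String, // the loaded HVM program (rewrite rules)
}

impl Runtime {
    // Parse and load an HVM program. Stubbed: only rejects empty input.
    pub fn from_code(code: &str) -> Result<Self, String> {
        if code.trim().is_empty() {
            Err("empty program".to_string())
        } else {
            Ok(Runtime { code: code.to_string() })
        }
    }

    // Normalize `term` under the loaded rules and pretty-print the
    // result. Stubbed: a real version would call the interpreter.
    pub fn eval(&self, term: &str) -> String {
        format!("<normal form of {} under {} rule line(s)>",
                term, self.code.lines().count())
    }
}

fn main() {
    let rt = Runtime::from_code("(Main) = (+ 2 3)").unwrap();
    println!("{}", rt.eval("(Main)"));
}
```

The design questions are mostly in the signatures, not the bodies: whether eval returns a string or a structured Term, how errors surface, and whether the heap is reusable across calls.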
A JIT compiler
Right now, HVM compiles to C. That means a C compiler like clang or gcc is necessary to produce executables, so it isn't portable when used as a lib. Of course, libs can just use the interpreter, but it is ~3x slower than the compiled version, and is not parallel. Ideally, we'd instead JIT-compile HVM files using WASM or perhaps Cranelift, allowing lib users to enjoy the full speed of HVM in a portable way.
IO
Adding IO should be easy: just write an alternative version of normalize that, instead of just outputting the normal form, pattern-matches it against the expected IO type and performs IO operations as demanded.
Windows support
The compiler doesn't support Windows yet. The use of -lpthreads may be an issue. The interpreter should work fine, though.
On #52.
GPU compilers
Compiling to the GPU would be amazing, and I see no reason this shouldn't be possible. The only complications I can think of are:
- How do we alloc inside a GPU work unit?
- Reduce() is highly branching, so this may destroy the performance. But it can be altered to group tasks by categories, and merge all branches in one go. The reduce function essentially does this: branch -> match -> alloc -> subst -> free. The last 4 steps are very similar regardless of the branch it falls into. So, instead, the branch part can just prepare some data, and the match -> alloc -> subst -> free passes are all moved to the same branch, just with different indices and parameters. This will avoid thread divergence.
- GPGPU is a laborious mess. I'm under OSX, so I can't use CUDA; OpenCL is terrible; WebGPU is too new (and from what I could see, doesn't have the ability to read/write from the same buffer in the same pass!?). Regardless, we'd probably need different versions: Metal for OSX, CUDA and OpenCL for Windows depending on the GPU, etc. So that's a lot of work that I can't do myself; it would require expensive engineers.
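The classify-then-group idea from the second point can be sketched on the CPU; in this hedged Rust illustration (all names invented), the branchy step only records a rule tag and a location, and the uniform passes then run over whole groups at once, which is what would keep GPU lanes from diverging:

```rust
// Sketch of divergence avoidance: classify first, then process each
// rule category in a single uniform pass. Rule names are invented
// stand-ins for HVM's actual rewrite rules.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Rule { AppLam, AppSup, Op2Num }

struct Prepared { rule: Rule, loc: usize }

// The only branchy step: inspect the tag, record parameters.
fn classify(tag: u8, loc: usize) -> Prepared {
    let rule = match tag {
        0 => Rule::AppLam,
        1 => Rule::AppSup,
        _ => Rule::Op2Num,
    };
    Prepared { rule, loc }
}

fn main() {
    let redexes = [(0u8, 10), (2, 20), (0, 30), (1, 40)];
    let prepared: Vec<Prepared> =
        redexes.iter().map(|&(t, l)| classify(t, l)).collect();
    // Group by rule; each group then gets one uniform
    // match/alloc/subst/free pass with no per-lane branching.
    for rule in [Rule::AppLam, Rule::AppSup, Rule::Op2Num] {
        let group: Vec<usize> = prepared.iter()
            .filter(|p| p.rule == rule).map(|p| p.loc).collect();
        println!("{:?}: {:?}", rule, group);
    }
}
```

On a real GPU this grouping would be a sort or compaction by rule tag, so that each warp executes redexes of a single category.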
Compile Kind to HVM
HVM is meant to be a "low-level target", which is kinda confusing because it is actually very high-level; the point is just that users shouldn't be writing it directly. Ideally, Kind will compile to HVM, but it needs a lot of adjustments before that is possible.
Add a checker for the "derivatives of (λx(x x) λx(x x)) not allowed" rule
See this issue for more information, as well as the superposed duplication section on HOW.md.
On #61.
Tests
This project has basically no tests. Write tests!
Will edit the thread as I think of more things. Feel free to add to this thread.
Regarding GPU compilers and, generally, "offloading of some parts of the computation", you might have a look at DawnCC - some years ago I found their work quite inspiring (btw, they seem to be from Brazil, so they might be more accessible for you in some way or another).
A JIT compiler
Adopting WASM opens up lots of capabilities and possibilities; currently it's a blue ocean for new languages and attracts attention.
Windows support
The priority should be very low at the current stage, so please avoid putting your precious resources into it right now.
GPU compilers
GPU compilers could also attract attention in terms of Machine Learning, since I observe that ML has finally entered the true realm of functional programming.
https://github.com/google/jax
JAX is Autograd and XLA, brought together for high-performance machine learning research.
JAX transformations only work on pure functions, which don't have side-effects and respect referential transparency (i.e. object identity testing with is isn't preserved). If you use a JAX transformation on an impure Python function, you might see an error like Exception: Can't lift Traced... or Exception: Different traces at same level.
JAX now runs on Cloud TPUs. To try out the preview, see the Cloud TPU Colabs.
At least for ML, nowadays, using a local physical GPU device is rather outdated. Recent ML researchers actively take advantage of Cloud TPUs, so you may also want to consider this movement.
Compile Kind to HVM
HVM is meant to be a "low-level target", which is kinda confusing because it is actually very high-level
I've noticed that, and in fact HVM files look like untyped Haskell.
HVM looks very clean and mostly has no "statements"; I like it better than Haskell and even better than Kind.
My proposal is not to compile Kind to HVM but to extend HVM itself.
I suppose it is a simpler approach: it should be far easier to extend HVM by setting up a suitable syntax specially designed for the underlying HVM from scratch.
HVM can have a simple module system to extend itself. It's like a standard library + procedural-macro in Rust.
Procedural macros allow you to run code at compile time that operates over Rust syntax, both consuming and producing Rust syntax. You can sort of think of procedural macros as functions from an AST to another AST. Procedural macros must be defined in a crate with the crate type of proc-macro.
Since HVM's concise syntax is itself just an AST, this approach should fit the HVM module system.
For instance,
HVM files look like untyped Haskell.
So type syntax is required on top of the native HVM, and then a static type checker runs. There is a good chance to implement first-class types, like Idris.
Perhaps we need more expansion of pattern-matching syntax, or, IMO, definitions of infix functions/combinators/operators.
All are module-based extensions, so this is an incremental approach toward the new perfect language.
(EDIT) I think this AST-to-another-AST approach (the same as procedural macros in Rust) has already been done and is working, because HVM code can already be transpiled to C code. The difference with a self-extension module is:
- HVM -> C
- additional syntax on HVM -> the base HVM
Targeting GPUs
It might be worth taking a look at Futhark, which is a small language for writing code that can target CUDA and OpenCL. (There's an open issue for WebGPU support, though it hasn't received any attention.) Futhark can compile to a C library; I haven't looked at HVM's compilation output in detail yet, but I'm guessing that GPU-targeting code could compile to C that calls a C library (compiled from Futhark) for using the GPU.
C compiler/portability
Zig has done some extensive work on bundling Clang for easily setting up a C compiler; maybe there's some work from there which could be reused?
IO
Adding IO should be easy: just write an alternative version of normalize that, instead of just outputting the normal form, pattern-matches it against the expected IO type, and performs IO operations as demanded.
I'd be interested in getting the ball rolling on this. Where is the normalize function? I can't find a function named normalize via grep.
GPGPU is a laborious mess. I'm under OSX, so I can't use CUDA; OpenCL is terrible; WebGPU is too new (and from what I could see, doesn't have the ability to read/write from the same buffer in the same pass!?). Regardless, we'd probably need different versions, Metal for OSX, CUDA and OpenCL for Windows depending on the GPU, etc. So, that's a lot of work that I can't do myself, it would require expensive engineers.
I'm only dabbling in this topic, so it's possible that it may not be suited at all; but this paragraph is conspicuously lacking any mention of Vulkan.
For GPGPU: This is not really a "clean" approach, but you could try targeting Futhark rather than a concrete API. Futhark has support for a lot of different backends, and would ensure wide support.
For the JIT compiler part: I disagree that it's not portable currently; things like tinycc exist, not to mention that libgccjit is also a thing. However, I think targeting Cranelift would be very useful. You'd have to redo the whole runtime system, I think? The current approach of having a template C file is ugly IMO anyway.
What about compiling Haskell to HVM? From a high level this shouldn't be too hard. Use the GHC core as input and transform it to HVM. There are probably lots of missing features (as mentioned already, IO, boxed/unboxed integers) but wouldn't this be possible in principle?
IO
Adding IO should be easy: just write an alternative version of normalize that, instead of just outputting the normal form, pattern-matches it against the expected IO type, and performs IO operations as demanded.
I'd be interested in getting the ball rolling on this. Where is the normalize function? I can't find a function named normalize via grep.
Here is the relevant function on Rust. Similar one on C. It is quite small, as it just recurses over a term, applying reduce() layer by layer, to compute the normal form. To integrate IO, we'd just modify that function like this:
```
let term = reduce(mem, funcs, host, opt_id_to_name);
match term {
  // instead of checking if it is an app / lam / etc., it will match against
  // the expected shape of the IO monad, that is, one of these:
  //   (IO_Ask method param continuation)
  //   (IO_End return_value)
  // and then call the selected IO method and pass the result to the continuation
}
```
Basically it would do the same as run_io here, preferably using Kind's IO Type.
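For readers following along, that loop can be made concrete with a self-contained, hedged Rust sketch; the IO_Ask/IO_End shapes come from the snippet above, while Term and everything else here is invented for illustration (a real runtime would apply the continuation to the effect's result and keep reducing):

```rust
// Invented mini-model of the IO-driving loop described above.
enum Term {
    IoAsk { method: String, param: u64, cont: Box<Term> },
    IoEnd { value: u64 },
}

fn run_io(mut term: Term) -> u64 {
    loop {
        match term {
            // (IO_End return_value): the program is done.
            Term::IoEnd { value } => return value,
            // (IO_Ask method param continuation): perform the effect,
            // then continue with the rest of the program.
            Term::IoAsk { method, param, cont } => {
                // effect stubbed as a print; a real runtime would
                // dispatch on `method` and feed the result to `cont`
                println!("io: {}({})", method, param);
                term = *cont;
            }
        }
    }
}

fn main() {
    let prog = Term::IoAsk {
        method: "print".into(),
        param: 1,
        cont: Box::new(Term::IoEnd { value: 42 }),
    };
    assert_eq!(run_io(prog), 42);
}
```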
(Will answer the remaining questions later.)
@rowanG077
What about compiling Haskell to HVM?
There is also PureScript, which supports alternate backends.
Out of curiosity, why is there both a runtime.c and a runtime.rs? They appear to be extremely similar. Is the rust one the interpreter and the C one for linking when compiling?
@jmorag Yes, runtime.rs is the code used for the interpreter and runtime.c is the template of boilerplate code used as the runtime by C programs generated by the compiler.
What about compiling Haskell to HVM? From a high level this shouldn't be too hard. Use the GHC core as input and transform it to HVM. There are probably lots of missing features (as mentioned already, IO, boxed/unboxed integers) but wouldn't this be possible in principle?
Yes, that is a major long-term goal of HVM. Note that HVM is not an optimizing compiler. It is just the runtime. So, it would pair extremely well with GHC. As in, Haskell -> GHC Core -> HVM, allowing us to take advantage of all the optimizations performed by the compiler. Of course, GHC's runtime has millions of features I don't even know, so that might take a lot of work to do.
There is also PureScript, which supports alternate backends.
Would be amazing to compile PureScript to HVM (and benchmark!).
What about compiling Haskell to HVM?
Just a note: the author (@VictorTaelin) has already developed a better language: Kind
A minimal, efficient and practical programming language that aims to rethink functional programming from the scratch, and make it right. Under the hoods, it is basically Haskell, except without historical mistakes, and with a modern, consistent design.
wouldn't this be possible in principle?
Yes, possible; however, I suppose the priority for the author is currently low for the above reason, and such a project should be done independently by the Haskell community with support from the author. It's not a small amount of work, and I hope the resources will be primarily provided by the Haskell community itself.
I guess the only reason the demo code is compared to Haskell code is simply that Haskell is more well-known than Kind, so that broader functional community users can easily grasp the concept, plus the GHC performance benchmark.
Although I think Kind has a better design than Haskell, I still think HVM is better than Kind, and I hope HVM can be extended with compositional modules.
Add a checker for the "derivatives of (λx(x x) λx(x x)) not allowed" rule
It seems that we need the type-checker based on HVM in any way, so I think it's a simpler, integrated approach toward the typed HVM.
What about compiling Haskell to HVM? From a high level this shouldn't be too hard. Use the GHC core as input and transform it to HVM. There are probably lots of missing features (as mentioned already, IO, boxed/unboxed integers) but wouldn't this be possible in principle?
Yes, that is a major long-term goal of HVM. Note that HVM is not an optimizing compiler. It is just the runtime. So, it would pair extremely well with GHC. As in, Haskell -> GHC Core -> HVM, allowing us to take advantage of all the optimizations performed by the compiler. Of course, GHC's runtime has millions of features I don't even know, so that might take a lot of work to do.
Perhaps Haskell -> Core -> STG -> HVM would be possible, as then HVM could be bolted onto the external STG interpreter, which has already done a ton of work re-implementing the GHC primops and emitting whole-program compilation information, so that HVM wouldn't have to also become an incremental compiler.
What about compiling Haskell to HVM? From a high level this shouldn't be too hard. Use the GHC core as input and transform it to HVM. There are probably lots of missing features (as mentioned already, IO, boxed/unboxed integers) but wouldn't this be possible in principle?
Yes, that is a major long-term goal of HVM. Note that HVM is not an optimizing compiler. It is just the runtime. So, it would pair extremely well with GHC. As in, Haskell -> GHC Core -> HVM, allowing us to take advantage of all the optimizations performed by the compiler. Of course, GHC's runtime has millions of features I don't even know, so that might take a lot of work to do.
There is also PureScript, which supports alternate backends.
Would be amazing to compile PureScript to HVM (and benchmark!).
Actually I'm looking for a master thesis topic. Could we maybe talk about this? I would be interested in implementing this.
@rowanG077 I'm on the PureScript core team and I recently supervised another thesis on a PureScript backend. I'm excited about HVM and I'd be available to provide pointers (on the PureScript side) to help get this going - feel free to reach out if you'd like to chat about this :slightly_smiling_face: (see my profile for contact details)
My understanding of the HVM runtime is that it's lazy. How would strict languages like PureScript and Idris retain their semantics when compiling to HVM?
@jmorag I don't know the details, but I know that purenix works in a scenario like that; this section explains the impact of the strictness difference on purenix usage.
C compiler
I just tried tinycc on the output from compiling bench/TreeSum/main.hvm; C compilation fails due to at least two issues:
- lack of <stdatomic.h>
- initializer element is not constant in the declaration of HEAP_SIZE (I'm not sure why this is)
There may be other issues, I didn't investigate further. I think these rule out embedding tinycc as a portable C compiler, though.
C compiler
Thanks @DylanSp for trying tcc out!

I just tried tinycc on the output from compiling bench/TreeSum/main.hvm; C compilation fails due to at least two issues:
- lack of <stdatomic.h>

This is a totally minor issue. See e.g. how V (which uses tcc by default, i.e. in non--prod mode) solves this: https://github.com/vlang/v/tree/master/thirdparty/stdatomic and https://github.com/vlang/v/tree/master/vlib/sync/stdatomic .

- initializer element is not constant in the declaration of HEAP_SIZE (I'm not sure why this is)

Sounds like something minor to me.

There may be other issues, I didn't investigate further. I think these rule out embedding tinycc as a portable C compiler, though.

So I'm convinced that if multiplatform quick compilation is desired (IMHO it is, but I'm not the one to decide), tcc is the best way forward.
STG -> HVM might not be feasible, because, well, multiple reasons. The main problem is that GHC doesn't care about Dup-explosion prevention, well, until Linear Haskell is a thing - and even then.
@rowanG077 sorry for the delay! Yes, feel free to mail me ([email protected]) or DM me in any of my social networks. It would be an honor if you did your thesis related to HVM, and certainly there are many places where such a thing would add a lot of value to it.
I forgot to mention in terms of Windows support #52
Nowadays, they no longer use Cygwin. They simply use a real Linux kernel on Windows, called Windows Subsystem for Linux (WSL2). A full environment of a Linux distribution such as Ubuntu can be installed on WSL2, even GUI applications. https://ubuntu.com/wsl In the latest versions, they can install and launch the Ubuntu terminal with a few clicks.
So I guess we don't have to be concerned about this.
Actually I'm looking for a master thesis topic. Could we maybe talk about this? I would be interested in implementing this.
FYI, this is an excellent discussion related to GHC you might be interested in. #74
I agree with @stken2050 that HVM is well suited as a general purpose programming language. It just needs really well designed primitives, and over time, a standard library.
I32, I64, U64, F32, F64
Right now, HVM only supports U32. The numeric types above should be supported too. I32 and F32 should be easy to add, since they are unboxed, like U32. I64, U64 and F64 require implementing boxed numbers, since they don't fit inside a 64-bit Lnk (due to the 4-bit pointer), but shouldn't be hard. Discussion on whether we should have unboxed 60-bit variants is valid.
I'd be interested in adding support for at least the unboxed numeric types, as this seems like low-hanging fruit. I just wanted to check what the consensus was between implicit/explicit casting, and how operators should function with regard to mixed input types. My instinct would just be to cast up to the 'bigger' type, but maybe operators should only operate on the same type, combined with explicit casts.
@VictorTaelin what do you think?
Operators should just operate on the same type. So, if F32 is added, for example, ADD will work on either u32/u32 or f32/f32, but not anything else (i.e., no u64/u32, no f32/u32, etc.).
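A hedged sketch of how that same-type-only dispatch could look (the tags and names are invented for illustration, not HVM's actual implementation):

```rust
// ADD only fires when both operands carry the same numeric tag;
// mixed types yield None, i.e. the redex stays stuck instead of
// implicitly casting.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Num { U32(u32), F32(f32) }

fn add(a: Num, b: Num) -> Option<Num> {
    match (a, b) {
        (Num::U32(x), Num::U32(y)) => Some(Num::U32(x.wrapping_add(y))),
        (Num::F32(x), Num::F32(y)) => Some(Num::F32(x + y)),
        _ => None, // no implicit conversion between types
    }
}

fn main() {
    assert_eq!(add(Num::U32(2), Num::U32(3)), Some(Num::U32(5)));
    assert_eq!(add(Num::U32(2), Num::F32(3.0)), None);
}
```

Explicit conversion would then be a separate operator the programmer writes, rather than something ADD performs silently.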
Why not add a type conversion operator for primitive types too? Rust has it, and we could probably just easily use theirs under the hood?
@nothingnesses I think explicit (i.e. written every single time by the programmer) conversion from one type to another will be needed one way or another. I think @VictorTaelin meant that there shouldn't be any implicit conversion as that always leads to some irregularities and confusion.
Ah maybe. I don't think any conversion operators are currently implemented in the language right now though, which was why I brought it up.