containers
Offer Int32Map and Int64Map
Someone suggested this weekend that we should really offer both 32-bit and 64-bit maps regardless of architecture, and I thoroughly agree. The same applies to IntSet, of course.
I don't think there is much benefit in 32-bit maps.
- On 32-bit platforms they behave exactly the same as IntMap.
- On 64-bit platforms they might have a tiny performance benefit through smaller instruction encodings. But that is unlikely to be worth the cost of having one more map type, unless one uses backpack or similar to implement them. Even then, the added code size could cancel out any benefit in this area.
For Int64 it looks different though, since we can't use an IntMap for 64-bit keys on 32-bit platforms.
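To make the 32-bit problem concrete, here is a minimal sketch. It assumes nothing beyond base; Int32 stands in for the Int key type of a 32-bit platform, and the helper name narrow is made up for illustration. Two distinct 64-bit keys collapse to the same narrow key, so an IntMap keyed this way would silently merge them.

```haskell
import Data.Int (Int32, Int64)

-- Int32 stands in for Int on a 32-bit platform (an assumption made so
-- that the example behaves the same on any machine).
narrow :: Int64 -> Int32
narrow = fromIntegral -- keeps only the low 32 bits

main :: IO ()
main = do
  let k1 = 5 :: Int64
      k2 = 5 + 2 ^ (32 :: Int) :: Int64
  print (k1 == k2)               -- False: the 64-bit keys are distinct
  print (narrow k1 == narrow k2) -- True: they collide once narrowed
```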
If that approach is deemed acceptable I might add an Int64Map (by taking the IntMap implementation and changing the types). It looks like we will need one for GHC in the near future and it would be good to be able to use containers for this.
PS: If there is a strong demand for an Int32Map and it's shown to be worth the cost it might still be good. I just personally don't think it will have enough of a benefit to look into that myself.
I've converted IntSet and IntMap to Word64Set and Word64Map as part of this GHC MR:
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10568
It would require some polishing before upstreaming it. I might have time for it soon, but I'd gladly hand it over if there are other volunteers.
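For readers who haven't looked inside IntMap: it is a big-endian Patricia trie, and the key type only shows up in a few bit-twiddling helpers. The following is a simplified sketch of that structure with Word64 keys. It is not the code from the MR, all names (W64Map, maskW, insertW, lookupW) are made up for illustration, and it leaves out everything the real implementation does for performance.

```haskell
import Data.Bits (bit, complement, countLeadingZeros, xor, (.&.))
import Data.Word (Word64)

-- Hypothetical type; the fields of Bin are prefix, branching bit, subtrees.
data W64Map a
  = Nil
  | Tip !Word64 a
  | Bin !Word64 !Word64 (W64Map a) (W64Map a)

-- Keep only the bits of k strictly above the branching bit m.
maskW :: Word64 -> Word64 -> Word64
maskW k m = k .&. (complement (m - 1) `xor` m)

-- True when k has a 0 at the branching bit, i.e. belongs in the left subtree.
zero :: Word64 -> Word64 -> Bool
zero k m = k .&. m == 0

-- Join two trees whose prefixes p1 and p2 differ.
link :: Word64 -> W64Map a -> Word64 -> W64Map a -> W64Map a
link p1 t1 p2 t2
  | zero p1 m = Bin p m t1 t2
  | otherwise = Bin p m t2 t1
  where
    m = bit (63 - countLeadingZeros (p1 `xor` p2)) -- highest differing bit
    p = maskW p1 m

insertW :: Word64 -> a -> W64Map a -> W64Map a
insertW k x t = case t of
  Bin p m l r
    | maskW k m /= p -> link k (Tip k x) p t
    | zero k m -> Bin p m (insertW k x l) r
    | otherwise -> Bin p m l (insertW k x r)
  Tip ky _
    | k == ky -> Tip k x
    | otherwise -> link k (Tip k x) ky t
  Nil -> Tip k x

lookupW :: Word64 -> W64Map a -> Maybe a
lookupW k t = case t of
  Bin p m l r
    | maskW k m /= p -> Nothing
    | zero k m -> lookupW k l
    | otherwise -> lookupW k r
  Tip ky x
    | k == ky -> Just x
    | otherwise -> Nothing
  Nil -> Nothing

main :: IO ()
main = do
  let big = 2 ^ (40 :: Int) :: Word64
      m = insertW maxBound "max" (insertW big "big" (insertW 1 "one" Nil))
  print (lookupW big m) -- Just "big", a key that does not fit in 32 bits
  print (lookupW 2 m)   -- Nothing
```

In this sketch the only places tied to the key width are the Word64 annotations and the constant 63 in link, which gives a feel for why the conversion is mostly a matter of changing types.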
I would definitely want these, but how can we do it without increasing the maintenance burden too much?
That depends on what you consider to be too much. The simplest way to upstream these is to just copy the files, but I get the impression you consider adding ~7000 lines of code as adding too much burden, which is a fair point.
But I don't see any easy solutions. The code is very similar in many places, but the types are pretty much all different.
I could imagine making a poor man's backpack by using something like #define SET IntSet or #define SET Word64Set to generate two modules from the same file. But I don't know if I would consider that much easier to maintain.
Just throwing an idea around: once our GHC lower bound rises high enough, we'll be able to have multiple "libraries" in the package. Will this give us enough power to have the necessary type definitions in a module that has different contents for the different libraries?
To be clear, my objection isn't the amount of code, per se, but the duplication. Having to make every change in triplicate is pretty darn annoying. The set/map split is bad enough.
@Bodigrim suggested we use CPP, following other boot libraries like filepath. That seems like a promising approach. Concretely, filepath defines System.OsPath like this:
{-# LANGUAGE CPP #-}
#define FILEPATH_NAME OsPath
#define OSSTRING_NAME OsString
#define WORD_NAME OsChar
...
#include "OsPath/Common.hs"
And then in OsPath/Common.hs there is code like this:
...
splitSearchPath :: OSSTRING_NAME -> [FILEPATH_NAME]
splitSearchPath (OSSTRING_NAME x) = fmap OSSTRING_NAME . C.splitSearchPath $ x
...
How does that affect source links in the Haddocks? I would expect it to destroy them, which is ... not great.
Yes, that seems like a disadvantage, e.g. this link doesn't point to anything useful: https://hackage.haskell.org/package/filepath-1.4.100.3/docs/src/System.OsPath.Posix.html#extSeparator
Ugh. Using the (long-standing) private library support is enough for what I was talking about (no need for the new multiple public libraries), but there's a big problem: the names of the modules, and more importantly the names of the types, won't work out right without some CPP. I don't know just how bad that is in practice; probably better than what filepath does.