deepseq icon indicating copy to clipboard operation
deepseq copied to clipboard

remove instance NFData (a -> b)

Open amigalemming opened this issue 9 years ago • 83 comments

I suggested to remove the instance NFData (a -> b) because it cannot be implemented properly. I think the proposal got broad support: https://mail.haskell.org/libraries/2016-May/026961.html We have still to decide whether to drop the instance or replace it by an unimplementable one. The latter one would have the advantage that people see that the instance is omitted by intention. We would need a major version bump. I can setup a pull request if you like.

amigalemming avatar May 23 '16 09:05 amigalemming

tbh, I'd like the @haskell/core-libraries-committee to sign off on this.

Personally, I'd prefer an unimplementable instance, as otherwise all it takes is one orphan instance hidden somewhere in a popular package to thwart this change. And even worse, if more than one orphan instance appears, we would risk ending up in an even worse situation than we're now in :-/

(/cc'ing @RyanGlScott as he wasn't yet part of the @haskell/core-libraries-committee github team at time of writing)

hvr avatar May 23 '16 10:05 hvr

I have no objection to removing the instance, but I'm not a fan of the idea of making it unimplementable.

Yes, one popular package could bring it back from the dead, or bring into scope a variant that enumerates the entire domain of a function and memoizes all results, but if it can weather the storm of popular opinion then perhaps it deserves to come back. This scenario is unlikely given the complete absence of interest in the current instance.

We have an open world. Orphan instances are deeply unpopular and anti-modular. If someone needs the instance, let them have it, it won't go far enough upstream to matter.

-Edward

On Mon, May 23, 2016 at 6:29 AM, Herbert Valerio Riedel < [email protected]> wrote:

tbh, I'd like the @haskell/core-libraries-committee https://github.com/orgs/haskell/teams/core-libraries-committee to sign off on this.

Personally, I'd prefer an unimplementable instance, as otherwise all it takes is one orphan instance hidden somewhere in a popular package to thwart this change. And even worse, if more than one orphan instance appears, we would risk ending up in an even worse situation than we're now in :-/

(/cc'ing @RyanGlScott https://github.com/RyanGlScott as he wasn't yet part of @haskell/core-libraries-committee https://github.com/orgs/haskell/teams/core-libraries-committee at time of writing)

— You are receiving this because you are on a team that was mentioned. Reply to this email directly or view it on GitHub https://github.com/haskell/deepseq/issues/16#issuecomment-220944194

ekmett avatar May 23 '16 10:05 ekmett

On Mon, 23 May 2016, Edward Kmett wrote:

We have an open world. Orphan instances are deeply unpopular and anti-modular. If someone needs the instance, let them have it, it won't go far enough upstream to matter.

There needs to be only one package in any of hundred packages I import defining this instance as an orphan and then this disables my type safety check. I prefer the non-implementable instance and if someone needs the enumerating instance he can implement it for a newtype wrapper of the function type.

amigalemming avatar May 23 '16 11:05 amigalemming

@amigalemming afaik you suggested to implement some warning facility to annotate undesirable/questionable instances; can you think of a variant which would help here?

hvr avatar May 23 '16 12:05 hvr

And you can just as easily implement an orphan instance module that provides the impossible instance with the same head.

class Don'tExportMe where boom :: a instance Don'tExportMe => NFData (a -> b) where rnf = boom

and import that to know that the other isn't being used in your code, because now you'll get a overlapping instance head.

I'd, however, treat whatever package exported this instance as just as questionable as one that provided the other.

NFData provides no guarantees that it "really" does anything. There are lots of such "mildly dangerous instances" that may not be as fully strict as you'd like for NFData. You'll never rule them all out.

For the scenario you fear to come to pass you have to make two lapses in judgment. First you have to depend on a package that provides the instance, and then you have to write code that depends on the instance existing.

That seems well balanced against the fact that someone using criterion might very well want to define the instance for NFData (a -> b) locally in their benchmark suite, regardless of whether or not you believe the instance should exist.

-Edward

On Mon, May 23, 2016 at 7:00 AM, amigalemming [email protected] wrote:

On Mon, 23 May 2016, Edward Kmett wrote:

We have an open world. Orphan instances are deeply unpopular and anti-modular. If someone needs the instance, let them have it, it won't go far enough upstream to matter.

There needs to be only one package in any of hundred packages I import defining this instance as an orphan and then this disables my type safety check. I prefer the non-implementable instance and if someone needs the enumerating instance he can implement it for a newtype wrapper of the function type.

— You are receiving this because you are on a team that was mentioned. Reply to this email directly or view it on GitHub https://github.com/haskell/deepseq/issues/16#issuecomment-220949948

ekmett avatar May 23 '16 12:05 ekmett

I agree with Edward here, my vote goes with his stance.

On Mon, May 23, 2016, 3:33 PM Edward Kmett [email protected] wrote:

And you can just as easily implement an orphan instance module that provides the impossible instance with the same head.

class Don'tExportMe where boom :: a instance Don'tExportMe => NFData (a -> b) where rnf = boom

and import that to know that the other isn't being used in your code, because now you'll get a overlapping instance head.

I'd, however, treat whatever package exported this instance as just as questionable as one that provided the other.

NFData provides no guarantees that it "really" does anything. There are lots of such "mildly dangerous instances" that may not be as fully strict as you'd like for NFData. You'll never rule them all out.

For the scenario you fear to come to pass you have to make two lapses in judgment. First you have to depend on a package that provides the instance, and then you have to write code that depends on the instance existing.

That seems well balanced against the fact that someone using criterion might very well want to define the instance for NFData (a -> b) locally in their benchmark suite, regardless of whether or not you believe the instance should exist.

-Edward

On Mon, May 23, 2016 at 7:00 AM, amigalemming [email protected] wrote:

On Mon, 23 May 2016, Edward Kmett wrote:

We have an open world. Orphan instances are deeply unpopular and anti-modular. If someone needs the instance, let them have it, it won't go far enough upstream to matter.

There needs to be only one package in any of hundred packages I import defining this instance as an orphan and then this disables my type safety check. I prefer the non-implementable instance and if someone needs the enumerating instance he can implement it for a newtype wrapper of the function type.

— You are receiving this because you are on a team that was mentioned. Reply to this email directly or view it on GitHub https://github.com/haskell/deepseq/issues/16#issuecomment-220949948

— You are receiving this because you are on a team that was mentioned.

Reply to this email directly or view it on GitHub https://github.com/haskell/deepseq/issues/16#issuecomment-220967419

snoyberg avatar May 23 '16 12:05 snoyberg

I'm also for no instance at this point. If you think the existing instance is bad, it's better for no instance to exist, so that you get an error if you accidentally try to use it. The other instances I can think of have roughly the same negatives as the existing one. You don't get a compile-time complaint, you just have to figure out what went wrong if your program ever exercises the instance (and either blows up or loops uselessly for a very long time). And the instance that doesn't just blow up requires introducing a new class to implement ideally, and then getting people to implement it.

dolio avatar May 23 '16 13:05 dolio

On Mon, 23 May 2016, dolio wrote:

I'm also for no instance at this point. If you think the existing instance is bad, it's better for no instance to exist, so that you get an error if you accidentally try to use it. The other instances I can think of have roughly the same negatives as the existing one. You don't get a compile-time complaint,

I definitely want a compile-time complaint, and I think this can be better done with an explicitly forbidden instance. I think of something similar to Edward, only in Haskell 98, e.g.

class NotImplementable a where -- not exported instance NotImplementable a => NFData (a -> b) where

This instance tells any programmer that the instance cannot be defined anymore and that this is by intent.

amigalemming avatar May 23 '16 13:05 amigalemming

On Mon, 23 May 2016, Herbert Valerio Riedel wrote:

@amigalemming afaik you suggested to implement some warning facility to annotate undesirable/questionable instances; can you think of a variant which would help here?

I proposed this one: https://ghc.haskell.org/trac/ghc/ticket/11796

It would solve the issue here, too, but it is not implemented in GHC, it did not even start a discussion, so far.

amigalemming avatar May 23 '16 13:05 amigalemming

My point about that instance is that it is something that you are just as free to implement as someone who wants to implement the instance you don't like.

Just to be clear, I was not advocating that the instance I mentioned be placed in base.

I was mentioning that if you wanted to rule out the instance in question, you could write your own orphan instance in your own package somewhere, and it'd infect the global context and conflict with any other attempted instance.

So you can either, simply check to make sure that your code still compiles with any version of your dependencies you know doesn't have the instance around, or you can jump through hoops and create one of the orphan 'impossible' instances described so far.

Only in the latter case do you break users who just want some instance like instance NFData (a -> b) locally in their criterion benchmarks, and they asked for it by depending on your code.

-Edward

On Mon, May 23, 2016 at 9:44 AM, amigalemming [email protected] wrote:

On Mon, 23 May 2016, dolio wrote:

I'm also for no instance at this point. If you think the existing instance is bad, it's better for no instance to exist, so that you get an error if you accidentally try to use it. The other instances I can think of have roughly the same negatives as the existing one. You don't get a compile-time complaint,

I definitely want a compile-time complaint, and I think this can be better done with an explicitly forbidden instance. I think of something similar to Edward, only in Haskell 98, e.g.

class NotImplementable a where -- not exported instance NotImplementable a => NFData (a -> b) where

This instance tells any programmer that the instance cannot be defined anymore and that this is by intent.

— You are receiving this because you are on a team that was mentioned. Reply to this email directly or view it on GitHub https://github.com/haskell/deepseq/issues/16#issuecomment-220984242

ekmett avatar May 23 '16 14:05 ekmett

On Mon, 23 May 2016, Edward Kmett wrote:

Just to be clear, I was not advocating that the instance I mentioned be placed in base.

I understood it this way.

Only in the latter case do you break users who just want some instance like instance NFData (a -> b) locally in their criterion benchmarks, and they asked for it by depending on your code.

Is this a frequent usecase? How about using a newtype wrapper there? The deepseq package could provide the wrapper even by itself.

amigalemming avatar May 23 '16 14:05 amigalemming

The criterion example was one usecase. One that enumerates all of the inputs and memoizes a function's results is another that has the same head shape that also conflicts. Sure, we could hide it behind a newtype. We could hide every instance for (->) behind a newtype, but we don't.

But if you can build with any version of your dependencies that doesn't have the instance, then you can't be relying on it.

You'll only run into this problem when you explicitly call rnf on a function and when someone you don't like that you depend upon brings an instance into scope. That frankly strikes me as pretty far removed from an active concern, and more a theoretical exercise.

-Edward

On Mon, May 23, 2016 at 10:38 AM, amigalemming [email protected] wrote:

On Mon, 23 May 2016, Edward Kmett wrote:

Just to be clear, I was not advocating that the instance I mentioned be placed in base.

I understood it this way.

Only in the latter case do you break users who just want some instance like instance NFData (a -> b) locally in their criterion benchmarks, and they asked for it by depending on your code.

Is this a frequent usecase? How about using a newtype wrapper there? The deepseq package could provide the wrapper even by itself.

— You are receiving this because you are on a team that was mentioned. Reply to this email directly or view it on GitHub https://github.com/haskell/deepseq/issues/16#issuecomment-220999093

ekmett avatar May 23 '16 14:05 ekmett

On Mon, 23 May 2016, Edward Kmett wrote:

The criterion example was one usecase. One that enumerates all of the inputs and memoizes a function's results is another that has the same head shape that also conflicts. Sure, we could hide it behind a newtype. We could hide every instance for (->) behind a newtype, but we don't.

The existence of different possible implementations is an argument pro dedicated newtype wrappers for me.

You'll only run into this problem when you explicitly call rnf on a function and when someone you don't like that you depend upon brings an instance into scope. That frankly strikes me as pretty far removed from an active concern, and more a theoretical exercise.

I want to be sure that I do not accidentally call 'rnf' or that a library function calls 'rnf' for me on any structure, where a function is contained somewhere deep in it, which can happen if only one of hundred transitively imported packages defines an orphan NFData (a -> b) instance. Even if I check the absense of such an orphan instance today, it may change easily if one of the transitively imported packages extends its imports.

amigalemming avatar May 23 '16 16:05 amigalemming

They can do that anyway if they just define their instance for any composite structure in the middle by hand. Even your newtype solution is evidence of this. You get no such transitive guarantee.

Sent from my iPhone

On May 23, 2016, at 12:03 PM, amigalemming [email protected] wrote:

On Mon, 23 May 2016, Edward Kmett wrote:

The criterion example was one usecase. One that enumerates all of the inputs and memoizes a function's results is another that has the same head shape that also conflicts. Sure, we could hide it behind a newtype. We could hide every instance for (->) behind a newtype, but we don't.

The existence of different possible implementations is an argument pro dedicated newtype wrappers for me.

You'll only run into this problem when you explicitly call rnf on a function and when someone you don't like that you depend upon brings an instance into scope. That frankly strikes me as pretty far removed from an active concern, and more a theoretical exercise.

I want to be sure that I do not accidentally call 'rnf' or that a library function calls 'rnf' for me on any structure, where a function is contained somewhere deep in it, which can happen if only one of hundred transitively imported packages defines an orphan NFData (a -> b) instance. Even if I check the absense of such an orphan instance today, it may change easily if one of the transitively imported packages extends its imports. — You are receiving this because you are on a team that was mentioned. Reply to this email directly or view it on GitHub

ekmett avatar May 23 '16 16:05 ekmett

I think the main issue is the removal of the instance. It seems that this instance is not widely useful, and can be implemented as an orphan (or perhaps a newtype) for those who need it. Making it unimplementable seems to be a more controversial stretch, and tacking it on to the issue of the instance's removal makes it more likely that nothing will get done at all.

I think we should just remove the instance, make a changelog entry, and majour version bump.

chessai avatar Jun 24 '19 14:06 chessai

On Mon, 24 Jun 2019, chessai wrote:

I think we should just remove the instance, make a changelog entry, and majour version bump.

Yes, please.

amigalemming avatar Jun 24 '19 14:06 amigalemming

I agree that this instance just needs to be removed.

andrewthad avatar Jun 24 '19 14:06 andrewthad

see #47

chessai avatar Jun 24 '19 15:06 chessai

What it actually should do is use seq to turn it to WHNF, then use unpackClosure# to make sure everything inside it is evaluated as well.

Zemyla avatar Nov 02 '19 15:11 Zemyla

On Sat, 2 Nov 2019, Zemyla wrote:

What it actually should do is use seq to turn it to WHNF, then use unpackClosure# to make sure everything inside it is evaluated as well.

The function can still be partial and then rnf should diverge, too, shouldn't it?

amigalemming avatar Nov 02 '19 15:11 amigalemming

unpackClosure# doesn't actually evaluate the function. It simply takes it apart into the component pieces where any data needed for actually calling the function are stored.

Basically, imagine a subset of Haskell where only named functions exist and can be passed to things. To represent a partially evaluated function of some sort, you'd need an existential datatype, and functions to manipulate it.

data Closure a = forall u. Closure u (u -> a) -- Here, the (u -> a) function has to be able to be named at compile-time.

closeFmap :: (u -> a, a -> b, u) -> b
closeFmap (g, f, u) = g (f u)

instance Functor Closure where
  fmap f (Closure u g) = Closure (g, f, u) closeFmap

closeLiftA2 :: (u -> a, v -> b, a -> b -> c, u, v) -> c
closeLiftA2 (gu, gv, f, u, v) = f (gu u) (gv v)

closeAp :: (u -> a -> b, v -> a, u, v) -> b
closeAp (gu, gv, u, v) = gu u (gv v)

instance Applicative Closure where
  pure a = Closure a id

  liftA2 f (Closure u gu) (Closure v gv) = Closure (gu, gv, f, u, v) closeLiftA2

  Closure u gu <*> Closure v gv = Closure (gu, gv, u, v) closeAp

closeBind :: (u -> a, u, a -> Closure b) -> b
closeBind (gu, u, f) = case f (gu u) of
  Closure v gv -> gv v

instance Monad Closure where
  Closure u gu >>= f = Closure (gu, u, f) closeBind

This may seem silly at first, but it's pretty much what the stackless, tagless G-machine behind GHC does when you write code with closed variables, and it's also what you need to do when you work with some forms of distributed Haskell.

Zemyla avatar Nov 02 '19 15:11 Zemyla

On Sat, 2 Nov 2019, Zemyla wrote:

unpackClosure# doesn't actually evaluate the function. It simply takes it apart into the component pieces where any data needed for actually calling the function are stored.

I think I understand what you are suggesting, but I doubt that it is a useful definition.

amigalemming avatar Nov 02 '19 16:11 amigalemming

I agree with @Zemyla that rnf for functions should force everything inside the function's closure.

I think I understand what you are suggesting, but I doubt that it is a useful definition.

I disagree @amigalemming. This definition would be very much in spirit of the goals of deepseq. The documentation on Hackage presents lazy IO as a use case for rnf:

import System.IO
import Control.DeepSeq
import Control.Exception (evaluate)

readFile' :: FilePath -> IO String
readFile' fn = do
    h <- openFile fn ReadMode
    s <- hGetContents h
    evaluate (rnf s)
    hClose h
    return s

Here, it is safe because rnf s forces the string fully. However, what if we applied some function to it first?

    s <- hGetContents h
    let x = f s
    evaluate (rnf x)
    hClose h

If, say, x :: [Bool], then this is still safe. For example, we can imagine f = map isDigit. However, if we had f = map (==), then we'd end up with x :: [Char -> Bool]. Then to make it safe we should force the closures of these functions.

Thus the definition offered by @Zemyla would make the pattern above safe in the general case.

int-index avatar Jan 22 '20 17:01 int-index

Although I'm not sure how @Zemyla imagines running rnf for the data captured in the function closure. Where would the NFData instances for the captured data come from? This suggests that rnf should be a primitive implemented in the RTS.

int-index avatar Jan 22 '20 17:01 int-index

yeah, as @int-index says, forcing the captured thunks in a closure requires RTS hooks to work, and closures are existentials, and thus the RTS would have to know about how to NFData everything underneath the function itself. Thats a pretty hard ask

cartazio avatar Jan 22 '20 20:01 cartazio

or maybe i'm missunderstanding this thread :)

cartazio avatar Jan 22 '20 20:01 cartazio

and thus the RTS would have to know about how to NFData everything underneath the function itself. Thats a pretty hard ask

Maybe it's not hard – I believe NFData can be implemented in a generic manner, without having instances at all. Just like seq does not require any instances.

int-index avatar Jan 22 '20 20:01 int-index

I'd like to offer the following specification:

  • rnf :: a -> () should be a function such that for any f and cont, the following program prints A before B, if at all:

    import Control.Exception
    import Control.DeepSeq
    import System.IO.Unsafe
    
    main = do
      b <- bracket (unsafeInterleaveIO (putStrLn "A"))
                   (\_ -> putStrLn "B")
                   (\a -> do let b = f a
                             evaluate (rnf b)
                             return b)
      cont b
    

Currently, this holds for some choices of f and cont, such as:

f = id
cont a = a `seq` return ()

However, it fails with other choices of f and cont due to the NFData function instance (the removal of which is proposed in this ticket):

f = const
cont g = g () `seq` return ()

Removing the instance for functions would turn the failing case into a compile-time error. Implementing NFData in the RTS to traverse the function's closure would allow both cases to succeed.

int-index avatar Jan 22 '20 20:01 int-index

Exactly. The Haskell RTS knows what is inside each ADT and function, or else the garbage collector wouldn't work. And actually performing the RNF isn't trivial, but it is mechanical; evaluate thunks, walk into ADTs and closures and lifted immutable pointers (Array#, ArrayArray#, Weak#, etc), and RNF everything inside. Keep everything that's been fully evaluated in a hash table or something so you don't get stuck in a loop, which would be an advantage this would have over naive, user-defined NFData instances. There could also be a global table for CAFs, so they wouldn't be rnfed more than once either. Best of all, most of this machinery is already in place in the RTS, for creating compact regions.

At that point, I'm not sure you'd even need the NFData class. The only use I could think of would be custom instances like Once by Edward Kmett, but even that would be unnecessary if we used pointer tagging for RNF as well as WHNF. ADTs could be tagged automatically in the constructor (if all the arguments are RNF, then the constructor is as well), and closures and lifted immutable pointers could be tagged during garbage collection. At that point, not only would the typeclass be entirely unnecessary, but you could have a !! annotation for data declarations that shows the value in the constructor is in RNF already.

Zemyla avatar Jan 22 '20 20:01 Zemyla

One gotcha is making sure it plays nice with garbage collection. One languishing huge issue implicit with current rts primops is some of them dont yield to the scheduler tool completion.

On Wed, Jan 22, 2020 at 3:56 PM Zemyla [email protected] wrote:

Exactly. The Haskell RTS knows what is inside each ADT and function, or else the garbage collector wouldn't work. And actually performing the RNF isn't trivial, but it is mechanical; evaluate thunks, walk into ADTs and closures and lifted immutable pointers (Array#, ArrayArray#, Weak#, etc), and RNF everything inside. Keep everything that's been fully evaluated in a hash table or something so you don't get stuck in a loop, which would be an advantage this would have over naive, user-defined NFData instances. There could also be a global table for CAFs, so they wouldn't be rnfed more than once either. Best of all, most of this machinery is already in place in the RTS, for creating compact regions.

At that point, I'm not sure you'd even need the NFData class. The only use I could think of would be custom instances like Once by Edward Kmett, but even that would be unnecessary if we used pointer tagging for RNF as well as WHNF. ADTs could be tagged automatically in the constructor (if all the arguments are RNF, then the constructor is as well), and closures and lifted immutable pointers could be tagged during garbage collection. At that point, not only would the typeclass be entirely unnecessary, but you could have a !! annotation for data declarations that shows the value in the constructor is in RNF already.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell/deepseq/issues/16?email_source=notifications&email_token=AAABBQWOU4VYZJRNELETYJ3Q7CXIXA5CNFSM4CERLO7KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJVCJ6I#issuecomment-577381625, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABBQQV2VGWSAUGBY5AO5TQ7CXIXANCNFSM4CERLO7A .

cartazio avatar Jan 22 '20 21:01 cartazio