Haxl Equivalence classes on requests.

Hi,

this pull request proposes a new function, dataFetchEquiv, that allows to define equivalent requests, i.e. classes of requests where it can be known a priori that they will yield the same result.

For a given equivalence class, at most one request will be performed, and all equivalent requests will yield the same result.

I use this facility to stretch Haxl a bit, and allow to perform write operations that are disguised as reads: my request type will look like

data MyRequest a where
  EnterOrRetrieveData :: MyData -> MyRequest MyData
  RetrieveData :: MyKey -> MyRequest MyData

I will also have a function that maps MyData to MyKey,

key :: MyData -> MyKey

The query that I like to perform is this: given some value x of type MyData, is there already some y in my data source that has the same key as x? If yes, give me that y, if not, enter x into my data source, and give me x as the result of the query. All future queries for a value z with key z == key x should yield the same result (the goal is to find similar pieces of data, and the function key is chosen so as to map similar data to equal keys).

This can be achieved with the function dataFetchEquiv,

getSimilar :: MyData -> (GenHaxl UserEnv) MyData
getSimilar =
  let equiv :: MyRequest a -> MyRequest a -> Bool
      equiv (EnterOrRetrieveData x) (EnterOrRetrieveData y) = key x == key y
      representative :: MyRequest a -> MyRequest a
      representative (EnterOrRetrieveData x) = RetrieveData (key x)
  in dataFetchEquiv equiv representative . EnterOrRetrieveData

The write operation is idempotent: Haxl will perform at most one write for each key. It is not observable whether anything was written at all, or if the data was already there in the first place. This ensures that the caching and re-ordering made by Haxl are safe (as long as it is not important which data is written for a given key, since that may well be affected by the re-ordering).

This might be a pretty narrow use case, but since the changes are non-invasive, I thought I'd offer a pull-request, in case it is useful to somebody else. Any feedback is greatly appreciated.

Jan 27 '16 13:01 kantp

Strange, it gets a Kind mis-match on ghc-7.6.3, but is accepted by ghc-7.8.4 and 7.10.2. I'll have to look into this.

Jan 27 '16 14:01 kantp

Ok, with 46d9dfd, it works with older ghc versions as well.

Also, for added context (because the example I give in the test is not particularly useful), I described how I use this new functionality to solve a real problem in a lightning talk at the last Haskell eXchange: https://skillsmatter.com/skillscasts/6538-data-deduplication-in-haskell-an-experience-report.

Feb 04 '16 08:02 kantp

I started at this for a while and I don't think I understand all the ramifications. You can certainly do very strange things with dataFetchEquiv. What are the properties under which dataFetchEquiv is safe to use?

I'm also concerned that there's an /O(n)/ search in there when the request is not already cached.

This needs more thought IMO. We've come across cases that are similar but I'm not sure if they're catered for by dataFetchEquiv. For example, a query that counts the number of Xs is satisfied by a previous query that fetches all the Xs.

Feb 05 '16 11:02 simonmar

Thanks for your feedback, Simon. I think the documentation needs some improvement.

In order to safely use dataFetchEquiv, the following conditions have to hold for the two functions equiv and representative that you supply to it:

equiv has to be an equivalence relation (i.e., a equiv b and b equiv c imply a equiv c, a equiv b is the same as b equiv a, and a equiv a always holds)
representative a == representative b iff a equiv b
Two requests for which equiv is true always give the same results.

If you can define these two functions on your requests, you can avoid some redundant data fetching, since for any two requests a, b with a equiv b, only one will be performed, and the result used for both.

Regarding the O(n) search, this might be problematic if there are many BlockedFetches in the queue. However, I don't see a way to avoid this, without changing the internals of the RequestStore.

The case you described is interesting, but I don't think it is covered by this as is. Maybe it could be solved in a somewhat similar fashion, though. You are not trying to identify requests that give the same answer, but for two requests where the answer to one can be inferred from the answer to the other.

I think it could be possible to write down a function that is similar to dataFetchEquiv, but instead of the function equiv, you would have to provide one function that can identify a request that provides enough information to get the answer to the request you try to perform, and another function that uses the result from the other request to get the answer to this one.

In your example, before entering a request Count xs, it would check if Get xs was already there (in which case it could use the length of the result of Get xs). What do you think?

Feb 05 '16 13:02 kantp

@simonmar I just made a draft of how I think the use case you described can be handled (see #41). It could probably benefit from some refactoring, as it introduces a little code redundancy, but it demonstrates the general idea.

Feb 05 '16 19:02 kantp

Haxl Haxl copied to clipboard

Equivalence classes on requests.

Haxl
Haxl copied to clipboard