Mathieu Boespflug

Results: 161 comments by Mathieu Boespflug

I don't know, but one way to test is to download local copies of the dataset and run against those.

Ok, confirmed. From S3, the run takes 7:15 minutes on my laptop too. But if I download files locally, then it just takes exactly 1 minute. In the nyt dataset,...

Looks like a known issue: http://tech.kinja.com/how-not-to-pull-from-s3-using-apache-spark-1704509219. Haven't yet found an upstream ticket to track a resolution though.

The above-mentioned link has sample code for fetching data from S3 using the AmazonS3 lib directly rather than Spark, as a workaround: https://gist.githubusercontent.com/pjrt/f1cad93b154ac8958e65/raw/7b0b764408f145f51477dc05ef1a99e8448bce6d/S3Puller.scala. Feel free to submit a PR...

See the announcement blog post. Clodl uses zip files because that is the archive format used for JARs. The tools you propose are for different formats.

I don't think clashing file extensions are a problem per se. Case in point: there are currently four separate entries registered in [Linguist's](https://github.com/github/linguist) [`languages.yml`](https://github.com/github/linguist/blob/master/lib/linguist/languages.yml) for the `.ncl` extension. Nickel would...

Worth a ticket on the GHC issue tracker, which could eventually become a GHC proposal. Is what you are looking for a slight generalization of `-XQuantifiedConstraints`?

It sounds like that proposal is fairly straightforward to implement. If convincing ourselves of soundness is a matter of just a day or two, then I guess there's not much on...

Sounds like these are constraints that nearly all datatypes would want to have. Could we make them implicit?

So can we foresee a solution to this ticket? Or will it need new research?