Make ImageCore and MAT optional, switch from GZip to CodecZlib
This PR drops Julia v0.6 and fixes most of Julia v0.7/v1.0 deprecations. Plus it does a few other things:
- switches to CodecZlib.jl from the aging GZip.jl. Unfortunately, it required much more refactoring of the reading code than I originally anticipated (since CodecZlib doesn't support
seek()). But hopefully these changes simplify the codebase a bit too. - makes ImageCore.jl and MAT.jl optional (via Requires.jl). These packages bring with them tons of other dependencies, while the functionality they provide is not essential. ImageCore and MAT are still required for the unit testing. If you install ImageCore and/or MAT in your julia environment, the corresponding MLDatasets functions (converting to images and SVHN2 dataset) would be automatically enabled.
- updates UD_English links
So it looks like the tests are running fine on v0.7 (for stripped-down CI tests that only download MNIST, and also locally when I test for all datasets). On v1.0 it fails because of the deprecations in the dependencies.
The Coveralls coverage is being collected (I've compared it to Literate.jl script and essentially it's the same), but then there is a recurring error when trying to upload it. Here's an excerpt from Travis logs:
Submitting data to Coveralls...
ERROR: HTTP.ExceptionRequest.StatusError(422, HTTP.Messages.Response:
"""
HTTP/1.1 422 Unprocessable Entity
Date: Mon, 13 Aug 2018 10:04:40 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 72
Connection: keep-alive
Set-Cookie: __cfduid=d37098939494e5f2a06d157d3c49c634b1534154680; expires=Tue, 13-Aug-19 10:04:40 GMT; path=/; domain=.coveralls.io; HttpOnly
Cache-Control: no-cache
Status: 422 Unprocessable Entity
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Powered-By: Phusion Passenger Enterprise 5.1.10
X-Request-Id: 06622f00-1946-4c49-aadf-4ba7256abe0b
X-Runtime: 0.240836
X-XSS-Protection: 1; mode=block
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Server: cloudflare
CF-RAY: 449a5bdb8fbe240e-IAD
{"message":"Couldn't find a repository matching this job.","error":true}""")
Stacktrace:
[1] #request#1 at /home/travis/.julia/packages/HTTP/nUK4f/src/ExceptionRequest.jl:22 [inlined]
[2] (::getfield(HTTP, Symbol("#kw##request")))(::NamedTuple{(:iofunction, :verbose, :cookiejar,
...
I think there are too many big changes here mixed with opinionated changes such as removing testsets.
I am not confortable merging a rewrite like this. For example its not clear to me if the current public interface is preserved. The skip implementention looks like it would have problems with dealing with indices that are out of order (like [3,2,5]).
I am not completely rejecting this change, but I think that before we make such a drastic change, I rather first just update to 0.7 in some minimal way and see if a rewrite is needed
I think there are too many big changes here mixed with opinionated changes such as removing testsets.
What tests are you referring to? I think I've removed none. I can restore the indentation in the test, so that it's more clear (sorry for that, I thought GitHub would be able to detect that it's just a whitespace change), but often the reason for changing the indentation was that I replaced for ... @testset "..." begin ... end end pattern with @testset "..." for ... end, which should be better integrated into Test.jl infrastructure and which is 1 indentation less.
The skip implementention looks like it would have problems with dealing with indices that are out of order (like [3,2,5]).
It does support out-of-order indices (also see the comments for _readdata() methods) and it's covered by the existing tests.
For example its not clear to me if the current public interface is preserved.
As long as it's covered by the tests, it should be preserved.
I can fix some specific things, but unfortunately I have no time to make iterative rewrites. If you think you don't need these updates, feel free to ignore the PR. I needed these changes to switch TSne.jl from unsupported MNIST.jl to MLDatasets.jl, I guess I will have to wait then.
fair. maybe I am just a little overwhelmed by the sheer amount of changes. As a maintainer there is a certain cost to swapping out the familiar code base with a different one. This step can be a bit intimitating. And if I am misjudging thing such as support for out of order indicies it just demonstrates that I would need to spend some time familiarizing myself with it before I'd be able to maintain it
Concerning 0.7 update. I am creating a PR in a minute. So at least the package will support 0.7. For 1.0 support there still need to be some upstream fixes to MAT.jl
@Evizero Any further comments? I would like to release 0.7-enabled version of TSne.jl that could pass CI tests and I hoped to use MLDatasets.jl as the test data provider.
The general question is whether you would like to introduce the changes that necessitated this amount of code churn. I.e. making MAT and ImageCore optional and switching from Gzip to CodecZlib. If you are fine with that, we can think of the strategies to make you comfortable with the changes. The simplest one would be just to clarify your questions about the new code. I can also try splitting the changes into smaller commits, so it's easier to track what's going on.
As impressed and thankful I am for the amount of work you put into this, I don't think I am ready at this point to swap out GZip for a different backend. Its just too many code changes.
A 0.7 compatible version is tagged that should work on linux at least. OSX has the whole HDF5 issue through MAT. So I am very much in favour of making the whole MAT situation nicer. If you would be interessted into making a separate PR for just the optional dependency stuff, I'd be happy to provide quick feedback to get that part merged and tagged quickly.
If you would be interessted into making a separate PR for just the optional dependency stuff, I'd be happy to provide quick feedback to get that part merged and tagged quickly.
Fair enough, these changes could be extracted into a separate PR. See #23
I don't think I am ready at this point to swap out GZip for a different backend. Its just too many code changes.
Hopefully you will be fine with #23, and after rebasing this PR on top of it there would be less changes. :)
One may argue whether they are really required given that now GZip is updated to support 1.0, but they also reduce the amount IO and CPU, since the archive files are read sequentially. This is a community-maintained package, so I will be glad to help you maintaining the relevant parts.
@alyst do you think it is still worth to move to CodecZlib?
Hey @alyst, it looks like this might have finally caught up to us. Any interest in putting a PR together to help resolve https://github.com/JuliaML/MLDatasets.jl/issues/86 ?
closing as outdated