Andrei Zhabinski

Results: 180 comments by Andrei Zhabinski

Basically, code for reading a particular dataset is unlikely to change often, so we can write testing code, run it locally and comment it out before merging. This way it will...

Using environment variables sounds like a perfect solution! Regarding subdatasets and custom CI, I guess it's much more involved and may not work for all the users. Or maybe I...

Sorry for the infrequent replies - I've been overwhelmed with other projects I'm in charge of. I agree that `UInt8` is rarely useful in practice, but it's unclear what it use...

Does it mean a new dataset will need to implement `traintensor` in addition to `traindata`? > For Food-101 i don't think it matters because the data doesn't seem to be...

> What do you think about the idea that after download we repack the data into a HDF5 file? HDF5 supports compression as well as reading individual "datasets" (in our...

DataFrames.jl is definitely the way to go, but the integration isn't done yet. In the simplest case, you can convert rows of `DataFrames.DataFrame` to `Spark.Row`s and use `Spark.createDataFrame(...)` to convert...

Can you point to the page with this example? `SparkContext` has been removed, and I can't find any mentions of it in the docs.

For whatever reason JuliaHub doesn't want to update the README of the project and still points to the old documentation. I tried to fix it by re-triggering the TagBot, but...

> Given the changes that I can see now in the docs, it looks like the SparkContext was taken out of the project. Is that correct? Yes.

Yes! From my observations, interprocess communication is the main performance killer for the RDD API, so switching to Arrow should be the most important improvement in a while. Although, I didn't...