swift-models

Add support for adding UCI datasets

Status: Open. boronhub opened this issue 5 years ago • 8 comments

Most UCI datasets are comma-separated classification datasets, much like the Iris dataset. The first tutorial in the S4TF docs trains a classification model on the Iris dataset. It uses this file to create a dataset by defining the data types and reading the contents of the CSV file. I think a good addition to this repository would be a similar file, possibly CSVUtilities.swift, to make it easier to add CSV datasets for use with S4TF. If the maintainers feel this is worthwhile, I am willing to create a PR for it.
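To make the proposal concrete, here is a minimal sketch of what a helper in the spirit of CSVUtilities.swift might contain. It assumes Iris-style data: comma-separated rows with no quoted fields, numeric feature columns, and an integer class label in the last column. The names `CSVClassificationData`, `CSVParseError` and `parseClassificationCSV` are hypothetical, and the result is kept in plain Swift arrays rather than S4TF tensors so the sketch runs with only Foundation.

```swift
import Foundation

/// Parsed features and labels (hypothetical container type).
struct CSVClassificationData {
    var features: [[Float]]
    var labels: [Int32]
}

enum CSVParseError: Error {
    case badField(row: Int, field: String)
    case wrongColumnCount(row: Int, expected: Int, got: Int)
}

/// Parses Iris-style CSV text: numeric features, integer label in the last column.
/// Assumes no quoted fields; `row` in errors is the 1-based data row (header excluded).
func parseClassificationCSV(_ text: String, hasHeader: Bool = true) throws -> CSVClassificationData {
    // Splitting on newlines also drops blank lines.
    var lines = text.split(whereSeparator: \.isNewline)
    if hasHeader, !lines.isEmpty { lines.removeFirst() }

    var features: [[Float]] = []
    var labels: [Int32] = []
    var expectedColumns: Int?
    for (index, line) in lines.enumerated() {
        let row = index + 1
        let fields = line.split(separator: ",", omittingEmptySubsequences: false)
            .map { $0.trimmingCharacters(in: .whitespaces) }
        // Every row must have as many columns as the first data row.
        if let expected = expectedColumns, fields.count != expected {
            throw CSVParseError.wrongColumnCount(row: row, expected: expected, got: fields.count)
        }
        expectedColumns = fields.count
        guard let labelField = fields.last, let label = Int32(labelField) else {
            throw CSVParseError.badField(row: row, field: fields.last ?? "")
        }
        var values: [Float] = []
        for field in fields.dropLast() {
            guard let value = Float(field) else {
                throw CSVParseError.badField(row: row, field: field)
            }
            values.append(value)
        }
        features.append(values)
        labels.append(label)
    }
    return CSVClassificationData(features: features, labels: labels)
}

// Usage with a small Iris-like sample (header names are illustrative).
let sample = """
sepal_length,sepal_width,petal_length,petal_width,species
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
"""
let data = try parseClassificationCSV(sample)
print(data.labels)  // [2, 1]
```

Quoted fields, missing values and string labels, which appear in several UCI files, would need extra handling before a helper like this could cover more than the simplest datasets.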

boronhub · Jan 17 '20 20:01

It would be interesting to have a DataLoader struct with a fromCSV(withURL url: URL) initializer.
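One way to read that suggestion, sketched under assumptions: a hypothetical `DataLoader` that reads a local CSV file with a header row and keeps the raw string fields, leaving numeric conversion to a per-column helper. Because Swift initializers are spelled `init(...)`, the sketch exposes `fromCSV(withURL:)` as a static factory method; `init(fromCSV:)` would be an equally valid spelling. Quoted fields are not handled.

```swift
import Foundation

/// Hypothetical CSV-backed loader. Assumes a comma separator,
/// no quoted fields, and a header row as the first line.
struct DataLoader {
    let columnNames: [String]
    let rows: [[String]]

    /// Reads the whole file as UTF-8 and splits it into header and rows.
    static func fromCSV(withURL url: URL) throws -> DataLoader {
        let text = try String(contentsOf: url, encoding: .utf8)
        let lines = text.split(whereSeparator: \.isNewline)
        guard let headerLine = lines.first else {
            return DataLoader(columnNames: [], rows: [])
        }
        let splitLine = { (line: Substring) -> [String] in
            line.split(separator: ",", omittingEmptySubsequences: false)
                .map { $0.trimmingCharacters(in: .whitespaces) }
        }
        return DataLoader(columnNames: splitLine(headerLine),
                          rows: lines.dropFirst().map(splitLine))
    }

    /// Converts one column to Float; returns nil if the column is
    /// missing or any entry is not numeric.
    func floatColumn(named name: String) -> [Float]? {
        guard let index = columnNames.firstIndex(of: name) else { return nil }
        var values: [Float] = []
        for row in rows {
            guard index < row.count, let value = Float(row[index]) else { return nil }
            values.append(value)
        }
        return values
    }
}

// Usage: write a tiny sample file to the temp directory, then load it.
let url = FileManager.default.temporaryDirectory.appendingPathComponent("iris_sample.csv")
try "sepal_length,species\n5.1,0\n6.2,1\n".write(to: url, atomically: true, encoding: .utf8)
let loader = try DataLoader.fromCSV(withURL: url)
print(loader.floatColumn(named: "sepal_length") ?? [])
```

Keeping fields as strings until a column is requested lets the same loader serve datasets whose columns mix numeric and categorical values.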

I'm not sure if we can support all UCI datasets at once since they are not all in the same format, but it would definitely be helpful if we could add support for some of them.

rickwierenga · Jan 18 '20 09:01

Definitely not all, but certainly the CSV ones. The other option is a full-fledged Dataset API like TFDS, but that would require much more work.

boronhub · Jan 18 '20 10:01

There are raw TensorFlow ops available for CSV loading, such as DecodeCSV.
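For readers unfamiliar with it, TensorFlow's DecodeCSV op takes one default value per column, fills empty fields with that default, and treats a column without a default as required. The pure-Swift sketch below only imitates that contract for `Float` columns so the behavior is easy to see; `decodeCSVColumns` and `DecodeError` are hypothetical names, and this is not the S4TF raw-op API.

```swift
/// Errors for the hypothetical decoder; `record` and `column` are 0-based.
enum DecodeError: Error, Equatable {
    case wrongFieldCount(record: Int)
    case missingRequiredField(record: Int, column: Int)
    case badValue(record: Int, column: Int)
}

/// Decodes CSV records into one array per column.
/// `defaults[i]` fills empty fields in column i; a nil default marks the column as required.
func decodeCSVColumns(_ records: [String], defaults: [Float?],
                      delimiter: Character = ",") throws -> [[Float]] {
    var columns = Array(repeating: [Float](), count: defaults.count)
    for (r, record) in records.enumerated() {
        let fields = record.split(separator: delimiter, omittingEmptySubsequences: false)
        guard fields.count == defaults.count else {
            throw DecodeError.wrongFieldCount(record: r)
        }
        for (c, field) in fields.enumerated() {
            if field.isEmpty {
                // Empty field: use the column default, or fail if the column is required.
                guard let fallback = defaults[c] else {
                    throw DecodeError.missingRequiredField(record: r, column: c)
                }
                columns[c].append(fallback)
            } else if let value = Float(field) {
                columns[c].append(value)
            } else {
                throw DecodeError.badValue(record: r, column: c)
            }
        }
    }
    return columns
}

// Column 1 has default 3.5; columns 0 and 2 are required (nil default).
let columns = try decodeCSVColumns(["5.1,,0", "4.9,3.0,0"], defaults: [nil, 3.5, nil])
print(columns[1])  // [3.5, 3.0]
```

Returning column-major arrays mirrors the op's one-output-per-column shape, which is convenient when each column later becomes its own feature tensor.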

Shashi456 · Jan 18 '20 10:01

> The other option is a full-fledged Dataset API like TFDS, but that would require much more work.

I am wondering: could we perhaps use Python interop to load these datasets from TFDS and convert them to a Swift format?

WilliamHYZhang · Jan 19 '20 14:01

I think it could work, yes (I haven't tested it). But it would probably be quite slow, since you'd have to call into Python for every batch. It could serve as a temporary fix, though. Good idea.
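The per-batch Python cost can be sidestepped by converting the TFDS data to Swift arrays once, up front, and then batching purely in Swift. The sketch below covers only that second half and assumes the conversion has already produced `[[Float]]` features and `[Int32]` labels; `Batches` is a hypothetical type, and shuffling is left out for brevity.

```swift
/// Hypothetical in-memory batcher: every batch is sliced from Swift arrays,
/// so no Python call happens per batch once the data has been converted.
struct Batches: Sequence, IteratorProtocol {
    let features: [[Float]]
    let labels: [Int32]
    let batchSize: Int
    private var start = 0

    init(features: [[Float]], labels: [Int32], batchSize: Int) {
        precondition(features.count == labels.count, "features and labels must align")
        precondition(batchSize > 0, "batchSize must be positive")
        self.features = features
        self.labels = labels
        self.batchSize = batchSize
    }

    /// Returns the next batch; the final batch may be smaller than batchSize.
    mutating func next() -> (features: [[Float]], labels: [Int32])? {
        guard start < features.count else { return nil }
        let end = min(start + batchSize, features.count)
        defer { start = end }
        return (features: Array(features[start..<end]), labels: Array(labels[start..<end]))
    }
}

let xs: [[Float]] = [[0], [1], [2], [3], [4]]
let ys: [Int32] = [0, 1, 0, 1, 0]
for batch in Batches(features: xs, labels: ys, batchSize: 2) {
    print(batch.labels)  // [0, 1], then [0, 1], then [0]
}
```

Each iteration only slices Swift arrays, so the interop cost is paid once during conversion rather than on every training step.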

I'm curious to see how this works out. @WilliamHYZhang, maybe we could work on something together in the future?

rickwierenga · Jan 19 '20 21:01

@rickwierenga I'm down for it! Looking forward to working together.

WilliamHYZhang · Jan 19 '20 22:01

@rickwierenga and I have been working on some cool things in our new organization, "S5TF Team". We have TFDS Python interop working in our examples repo here. More importantly, we've been working on a new dataset API that addresses some of the issues mentioned above; feel free to check it out here.

WilliamHYZhang · Jan 24 '20 19:01

As a quick update, one example of this was just incorporated in PR #311 by Jacopo Mangiavacchi.

BradLarson · Feb 19 '20 16:02