
Create test datasets

Open nfahlgren opened this issue 7 years ago • 5 comments

Description

We have large datasets available here but it would be useful to have smaller datasets that users could easily download and use for testing, learning, etc. This idea is based on issue #159.

Details

Smaller datasets are quicker to download, fit on a personal computer, and are easier to visualize. We could offer a variety of small datasets:

  1. It could be worth having a sample dataset associated with each public dataset, available through the same mechanisms the full datasets are available through.
  2. In addition, or alternatively, we could provide sample datasets for each data type or analysis type.
  3. A dataset for images used in the documentation. This could be best stored in a separate GitHub repository so that we could easily add to it over time, as long as it stays under 1GB total size.
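One way to build such a sample dataset is to draw a random subset of images from a full dataset while staying under a size budget. The following is a minimal sketch under stated assumptions, not part of PlantCV: the function name `sample_subset`, the list of image suffixes, and the directory names in the usage note are all hypothetical and chosen only for illustration.

```python
import random
import shutil
from pathlib import Path

# Assumed set of image file extensions; adjust for the dataset at hand.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def sample_subset(src_dir, dest_dir, max_bytes, seed=0):
    """Copy a random selection of images from src_dir into dest_dir.

    Files are visited in a seeded random order and copied only if they
    fit in the remaining size budget, so the total never exceeds max_bytes.
    Returns the list of copied paths and their combined size in bytes.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Collect candidate images first, sorted so the seed gives a reproducible order.
    images = sorted(
        p for p in src.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    random.Random(seed).shuffle(images)

    copied, total = [], 0
    for path in images:
        size = path.stat().st_size
        if total + size > max_bytes:
            continue  # this file would exceed the budget; try the next one
        target = dest / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        copied.append(target)
        total += size
    return copied, total
```

For example, `sample_subset("full_dataset", "sample_dataset", max_bytes=10**9)` would keep the sample under the 1GB limit mentioned above; both directory names are placeholders.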

Completion Criteria

  • [ ] Create dataset(s)
  • [ ] Make dataset(s) available
  • [ ] Update the documentation with instructions on how to get the dataset(s)

nfahlgren avatar Jun 20 '17 20:06 nfahlgren

Hi Noah, I need to train a deep neural network with images of vegetables (those normally suitable for greenhouse farming). Could you please help with this data or point me to a suitable link? Thanks.

abiodungit avatar Sep 29 '17 15:09 abiodungit

@abiodungit I'm not aware of any datasets for vegetables per se, or even many datasets with labeled training data (at the moment). I can point you to our publicly available datasets: http://plantcv.danforthcenter.org/pages/data.html and the additional datasets that are described here: http://www.plant-image-analysis.org/dataset.

nfahlgren avatar Sep 29 '17 17:09 nfahlgren

Dear Noah, Thank you for those links. They are very useful. @abiodungit

Onile A. E.

abiodungit avatar Sep 29 '17 19:09 abiodungit

Could we use "original" images from tutorials and other examples in the documentation as one of the small datasets? Maybe another dataset could be a subset of images from currently available datasets.

HaleySchuhl avatar Jan 09 '19 15:01 HaleySchuhl

Yeah, I think it would be ideal if the documentation images were available for people to work on since we demonstrate functions with that data. Since the static documentation on Read the Docs is reduced quality/downsized, maybe it's easier to worry about the test datasets for now in the context of the interactive documentation. We could potentially replace our existing documentation images with this new dataset later if we think it's important for the data to match between the static and interactive docs.

nfahlgren avatar Jan 09 '19 15:01 nfahlgren