Dataset structure

Open MalcolmMielle opened this issue 4 years ago • 3 comments

Hi all,

Since we should soon have the dataset up and running, I'd like to talk about how we plan to provide it to users, especially since students are going to be working on it.

building the dataset

I'm talking with Dave Hagman about keeping the data we will get from the MD separate from a dataset of community samples. That way we have the original dataset from the doctor, which would be the medical dataset, and we can distribute the collection app to other users and build a larger (but less accurate) community dataset. I think the method we provide to the MD has to score high on the medical dataset, but the community dataset could be used for training.

Thoughts?

providing the dataset

What do you guys think about making only one half of the dataset public? The non-public part of the dataset could be used as the test set. This way users would only have access to the training/validation set, not the final test set. It's only an idea I wanted to pitch, but we could work with the back-end people to create an architecture where students can only upload their results (or method) and never see the test dataset (I know some universities have set up datasets this way).

It's definitely low priority but I thought it would be interesting to raise this point.
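
To make the public/hidden idea a bit more concrete, here is a rough sketch of how the back end could work. Everything in it is an assumption for illustration: the function names (`is_public`, `split_dataset`, `score_submission`), the idea of keying samples by an ID string, the 50/50 hash-based split, and mean absolute error as the metric are all hypothetical, not something we have agreed on. It also assumes students either receive the hidden inputs without their reference SpO2 values, or upload a method that the server runs for them.

```python
import hashlib
from statistics import mean


def is_public(sample_id: str) -> bool:
    """Decide from a hash of the sample ID whether a sample is public.

    Hashing the ID (instead of shuffling) keeps the split stable when
    new samples are added later. The 50/50 rule is an assumption.
    """
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    return digest[0] % 2 == 0


def split_dataset(samples: dict[str, float]) -> tuple[dict[str, float], dict[str, float]]:
    """Split {sample_id: reference SpO2} into (public, hidden) parts."""
    public = {k: v for k, v in samples.items() if is_public(k)}
    hidden = {k: v for k, v in samples.items() if not is_public(k)}
    return public, hidden


def score_submission(predictions: dict[str, float], hidden: dict[str, float]) -> float:
    """Server-side scoring: return only one aggregate number.

    The hidden reference values never leave this function. Mean absolute
    error is a placeholder metric, not a decision.
    """
    missing = set(hidden) - set(predictions)
    if missing:
        # Report only a count so the hidden sample IDs are not leaked.
        raise ValueError(f"{len(missing)} hidden samples have no prediction")
    return mean(abs(predictions[k] - hidden[k]) for k in hidden)
```

Students would only ever download the `public` part; an upload is passed to `score_submission` on the server and they get back a single number.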

MalcolmMielle, Apr 06 '20 12:04