Update get_labelled_samples and get_unlabelled_samples to feed data in batches
Copied from https://github.com/dmonllao/moodleinspire-python-backend/issues/1 before this gets lost:
The current implementation would swallow all system memory if a massive dataset (many GBs) were used; data should be read in batches (https://www.tensorflow.org/programmers_guide/reading_data). This is unlikely to happen soon, since datasets generated by Moodle will rarely exceed 10MB, but it is still something we should fix.
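A minimal sketch of what a batched reader could look like, assuming the dataset is a plain CSV file with a header row; `iter_sample_batches`, `file_path` and `batch_size` are hypothetical names used for illustration, not the existing `get_labelled_samples` / `get_unlabelled_samples` API:

```python
import csv


def iter_sample_batches(file_path, batch_size=1000):
    """Yield lists of rows from a CSV dataset without loading it all into memory.

    Hypothetical helper: streams the file row by row and hands out
    fixed-size batches, so memory use is bounded by batch_size instead
    of the dataset size.
    """
    batch = []
    with open(file_path, 'r', newline='') as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for row in reader:
            batch.append(row)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        # Flush the last, possibly smaller, batch.
        yield batch
```

get_labelled_samples and get_unlabelled_samples could then iterate over these batches and feed each one to the training step instead of materialising the whole dataset first.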
The only problem I can think of is model evaluation, because we need to shuffle the dataset to evaluate the Moodle model using different combinations of training and test data. We could use a subset (limited to X MBs) of the evaluation dataset instead of shuffling the whole large dataset; a sketch of this is shown below.
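One way to build that bounded subset without ever loading the full file is reservoir sampling; this is a sketch under the same CSV assumption as above, and `sample_evaluation_subset` / `max_rows` are hypothetical names, not part of the current code:

```python
import csv
import random


def sample_evaluation_subset(file_path, max_rows=10000, seed=None):
    """Return a bounded random subset of a CSV dataset via reservoir sampling.

    Hypothetical sketch: keeps at most max_rows rows in memory regardless
    of how large the dataset is, so the evaluation shuffle operates on a
    fixed-size sample.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(file_path, 'r', newline='') as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for i, row in enumerate(reader):
            if i < max_rows:
                reservoir.append(row)
            else:
                # Replace an existing element with decreasing probability so
                # every row has an equal chance of ending up in the sample.
                j = rng.randint(0, i)
                if j < max_rows:
                    reservoir[j] = row
    rng.shuffle(reservoir)
    return reservoir
```

The returned subset could then be split into training and test portions for each evaluation run, which keeps memory bounded while still giving different train/test combinations.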
I started working on this during my last project week (https://github.com/dmonllao/moodleinspire-python-backend/tree/batch-evaluation) but couldn't finish it; I will try to look at it again at some point in the future.
Catalyst more or less fixed this with commit 3a811cfd4bb70c362aea732014c7715d8c2ee467, but it looks like we never opened a pull request.