Dataset concatenation

Open ClashLuke opened this issue 4 years ago • 2 comments

The project already generates pickle files of the dataset, which store it in a compact binary format.
It would be great to be able to concatenate multiple datasets into a bigger one.

It should be possible to choose the dimension along which the concatenation is done: one can either add more images for existing classes (#24) or create more classes (#25) with new images.
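
To make the two concatenation modes concrete, here is a minimal sketch. It assumes a hypothetical dataset layout, a dict with parallel lists `X` (one feature vector per image) and `Y` (one class label per image); the actual pickle layout used by the project may differ, and the names `concat_datasets` and `mode` are made up for illustration.

```python
from typing import Dict

# Hypothetical layout (an assumption, not the project's confirmed format):
# "X" holds one feature vector per image, "Y" the matching class label.
Dataset = Dict[str, list]


def concat_datasets(base: Dataset, extra: Dataset, mode: str = "images") -> Dataset:
    """Concatenate two datasets along one of two dimensions.

    mode="images":  `extra` adds images for classes already present in `base`.
    mode="classes": `extra` adds classes that do not exist in `base` yet.
    """
    if mode not in ("images", "classes"):
        raise ValueError(f"unsupported mode: {mode!r}")
    for ds in (base, extra):
        if len(ds["X"]) != len(ds["Y"]):
            raise ValueError("X and Y must have the same length")

    known = set(base["Y"])
    incoming = set(extra["Y"])
    if mode == "images" and not incoming <= known:
        raise ValueError(f"unknown classes: {sorted(incoming - known)}")
    if mode == "classes" and incoming & known:
        raise ValueError(f"classes already present: {sorted(incoming & known)}")

    # Build new lists so neither input dataset is modified.
    return {"X": base["X"] + extra["X"], "Y": base["Y"] + extra["Y"]}
```

Validating the labels per mode keeps an accidental mix (new classes sneaking into an "images" merge, or the reverse) from going unnoticed; whether that strictness is wanted is a design choice.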

ClashLuke avatar Mar 10 '20 11:03 ClashLuke

This is a very interesting feature to add.

I think that, due to the nature of the issue, it is necessary to create another API that is dedicated ONLY to creating the dataset.

So, the function below has to be exposed as a simple API, in order to return the dataset to the caller: https://github.com/alessiosavi/PyRecognizer/blob/fa7a67ade0af731e125e211cb0366e424b8c43cb/utils/util.py#L265

Regarding the training/tuning phase, what is the "user experience" that you prefer for the admin who wants to use this new functionality?

  • Should the admin upload a single file where the two datasets are zipped together, and then the datasets will be merged during the training/tuning?
  • Or should we write a new API that is dedicated to merging the posted datasets and returning the new concatenated dataset, which will then be posted to the already developed train/tune API?
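
As a rough illustration of the first option, the sketch below merges every pickle found in an uploaded zip archive. It assumes a hypothetical dataset layout (a dict with parallel lists `X` and `Y`) and a `.pickle` file extension; the function name `merge_zipped_datasets` is invented for this example and is not part of the existing code.

```python
import io
import pickle
import zipfile


def merge_zipped_datasets(zip_bytes: bytes) -> dict:
    """Merge all pickled datasets contained in an in-memory zip archive.

    Assumes each pickle is a dict with parallel lists "X" and "Y"
    (hypothetical layout, not confirmed by the project).
    """
    merged = {"X": [], "Y": []}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Sort the names so the merge order is deterministic.
        for name in sorted(archive.namelist()):
            if not name.endswith(".pickle"):
                continue
            # Unpickling can execute arbitrary code, so this is only
            # acceptable for files uploaded by a trusted admin.
            part = pickle.loads(archive.read(name))
            merged["X"].extend(part["X"])
            merged["Y"].extend(part["Y"])
    return merged
```

The second option would expose the same merge logic behind its own endpoint and hand the result back to the caller instead of training on it directly.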

alessiosavi avatar Mar 11 '20 11:03 alessiosavi

@alessiosavi

Thanks for your response.

The "user experience" would be uploading the available image dataset (batch by batch) to the application.

In brief, I have 100 company employees to enroll in the application so that I can identify them later. The problem is that the images of all 100 users are not received from the company HR on the first day.

So on the first day, I received the images of 50 employees (1-50) from the company HR, uploaded them to the app, and trained on them; all fine.

But on the next day, HR sent me the images of the remaining 50 users (51-100), and I uploaded them to the app to train on them as well.

In that way, I want to train on the employees batch by batch.

Is it implemented in the current repo?

Please advise.

SaddamBInSyed avatar Mar 11 '20 13:03 SaddamBInSyed