dress-pattern-recognition-using-CNN
Issues
The data CSV lists around 15k images, but the notebook uses only about 10k in total. The model was trained as a binary classifier, although the real problem is multi-class. The dataset-creation step creates only the dataset_category folder, so it is unclear where the dataset category test folder used in the notebooks comes from. The reported accuracy is over 95%, but no other metrics, such as per-class precision and recall or a confusion matrix, are given to back it up statistically.
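To illustrate the last point, here is a minimal sketch of per-class precision, recall and F1 in plain Python. The function names `per_class_report` and `macro_f1` and the labels `floral` and `striped` are hypothetical and do not come from the notebook. The example uses a deliberately imbalanced toy set, on which a model that always predicts the majority class still reaches 80% accuracy.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from two equal-length label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label

    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = (precision, recall, f1)
    return report


def macro_f1(report):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for _, _, f1 in report.values()) / len(report)


# Hypothetical labels: 8 'floral' and 2 'striped' images, all predicted 'floral'.
y_true = ['floral'] * 8 + ['striped'] * 2
y_pred = ['floral'] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
report = per_class_report(y_true, y_pred)
print('accuracy:', accuracy)
for label, (precision, recall, f1) in report.items():
    print(label, round(precision, 3), round(recall, 3), round(f1, 3))
print('macro F1:', round(macro_f1(report), 3))
```

Here the accuracy looks acceptable while the recall for the minority class is zero, which is the kind of failure a single accuracy figure hides. On the real data, the same report would be computed from the model's predictions on a held-out test set.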
Added multithreading to download the images much faster.
import numpy as np
import pandas as pd
import requests
import os
import threading

dress_patterns_df = pd.read_csv('dress_patterns.csv')
dress_patterns = dress_patterns_df.values
category
category = set(dress_patterns_df['category'])
print(category)

# create a folder dataset_category with one nested folder per category
print(os.listdir())
os.makedirs('dataset_category', exist_ok=True)  # does not fail when the cell is re-run

for cat in category:
    print(cat)
    os.makedirs(os.path.join('dataset_category', cat), exist_ok=True)

print(os.listdir('dataset_category'))
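def download_image(url, category, unit_id, i):
    # save image in respective category folder
    try:
        r = requests.get(url, allow_redirects=True, timeout=10)
        r.raise_for_status()  # do not save HTTP error pages as images
        path = os.path.join('dataset_category', category, str(unit_id) + '.jpg')
        with open(path, 'wb') as f:
            f.write(r.content)
    except (requests.RequestException, OSError) as e:
        print('ERROR at: ', i, e)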
threads = []
for i in range(len(dress_patterns)):
    if i % 5 == 0:
        print(i, '/', len(dress_patterns))
    pattern = dress_patterns[i]
    url = pattern[3]
    unit_id = pattern[0]
    cat = pattern[1]  # not named `category`, so the global set of categories is not overwritten
    thread = threading.Thread(target=download_image, args=(url, cat, unit_id, i))
    threads.append(thread)
    thread.start()

    # limit the number of threads to 5
    if len(threads) == 5:
        for thread in threads:
            thread.join()
        threads = []

# wait for any remaining threads to complete
for thread in threads:
    thread.join()
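The loop above starts downloads in batches of five and waits for the whole batch, so one slow image holds up the other four slots. A possible alternative, sketched here under assumptions, is the standard library's `concurrent.futures.ThreadPoolExecutor`, which keeps five workers busy continuously. The names `fetch_bytes` and `download_all` are hypothetical and not part of the notebook. The `fetch` parameter is injectable only so the pool logic can be tried without network access. Rows are assumed to be `(unit_id, category, url)` tuples.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_bytes(url, timeout=10):
    """Default fetcher: download a URL with requests and return the raw bytes."""
    import requests  # imported lazily so the pool logic can be exercised offline

    response = requests.get(url, allow_redirects=True, timeout=timeout)
    response.raise_for_status()
    return response.content


def download_all(rows, out_dir='dataset_category', max_workers=5, fetch=fetch_bytes):
    """Download (unit_id, category, url) rows with at most max_workers threads.

    Returns a list of (unit_id, exception) pairs for the rows that failed.
    """
    def save(unit_id, category, url):
        data = fetch(url)  # fetch first, so a failed download leaves no empty file
        folder = os.path.join(out_dir, str(category))
        os.makedirs(folder, exist_ok=True)
        with open(os.path.join(folder, f'{unit_id}.jpg'), 'wb') as f:
            f.write(data)

    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its unit_id so failures can be reported.
        futures = {pool.submit(save, unit_id, category, url): unit_id
                   for unit_id, category, url in rows}
        for future in as_completed(futures):
            try:
                future.result()
            except Exception as exc:
                failures.append((futures[future], exc))
    return failures
```

With the notebook's array, whose columns the code above reads as unit_id at index 0, category at index 1 and url at index 3, a call could look like `failures = download_all((p[0], p[1], p[3]) for p in dress_patterns)`. Returning the failures instead of only printing them makes it possible to retry those rows and to compare the number of files on disk with the number of rows in the CSV.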