DeepLearningLifeSciences
Chapter 8 - data.py
Hi there, I was trying to run the code, but it does not run. On line 41 you are looking for:
image_names = [p for p in os.listdir(images_path) if p.startswith('cut_') and p.endswith('.png')]
But there are no PNG images in the repo that I downloaded from Kaggle; all the images are in JPEG format. And the list you build in:
for im in image_names:
    if im.endswith('.jpeg') and not im.startswith('cut_') and not 'cut_' + im in image_names:
        raw_images.append(im)
is never used; raw_images does not appear anywhere else.
I am trying to understand why you are looking for 'cut_'. There is no image whose name starts or ends with 'cut_'.
Can you please help me get a working version?
Thanks. Oscar.
Hi Oscar,
Thanks for raising the issue! In the pipeline, we first read all raw images (those in JPEG format without the cut_ prefix) and run a preprocessing step that cuts them. This is done by calling the cut_raw_images function (line 35), which generates the cut_*.png files that are read in the next step. Let me know if you have any problems with this step.
Best,
Michael
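The two listings discussed above (raw JPEGs versus preprocessed cut_*.png files) can be separated into small helpers. This is a minimal sketch, not the repository's code: the names `split_image_names` and `pending_raw_images` are hypothetical, and it assumes raw files end in `.jpeg` and preprocessed files are named `cut_<stem>.png`, as described in the thread.

```python
import os


def split_image_names(names):
    """Split file names into raw JPEGs and preprocessed cut_*.png files."""
    raw = [n for n in names if n.endswith('.jpeg') and not n.startswith('cut_')]
    cut = [n for n in names if n.startswith('cut_') and n.endswith('.png')]
    return raw, cut


def pending_raw_images(names):
    """Return raw images that do not yet have a cut_<stem>.png counterpart."""
    raw, cut = split_image_names(names)
    done = set(cut)
    # Compare by stem: the preprocessed file has a different extension.
    return [n for n in raw
            if 'cut_' + os.path.splitext(n)[0] + '.png' not in done]
```

In real use the input would be something like `os.listdir(images_path)`. The counterpart is looked up by stem because `'cut_' + 'x.jpeg'` can never match a `.png` file name.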
Hi Miaecle,
Thanks for your quick response! This is the problem I am having:
image_names = [p for p in os.listdir(images_path) if p.startswith('cut_') and p.endswith('.png')]
It returns zero items.
[Screenshot: image_name before]
[Screenshot: image_name after running the line above]
You can see in the lower-right pane that cut_raw_images is reading all the images.
Hope you can help me.
Thanks, -Oscar-
Hi There,
I was looking into it more, and this is what I found. Is there anything I am missing? Look at the screenshot, the code, and the top-left pane to see the variable names.
I hope you can help me run this code, I am very interested in the DeepChem library.
Thanks, -Oscar
@TheStoneMX Oh, I think I found where it might go wrong. I don't have the code with me right now, but I will try to run it later today. Can you check whether there is a new folder called cut under your path for raw images? All the preprocessed images (cut_*.png) might be stored there. If you move these into their root folder (the same folder as the raw images), it might run.
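Moving the preprocessed files as suggested could look like the sketch below. It assumes the cut folder sits directly under the raw-image folder; the helper name `move_cut_images` is hypothetical.

```python
import os
import shutil


def move_cut_images(images_path, cut_dirname='cut'):
    """Move cut_*.png files from images_path/cut up into images_path.

    Returns the list of file names that were moved (empty if the
    cut folder does not exist or holds no matching files).
    """
    cut_dir = os.path.join(images_path, cut_dirname)
    if not os.path.isdir(cut_dir):
        return []
    moved = []
    for name in sorted(os.listdir(cut_dir)):
        if name.startswith('cut_') and name.endswith('.png'):
            shutil.move(os.path.join(cut_dir, name),
                        os.path.join(images_path, name))
            moved.append(name)
    return moved
```

An empty return value is itself a useful diagnostic: it means the preprocessing step never wrote anything.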
@miaecle, thanks for the response. There is a cut directory, but there is nothing in it because the code never gets executed:
if os.path.join(path, 'cut_' + os.path.splitext(img_path)[0] + '.png'):
    continue
# #### THIS CODE BELOW NEVER GETS EXECUTED ####
img = cv2.imread(os.path.join(path, img_path))
edges = cv2.Canny(img, 10, 30)
coords = zip(*np.where(edges > 0))
n_p = len(coords)
coords.sort(key=lambda x: (x[0], x[1]))
center_0 = int((coords[int(0.01 * n_p)][0] + coords[int(0.99 * n_p)][0]) / 2)
coords.sort(key=lambda x: (x[1], x[0]))
center_1 = int((coords[int(0.01 * n_p)][1] + coords[int(0.99 * n_p)][1]) / 2)
edge_size = min([center_0, img.shape[0] - center_0, center_1, img.shape[1] - center_1])
img_cut = img[(center_0 - edge_size):(center_0 + edge_size),
              (center_1 - edge_size):(center_1 + edge_size)]
img_cut = cv2.resize(img_cut, (512, 512))
cv2.imwrite(os.path.join(path + '/cut/', 'cut_' + os.path.splitext(img_path)[0] + '.png'), img_cut)
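One likely reason the lines after `continue` never run: `os.path.join` only builds a path string, and any non-empty string is truthy, so the `if` always succeeds. A check like this usually wants `os.path.exists`. The sketch below shows the difference; the helper names `cut_path` and `already_cut` and the `out_dir` parameter are assumptions for illustration, and this is not necessarily the change made in the repository.

```python
import os


def cut_path(out_dir, img_path):
    """Build the output path cut_<stem>.png for a raw image name."""
    stem = os.path.splitext(img_path)[0]
    return os.path.join(out_dir, 'cut_' + stem + '.png')


def already_cut(out_dir, img_path):
    """True only if the preprocessed file is actually on disk."""
    return os.path.exists(cut_path(out_dir, img_path))


# The joined path is a non-empty string, so it is truthy
# whether or not a file exists at that location.
print(bool(cut_path('no_such_dir', '10_left.jpeg')))  # True
# The existence check is what distinguishes done from not done.
print(already_cut('no_such_dir', '10_left.jpeg'))
```

With `already_cut` as the loop condition, the `continue` would fire only for images that have really been processed.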
Thanks. I hope to get the updated code soon so I can run the sample chapter.
-Oscar.
@miaecle ,
Why do we need to convert every image to PNG instead of leaving it as a JPEG?
Thanks, -Oscar.
@TheStoneMX So it is not a jpeg/png issue. Basically, we need to cut the image so that we can fit it into the network. Please see PR #8 for the quick fix; the data loading part should be clean now. Let me know if you find any further issues.
@miaecle Thanks a lot for the fix! It is working now, but there is one more thing that needs fixing, sorry.
I found that the code wasn't writing any images to disk, and that cv2.imwrite does not raise an exception when it can't find the path.
try:
    cv2.imwrite(os.path.join(path + '/cut/', 'cut_' + os.path.splitext(img_path)[0] + '.png'), img_cut)
except:
    logger.critical("error - cv2.imwrite")
    continue
Looking at the code, I found that it creates a cut directory one level above train, but it tries to write to /train/cut/, that is, to a cut directory inside train. So I created that directory manually, and now everything is working: PNG images are being written to the cut directory.
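Because cv2.imwrite reports failure through its boolean return value rather than an exception, a try/except around it will not catch a missing directory. Below is a minimal sketch of a safer write. The helper name `save_cut_image` is hypothetical, and the writer is passed in as a parameter (cv2.imwrite in real use) only so the example runs without OpenCV installed.

```python
import os


def save_cut_image(write_fn, out_dir, img_path, img):
    """Create out_dir if needed, write cut_<stem>.png, fail loudly on error.

    write_fn is expected to behave like cv2.imwrite:
    write_fn(path, img) -> bool, returning False when the write fails.
    """
    # Make sure the directory being written to is the one that exists.
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(img_path)[0]
    out_path = os.path.join(out_dir, 'cut_' + stem + '.png')
    if not write_fn(out_path, img):
        raise IOError('could not write ' + out_path)
    return out_path
```

A call might look like `save_cut_image(cv2.imwrite, os.path.join(path, 'cut'), img_path, img_cut)`. Creating the directory and writing the file from the same `out_dir` value avoids the mismatch between the folder that is created and the folder that is written to.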
"So it is not a jpeg/png issue"
I never said it was a JPEG/PNG issue.
The question is why you need to feed the network PNGs and not just JPEGs. The way it is being done, it takes about 4 days to write 35 thousand images to disk. It has been 12 hours since I started and I have only written 2764 images to disk, and I have an X299 board with an X9960 processor and an SSD.
I can't imagine how long it will take someone with a slower computer and a regular hard drive, unless I am missing something here.
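Each image is processed independently, so one possible way to shorten the run is to process several images at once. The sketch below is only an assumption about how that could be organized, not the repository's pipeline; `process_all` and `worker` are hypothetical names. Threads are used so the sketch runs anywhere; for CPU-bound work, `ProcessPoolExecutor` with a module-level worker function is the usual substitute.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_all(names, worker, max_workers=4, executor_cls=ThreadPoolExecutor):
    """Run worker(name) for every name concurrently.

    Returns (done, failed): sorted lists of names whose worker call
    finished normally and names whose worker call raised an exception.
    """
    done, failed = [], []
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, name): name for name in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
                done.append(name)
            except Exception:
                failed.append(name)
    return sorted(done), sorted(failed)
```

Here `worker` would wrap the per-image steps (read, cut, resize, write). Collecting failures instead of stopping at the first one makes it possible to rerun only the images that did not get written.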
Thanks for your great support! I will make sure to mention it in my Amazon review of the book.
@TheStoneMX I see what you mean, thanks for the feedback! I will try to optimize the pipeline to speed up the preprocessing step.