DeepLearningLifeSciences

Chapter 8 - data.py

Open TheStoneMX opened this issue 5 years ago • 10 comments

Hi there, I was trying to run the code, but it fails. On line 41 you are looking for:

```python
image_names = [p for p in os.listdir(images_path) if p.startswith('cut_') and p.endswith('.png')]
```

But there are no PNG images in the data I downloaded from Kaggle; all the images are in JPEG format. And the list you build in:

```python
for im in image_names:
    if im.endswith('.jpeg') and not im.startswith('cut_') and 'cut_' + im not in image_names:
        raw_images.append(im)
```

The raw_images list never gets used at all.

I am trying to understand why you are looking for 'cut_': there is no image whose name starts or ends with 'cut_'.

Can you please help me get a working version?

Thanks. Oscar.

TheStoneMX, Jul 14 '19 01:07

Hi Oscar,

Thanks for raising the issue! The pipeline first reads all raw images (those in JPEG format without the cut_ prefix) and preprocesses them by cutting. This is done by the cut_raw_images function (called on line 35), which generates the cut_*.png files that are read in the next step. Let me know if you have any problems with this step.

Best, Michael

miaecle, Jul 14 '19 22:07
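The selection logic Michael describes (raw JPEGs that still need cutting, then the cut PNGs that are read afterwards) can be sketched as below. This is a minimal illustration, not the repository's code: `cut_name` and `split_image_names` are hypothetical helpers, and the file names in the example are illustrative. It assumes raw images end in `.jpeg` and cut images are named `cut_<stem>.png`. Note that it compares against the `.png` name, whereas the quoted loop appears to check `'cut_' + im`, which would be `cut_<stem>.jpeg`.

```python
import os


def cut_name(raw_name):
    """Hypothetical helper: map '10_left.jpeg' to 'cut_10_left.png'."""
    return 'cut_' + os.path.splitext(raw_name)[0] + '.png'


def split_image_names(names):
    """Split a directory listing into (raw JPEGs still to cut, existing cut PNGs)."""
    names = set(names)
    cut_images = sorted(n for n in names if n.startswith('cut_') and n.endswith('.png'))
    raw_images = sorted(
        n for n in names
        if n.endswith('.jpeg') and not n.startswith('cut_') and cut_name(n) not in names
    )
    return raw_images, cut_images


raw, cut = split_image_names(['10_left.jpeg', '10_right.jpeg', 'cut_10_left.png', 'trainLabels.csv'])
print(raw)  # ['10_right.jpeg']
print(cut)  # ['cut_10_left.png']
```

Only `10_right.jpeg` still needs cutting, because `cut_10_left.png` already exists.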

Hi Miaecle,

Thanks for your quick response! This is the problem I am having:

```python
image_names = [p for p in os.listdir(images_path) if p.startswith('cut_') and p.endswith('.png')]
```

It returns zero images (an empty list).

image_names before this line: [screenshot: image_names_before]

image_names after this line: [screenshot: image_names_after]

In the lower-right pane you can see that cut_raw_images is reading all the images.

Hope you can help me.

Thanks, -Oscar-

TheStoneMX, Jul 15 '19 03:07

Hi There,

I looked into it some more, and this is what I found. Is there anything I am missing? Look at the screenshot code_never_executed, the code, and the top-left pane to see the variable names.

I hope you can help me run this code; I am very interested in the DeepChem library.

Thanks, -Oscar

TheStoneMX, Jul 15 '19 16:07

@TheStoneMX Oh, I think I found what might be going wrong. I don't have the code with me right now, but I will try to run it later today. Can you check whether there is a new folder called cut inside the folder with your raw images? The preprocessed images (cut_*.png) might be stored there. If you move them into the root folder (the same folder as the raw images), it might run.

miaecle, Jul 15 '19 17:07
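The workaround suggested above (moving the preprocessed files from the cut subfolder into the raw-image folder) can be sketched as follows. This assumes the cut images sit in `<images_path>/cut/`; `move_cut_images_to_root` is a hypothetical helper, not part of the repository.

```python
import os
import shutil


def move_cut_images_to_root(images_path):
    """Move cut_*.png files from <images_path>/cut/ up into <images_path>.

    Returns the names of the moved files; other files in cut/ are left alone.
    """
    cut_dir = os.path.join(images_path, 'cut')
    if not os.path.isdir(cut_dir):
        return []  # nothing has been preprocessed yet
    moved = []
    for name in sorted(os.listdir(cut_dir)):
        if name.startswith('cut_') and name.endswith('.png'):
            shutil.move(os.path.join(cut_dir, name), os.path.join(images_path, name))
            moved.append(name)
    return moved
```

After this, the `image_names` filter on line 41 would find the files in `images_path` itself.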

@miaecle, thanks for the response. There is a cut directory, but it is empty because this code never gets executed:

```python
if os.path.join(path, 'cut_' + os.path.splitext(img_path)[0] + '.png'):
    # os.path.join() returns a non-empty string, which is always truthy,
    # so this `continue` runs for every image
    continue
```

**THIS CODE BELOW NEVER GETS EXECUTED**

```python
img = cv2.imread(os.path.join(path, img_path))
edges = cv2.Canny(img, 10, 30)
# list() so that len() and .sort() also work on Python 3, where zip() is lazy
coords = list(zip(*np.where(edges > 0)))
n_p = len(coords)

# Image center estimated from the 1st and 99th percentile of edge coordinates
coords.sort(key=lambda x: (x[0], x[1]))
center_0 = int((coords[int(0.01 * n_p)][0] + coords[int(0.99 * n_p)][0]) / 2)
coords.sort(key=lambda x: (x[1], x[0]))
center_1 = int((coords[int(0.01 * n_p)][1] + coords[int(0.99 * n_p)][1]) / 2)

# Largest square around the center that stays inside the image, resized to 512x512
edge_size = min([center_0, img.shape[0] - center_0, center_1, img.shape[1] - center_1])
img_cut = img[(center_0 - edge_size):(center_0 + edge_size), (center_1 - edge_size):(center_1 + edge_size)]
img_cut = cv2.resize(img_cut, (512, 512))
cv2.imwrite(os.path.join(path + '/cut/', 'cut_' + os.path.splitext(img_path)[0] + '.png'), img_cut)
```

Thanks. I hope to get the updated code soon so I can run the sample chapter.

-Oscar.

TheStoneMX, Jul 15 '19 17:07
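The skip check in the quoted code can never be false, because it tests the path string rather than the file. A corrected check might look like the sketch below. It assumes, as the quoted `cv2.imwrite` call does, that cut images are written to `<path>/cut/`; `cut_output_path` and `already_cut` are hypothetical helper names, and this is not necessarily how PR #8 fixed it.

```python
import os


def cut_output_path(path, img_path):
    """Hypothetical helper: <path>/cut/cut_<stem>.png, matching the quoted imwrite call."""
    stem = os.path.splitext(os.path.basename(img_path))[0]
    return os.path.join(path, 'cut', 'cut_' + stem + '.png')


def already_cut(path, img_path):
    """Check the file system, not the truthiness of the path string."""
    return os.path.exists(cut_output_path(path, img_path))


# In the loop over raw images, the skip check would then read:
#     if already_cut(path, img_path):
#         continue
```

Checking and writing through the same helper keeps the two paths from drifting apart. The quoted code checks `path/cut_*.png` but writes `path/cut/cut_*.png`.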

@miaecle,

Why do we need to convert every image to PNG instead of leaving it as a JPEG?

Thanks, -Oscar.

TheStoneMX, Jul 15 '19 17:07

@TheStoneMX It is not a JPEG/PNG issue: basically, we need to cut each image so that it fits into the network. Please see PR #8 for a quick fix; the data-loading part should now be clean. Let me know if you find any further issues.

miaecle, Jul 15 '19 21:07

@miaecle Thanks a lot for the fix! It is working now, but there is one more thing that needs fixing, sorry.

I found that the code wasn't writing any images to disk, and that cv2.imwrite does not raise an exception when it can't find the path:

```python
try:
    cv2.imwrite(os.path.join(path + '/cut/', 'cut_' + os.path.splitext(img_path)[0] + '.png'), img_cut)
except Exception:
    logger.critical("error - cv2.imwrite")
    continue
```

Looking at the code, I found that it creates the cut directory one level above train, but it tries to write to train/cut/, i.e. inside the train directory. So I created that directory manually, and everything works: PNG images are now being written to the cut directory.

TheStoneMX, Jul 16 '19 14:07
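Because `cv2.imwrite` reports a write into a missing directory through its boolean return value (typically `False`) rather than an exception, the `try`/`except` above never fires. One way to make the step fail loudly and avoid the directory mismatch is sketched below. `write_cut_image` is a hypothetical helper; the `imwrite` parameter is an assumption added only so the function can be exercised with a stub instead of OpenCV.

```python
import os


def write_cut_image(img_cut, out_path, imwrite=None):
    """Write img_cut to out_path, creating its folder first and raising on failure."""
    if imwrite is None:
        import cv2  # imported lazily so the helper can be tested with a stub writer
        imwrite = cv2.imwrite
    out_dir = os.path.dirname(out_path)
    if out_dir:
        # Create e.g. train/cut/ exactly where the file will be written
        os.makedirs(out_dir, exist_ok=True)
    # cv2.imwrite returns a bool instead of raising when it cannot write the file
    if not imwrite(out_path, img_cut):
        raise OSError('cv2.imwrite failed for ' + out_path)
    return out_path
```

In the cutting loop it would replace the bare `cv2.imwrite(...)` call, with the same output path as before.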

> So it is not a jpeg/png issue

I never said it was a JPEG/PNG issue.

My question is why the network needs to be fed PNGs rather than just JPEGs, because, the way it is currently done, writing 35 thousand images to disk takes about 4 days. It has been 12 hours since I started, and I have written only 2764 images, on an X299 board with an X9960 processor and an SSD.

I can't imagine how long it would take someone with a slower computer and a regular hard drive, unless I am missing something here.

Thanks for your great support! I will make sure to mention it in my Amazon review of the book.

TheStoneMX, Jul 16 '19 14:07

@TheStoneMX I see what you mean, thanks for the feedback! I will try to optimize the pipeline to speed up the preprocessing step.

miaecle, Jul 16 '19 18:07
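One possible speed-up for the cutting step is sketched below. It assumes that a large share of the per-image time goes into building and twice sorting a Python list of (row, col) edge tuples. Because that list is sorted by row first, the row of its k-th element is simply the k-th smallest row index (and likewise for columns), so NumPy can compute the same crop on integer arrays. `center_crop_box` is a hypothetical helper; it should return the same box as the quoted code (with `list(zip(...))`), but it has not been benchmarked here.

```python
import numpy as np


def center_crop_box(edges, low=0.01, high=0.99):
    """Return (top, bottom, left, right) of the square crop used by the cutting step.

    Vectorized equivalent of sorting the list of (row, col) edge coordinates:
    after a sort by (row, col), the row of the k-th tuple is the k-th smallest row.
    """
    rows, cols = np.nonzero(edges > 0)
    n_p = rows.size
    if n_p == 0:
        raise ValueError('no edges found')
    rows = np.sort(rows)
    cols = np.sort(cols)
    lo, hi = int(low * n_p), int(high * n_p)
    center_0 = int((rows[lo] + rows[hi]) / 2)
    center_1 = int((cols[lo] + cols[hi]) / 2)
    edge_size = min(center_0, edges.shape[0] - center_0,
                    center_1, edges.shape[1] - center_1)
    return (center_0 - edge_size, center_0 + edge_size,
            center_1 - edge_size, center_1 + edge_size)


# Usage inside the cutting loop (cv2 calls as in the quoted code):
#     edges = cv2.Canny(img, 10, 30)
#     top, bottom, left, right = center_crop_box(edges)
#     img_cut = cv2.resize(img[top:bottom, left:right], (512, 512))
```

The `cv2.imread`, `Canny`, `resize` and `imwrite` calls stay as they are; only the coordinate bookkeeping moves from Python lists to NumPy.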