
Very large data dimension size when initialising training

Open SpaceMeerkat opened this issue 2 years ago • 8 comments

Hi,

I'm trying to train an SOM (just to get used to the way PINK works, hence the tiny file counts and parameters below), but when I start the training run, my output log file grows at a rate of ~1 GB every 30 seconds, filling up with information like that shown in the quote below.

Number of data entries = 1096810496 Data dimension = 1097859072 x 1100480512 x 1102053376 x 1101004800 x 1099956224 x 1099431936 x 1101004800 x 1097859072 x 1098907648 x 1101004800 x 1101004800 x 1103101952 x 1104674816 x 1103101952 x 1105199104 x 1102577664 x 1101529088 x 1101004800 x 1098907648 x 1100480512 x 1101529088 x 1102053376 x 1094713344 x 1093664768 x 1103626240 x 1099956224 x 1088421888 x 1101004800 x 1099431936 x 1099956224 x 1104150528 x 1095761920 x 1102577664 x 1101529088 x 1101004800 x 1099956224 x 1099431936 x 1101529088 x 1103626240 x 1101004800 x 1101004800 x 1102577664 x 1097859072 x 1098907648 x 1101529088 x 1101004800 x 1097859072 x 1102053376 x 1099956224 x 1096810496 x 1101004800 x 1098907648 x 1097859072 x 1100480512 x 1100480512 x 1098907648 x 1101529088 x 1096810496 x 1097859072 x 1095761920 x 1099956224 x 1091567616 x 1098907648 x 1101529088 x 1099431936 x 1092616192 x 1096810496 x 1101529088 x 1099431936 x 1096810496 x 1100480512 x 1106247680 x 1101529088 x 1103101952 x 1101529088 x 1102053376 x 1102053376 x 1097859072 x 1101004800 x 1095761920 x 1096810496 x 1104674816 x 1097859072 x 1099956224 x 1102053376 x 1097859072 x 1099956224 x 1104674816 x 1099956224 x 1109917696 x 1094713344 x 1097859072 x 1111490560 x 1100480512 x 1102577664 x 1113325568 x 1099431936 x 1103626240 x 1113325568 x 1101529088 x 1103626240 x 1113849856 x 1105199104 x 1100480512 x 1111752704 x 1098907648 x 1097859072 x 1108606976 x 1096810496 x 1096810496 x 1104674816 x 1102053376 x 1097859072 x 1106771968 x 1101529088 x 1099956224 x 1102577664 x 1101004800 x 1096810496 x 1100480512 x 1097859072 x 1096810496 x 1103626240 x 1096810496 x 1100480512 x 1092616192 x 1099431936 x 1102577664 x 1091567616 x 1097859072 x 1098907648 x 1097859072 x 1098907648 x 1098907648 x 1094713344 x 1096810496 x 1099956224 x 1099956224 x 1098907648 x 1099431936 x 1105723392 x 1101529088 x 1099431936 x 1108344832 x 1101529088 x 1100480512 x 1105723392 x 1098907648 x 1098907648 x 
1104150528 x 1103101952 x 1103101952 x 1107296256 x 1099431936 x 1098907648 x 1104150528 x 1098907648 x 1100480512 x 1104150528 x 1103626240 x 1099956224 x 1103626240 x 1097859072 x 1102053376 x 1099956224 x 1100480512 x

I set the training run going using the following:

$Pink --train _scripts/test.bin _pink_out/som.bin --som-width 10 --som-height 10 --num-iter 1 --numrot 4

So as you can see, I'm keeping this little training run simple, with a small grid size and only 4 rotations per image.

This is for a training run using just 3 images of size 128x128 in float32 format, so I'm confused as to why the number of data entries in the first line of the quote above is so high.
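One check that may help narrow this down: the logged numbers all sit in a narrow band around 1.1e9, which is what float32 bit patterns look like when they are read as 32-bit integers. The sketch below reinterprets the first few logged values that way. It assumes (not confirmed here) that Pink reads the start of the input file as 32-bit integer header fields; the helper name `int32_bits_as_float32` is made up for illustration.

```python
import struct


def int32_bits_as_float32(value: int) -> float:
    """Reinterpret the 4 bytes of a signed 32-bit integer as an IEEE 754 float32."""
    return struct.unpack("<f", struct.pack("<i", value))[0]


# First few values copied from the log output quoted above:
# the "Number of data entries" value followed by the first four dimensions.
logged = [1096810496, 1097859072, 1100480512, 1102053376, 1101004800]

decoded = [int32_bits_as_float32(v) for v in logged]
print(decoded)

# Size the raw pixel data should have: 3 images * 128 * 128 pixels * 4 bytes each.
expected_bytes = 3 * 128 * 128 * 4
print(expected_bytes)
```

If the decoded values look like plausible pixel intensities, that would suggest the log is printing image data that was parsed as header fields, rather than a real entry count and dimension list.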

I'd really appreciate any help you can give on this!

Extra info

My images are stored in test.bin using the following numpy raw binary example:

filename = "test.bin"
# Dump the stacked images to disk as raw binary
with open(filename, mode='wb') as fileobj:
    stacked_images.tofile(fileobj)

... where stacked_images has shape (128, 128, 3) or (3, 128, 128). (I've tried both ways to see if that was the problem, but it still causes the issue above.)
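For reference, here is a minimal sketch of what `ndarray.tofile` puts on disk. NumPy writes only the raw element values, with no shape, dtype, or other header information, so both array layouts give a file of the same size whose first four bytes are already pixel data. The array here is a hypothetical stand-in for `stacked_images`, filled with the constant 14.0 purely for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for stacked_images: 3 float32 images of 128x128,
# filled with a constant so the first bytes of the file are predictable.
stacked_images = np.full((3, 128, 128), 14.0, dtype=np.float32)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test.bin")

    # Same write call as in the post; tofile also accepts a file path.
    stacked_images.tofile(path)
    size_3_first = os.path.getsize(path)

    # Read the first 4 bytes back as an int32, as a reader expecting an
    # integer header field would.
    first_as_int32 = int(np.fromfile(path, dtype=np.int32, count=1)[0])

    # The other layout, (128, 128, 3), holds the same number of elements.
    np.full((128, 128, 3), 14.0, dtype=np.float32).tofile(path)
    size_3_last = os.path.getsize(path)

print(size_3_first, size_3_last)  # raw data only: 3 * 128 * 128 * 4 bytes each
print(first_as_int32)             # bit pattern of float32 14.0 read as an integer
```

If Pink expects a header at the start of the file (an assumption to check against the Pink documentation), a file written this way would not contain one, whichever shape is used.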

SpaceMeerkat, Sep 13 '22 08:09