cmapPy

GCTX parsing is not thread safe.

Open nborunov-integral opened this issue 3 years ago • 0 comments

Here is the code I'm using:

```python
import time
from threading import Thread

import cmapPy.pandasGEXpress.parse_gctx as parse_gctx

res = []
threads = []


def read(idx):
    # Read a single row (ridx=[idx]) from the GCTX file and time the call.
    print(f'Start reading {idx}')
    t = time.time()
    res.append(parse_gctx.parse('GSE92742_Broad_LINCS_Level5_COMPZ.MODZ_n473647x12328.gctx', ridx=[idx]))
    t = (time.time() - t)
    print(f'Done reading {idx} in {t} seconds')


# Threaded run: one thread per row index.
threads.append(Thread(target=read, args=(6000,)))
threads.append(Thread(target=read, args=(12000,)))
threads.append(Thread(target=read, args=(5000,)))
threads.append(Thread(target=read, args=(300,)))
threads.append(Thread(target=read, args=(40,)))
threads.append(Thread(target=read, args=(800,)))

all_t = time.time()

for t in threads:
    t.start()

for t in threads:
    t.join()

all_t = time.time() - all_t

print(f'The End in {all_t} seconds')

# Sequential run: six reads, one after another.
all_t = time.time()
res = []
for idx in [234, 4351, 6233, 9087, 987, 97]:
    read(idx)

all_t = time.time() - all_t

print(f'The End in {all_t} seconds')
```

And here is the output:

```
Start reading 12000
Start reading 6000
Start reading 5000
Start reading 800
Start reading 40
Start reading 300
Done reading 12000 in 337.7198541164398 seconds
Done reading 6000 in 337.7183690071106 seconds
Done reading 800 in 338.19431233406067 seconds
Done reading 300 in 338.36488699913025 seconds
Done reading 5000 in 339.04932618141174 seconds
Done reading 40 in 339.0456030368805 seconds
The End in 339.0754089355469 seconds
Start reading 234
Done reading 234 in 55.63448905944824 seconds
Start reading 4351
Done reading 4351 in 55.87116312980652 seconds
Start reading 6233
Done reading 6233 in 55.85987401008606 seconds
Start reading 9087
Done reading 9087 in 55.898045778274536 seconds
Start reading 987
Done reading 987 in 56.020151138305664 seconds
Start reading 97
Done reading 97 in 56.393441915512085 seconds
The End in 335.67835783958435 seconds
```
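The two runs can be compared directly from the numbers in the output. The snippet below is only arithmetic on those reported timings; the variable names are mine and nothing in it calls cmapPy.

```python
# Wall-clock totals copied from the output above, in seconds.
threaded_total = 339.0754089355469
sequential_total = 335.67835783958435

# Per-read times reported by the sequential run, in seconds.
sequential_reads = [
    55.63448905944824,
    55.87116312980652,
    55.85987401008606,
    55.898045778274536,
    56.020151138305664,
    56.393441915512085,
]

# Average cost of one read when nothing else is running.
mean_read = sum(sequential_reads) / len(sequential_reads)

# Speedup of the threaded run relative to the sequential run.
# A value near 1 means the threads gave no benefit;
# fully parallel reads would give a value near the number of threads.
speedup = sequential_total / threaded_total

print(f"mean sequential read: {mean_read:.2f} s")
print(f"threaded total:       {threaded_total:.2f} s")
print(f"speedup from threads: {speedup:.2f}x")
```

If the six reads had run fully in parallel, the threaded total would be close to the mean single-read time rather than close to the sequential total.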

As you can see, it takes about 55 seconds to read one record when I read the records sequentially. When I read them in parallel threads, the whole run takes about 339 seconds, the same as the sequential run (about 336 seconds), instead of the roughly 55 seconds I expected for all six threads together.
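One possible explanation, not confirmed here, is that the reads are serialized by a process-wide lock (the interpreter's GIL, or a lock inside a library the parser relies on). The sketch below does not use cmapPy at all: `fake_read` is a hypothetical stand-in that does its work in small chunks under a shared `threading.Lock`, just to show that such a workload takes about as long in six threads as it does sequentially.

```python
import threading
import time

# Assumption: a single shared lock stands in for whatever serializes the real reads.
shared_lock = threading.Lock()

CHUNKS = 5          # hypothetical number of work chunks per read
CHUNK_TIME = 0.01   # hypothetical duration of one chunk, in seconds


def fake_read():
    # Hypothetical stand-in for a read: every chunk of work holds the shared lock,
    # so only one thread makes progress at any moment.
    for _ in range(CHUNKS):
        with shared_lock:
            time.sleep(CHUNK_TIME)


def run_threaded(n):
    # Start n threads at once and measure the wall time until all have finished.
    threads = [threading.Thread(target=fake_read) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def run_sequential(n):
    # Run the same n reads one after another.
    start = time.perf_counter()
    for _ in range(n):
        fake_read()
    return time.perf_counter() - start


threaded = run_threaded(6)
sequential = run_sequential(6)
print(f"threaded:   {threaded:.3f} s")
print(f"sequential: {sequential:.3f} s")
```

Both timings come out near 6 × `CHUNKS` × `CHUNK_TIME`, which is the same pattern as the totals reported above. This only illustrates the symptom; it does not show where the serialization happens in the real parser.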

nborunov-integral • Jan 19 '21 21:01