xqtl-protocol
Memory optimization of sumstat standardization
This ticket is dedicated to problem 8 in #412 and records potential optimization options.
- Avoid passing unneeded data. At the moment, full rows of the query table are passed into the compare_snps function, although most of that information is not actually used there. So perhaps changing
def snps_match_dup(query, subject, keep_ambiguous=True):
    pm = compare_snps(query, subject)
    if not keep_ambiguous:
        pm = pm[~pm.ambiguous]
    new_subject = subject.loc[pm.sidx]
    # update beta and snp info
    new_query = pd.concat([new_subject.iloc[:, :5], query.loc[pm.qidx].iloc[:, 5:]], axis=1)
    new_query.loc[list(pm.flip), "STAT"] = -new_query.STAT[list(pm.flip)]
    return new_query, new_subject
into
def snps_match_dup(query, subject, keep_ambiguous=True):
    pm = compare_snps(query.iloc[:, 0:5], subject)
    if not keep_ambiguous:
        pm = pm[~pm.ambiguous]
    new_subject = subject.loc[pm.sidx]
    # update beta and snp info
    new_query = pd.concat([new_subject.iloc[:, :5], query.loc[pm.qidx].iloc[:, 5:]], axis=1)
    new_query.loc[list(pm.flip), "STAT"] = -new_query.STAT[list(pm.flip)]
    return new_query, new_subject
can save us some memory.
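To get a rough feel for how much smaller the sliced input is, here is a minimal sketch that compares the footprint of a full query table with that of its first five columns. The column names, the table size, and the `frame_bytes` helper are all made up for illustration; the real sumstat layout and what `compare_snps` does internally are not shown in this ticket, so the actual saving depends on how many copies of its input that function makes.

```python
import numpy as np
import pandas as pd


def frame_bytes(df):
    """Total bytes held by a DataFrame, including Python string objects."""
    return int(df.memory_usage(deep=True).sum())


n = 10_000
rng = np.random.default_rng(0)

# Hypothetical layout: five SNP-info columns first, then the statistics.
query = pd.DataFrame({
    "CHR": np.ones(n, dtype=np.int64),
    "POS": np.arange(n, dtype=np.int64),
    "A0": ["A"] * n,
    "A1": ["G"] * n,
    "SNP": [f"chr1:{i}_A_G" for i in range(n)],
    "STAT": rng.normal(size=n),
    "SE": rng.random(n),
    "P": rng.random(n),
    "N": np.full(n, 500, dtype=np.int64),
})

full_bytes = frame_bytes(query)
slim_bytes = frame_bytes(query.iloc[:, 0:5])
print(f"full rows: {full_bytes} bytes; first five columns: {slim_bytes} bytes")
```

The gap grows with the number of statistic columns, so the benefit should be larger for sumstats that carry many extra fields.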