fuzzyjoin
Not all matches returned using regex_left_join
I have two data frames. I need to merge them based on a partial string match.
Data frame A has a Gene.Name column containing EHBP1.
Data frame B has a Gene.Symbols column with the following values:
CLEC7A,EHBP1
CLEC7A,EHBP1
MBL2,CLEC7A,EHBP1
MBL2,CLEC7A,EHBP1,HTR2A
MBL2,CLEC7A,EHBP1,HTR2A
EHBP1,HTR2A
EHBP1,HTR2A
MBL2,CLEC7A,EHBP1,HTR2A
EHBP1
EHBP1
EHBP1
EHBP1
EHBP1
EHBP1
TBX15,MBL2,SNORD54,CLEC7A,RREB1,MRPL51,GGTLC2,MIR30A,SETMAR,GFOD1,STK33,KHDRBS2,EHBP1,RCL1,HTR2A
When I run the following command:
mydata <- regex_left_join(A, B, by = c(Gene.Name = "Gene.Symbols"))
Only some of the matches are returned. I get only these matches:
EHBP1
EHBP1
EHBP1
EHBP1
EHBP1
EHBP1
Why am I not getting these remaining matches?
MBL2,CLEC7A,EHBP1,HTR2A
CLEC7A,EHBP1
CLEC7A,EHBP1
MBL2,CLEC7A,EHBP1
MBL2,CLEC7A,EHBP1,HTR2A
EHBP1,HTR2A
EHBP1,HTR2A
MBL2,CLEC7A,EHBP1,HTR2A
TBX15,MBL2,SNORD54,CLEC7A,RREB1,MRPL51,GGTLC2,MIR30A,SETMAR,GFOD1,STK33,KHDRBS2,EHBP1,RCL1,HTR2A
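The regular expression has to be in the right-hand table. fuzzyjoin's regex joins treat the column from the second data frame as the pattern and the column from the first as the text to search. In your call each Gene.Symbols value is used as a pattern against "EHBP1", so only the rows that are exactly EHBP1 match. You might need: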
mydata <- regex_right_join(B, A, by = c(Gene.Symbols = "Gene.Name"))
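For anyone who wants to check the difference, here is a minimal sketch with a cut-down version of the data from the question. The data frames are rebuilt by hand from a few of the values listed above, and the object names `original` and `fixed` are only illustrative.

```r
library(fuzzyjoin)

# Cut-down copies of the two data frames described in the question
A <- data.frame(Gene.Name = "EHBP1", stringsAsFactors = FALSE)
B <- data.frame(
  Gene.Symbols = c("CLEC7A,EHBP1",
                   "MBL2,CLEC7A,EHBP1,HTR2A",
                   "EHBP1,HTR2A",
                   "EHBP1"),
  stringsAsFactors = FALSE
)

# Original call: Gene.Symbols (second table) is used as the pattern,
# so "CLEC7A,EHBP1" is searched for inside "EHBP1" and does not match
original <- regex_left_join(A, B, by = c(Gene.Name = "Gene.Symbols"))
nrow(original)  # 1

# Swapped call: Gene.Name (second table) is the pattern and is searched
# for inside each Gene.Symbols string
fixed <- regex_right_join(B, A, by = c(Gene.Symbols = "Gene.Name"))
nrow(fixed)     # 4
```

Note that the pattern is matched as a substring, so a Gene.Name of EHBP1 would presumably also match a symbol such as EHBP1L1 if one were present. Wrapping the pattern in word boundaries (`\\b`) before joining is one possible way around that.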