Frames
join 2 frames of same type
I have 2 frames that contain network packet fields (source IP, destination IP and so on): one frame captured at the source, one at the destination. In my Python software (which I am now rewriting in Haskell), I matched the 2 dataframes via a hash of each packet. In Haskell this gives:

    -- generate a column with a hash of the other columns
    addHash :: FrameFiltered Packet -> Frame (Record '[PacketHash])
    addHash aframe =
      fmap addHash' frame
      where
        frame = fmap toHashablePacket (ffFrame aframe)
        addHash' row = Col (hashWithSalt 0 row) :& RNil

    -- here aframe1 and aframe2 have the same type
    mergeTcpConnectionsFromKnownStreams aframe1 aframe2 =
      mergedFrame
      where
        mergedFrame = innerJoin @'[PacketHash] hframe1 hframe2
        -- prepend each frame's own hash column to its rows
        hframe1 = zipFrames (addHash aframe1) (ffFrame aframe1)
        hframe2 = zipFrames (addHash aframe2) (ffFrame aframe2)

It compiles and seems to run, but after the innerJoin the merged frame should contain several columns with the same name. Doesn't that break the API somewhat? For instance, how can I select the source IP of one side when the merged dataframe contains two sourceIP columns?
I finally managed to get the types right, and the join does indeed concatenate columns with the same names (the following compiles):

    mergeTcpConnectionsFromKnownStreams ::
      FrameFiltered Packet -> FrameFiltered Packet
      -> [Rec (Maybe :. ElField) ('[PacketHash] ++ ManColumnsTshark ++ ManColumnsTshark)]

I then serialized the result via writeCsv. Because of https://github.com/acowley/Frames/issues/155 the output is garbled, so I can't interpret it yet, but I do see columns with the same names.
I also noticed a bug on my side: I was joining on the PacketHash column, but all my hashes were equal to 0 (now fixed), and all packets got paired 1 to 1 with the same hash. The behavior is strange/wrong, so maybe we could add a function that performs some check, or specify the behavior when several rows are candidates for a merge?
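To make the suggested check concrete, here is a minimal sketch of what a pre-join sanity check could look like. It is not part of Frames: `duplicateKeys` and `checkUniqueKeys` are hypothetical names, and a plain list of keys stands in for the PacketHash column extracted from each frame. It uses only `base`.

```haskell
import Data.List (group, sort)

-- | Every key that occurs more than once, paired with its count.
-- A plain list stands in for a frame's key column (assumption).
duplicateKeys :: Ord k => [k] -> [(k, Int)]
duplicateKeys keys =
  [ (k, length ks) | ks@(k : _) <- group (sort keys), length ks > 1 ]

-- | Hypothetical guard: refuse to join when either side has a key
-- shared by several rows, since the pairing would be ambiguous.
checkUniqueKeys :: Ord k => [k] -> [k] -> Either String ()
checkUniqueKeys left right =
  case (duplicateKeys left, duplicateKeys right) of
    ([], []) -> Right ()
    (ls, rs) ->
      Left ( "ambiguous join keys: "
          ++ show (length ls) ++ " duplicated on the left, "
          ++ show (length rs) ++ " duplicated on the right" )

main :: IO ()
main = do
  -- the buggy case described above: every hash is 0
  print (duplicateKeys [0, 0, 0 :: Int])
  -- distinct hashes pass the check
  print (checkUniqueKeys [17, 42 :: Int] [42, 17])
```

Running such a check on the key columns before the join would have flagged the all-zero hashes immediately instead of producing a plausible-looking merged frame.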
This seems related: https://github.com/acowley/Frames/issues/170#issuecomment-1523492615. I just rename columns and then do the inner join.
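The rename-then-join idea can be illustrated without Frames at all. The sketch below is an assumption-laden model, not the Frames API: rows are association lists of (column name, value), and `prefixColumns` and `innerJoinOn` are hypothetical helpers written only for this example. Prefixing every non-key column on each side before joining leaves no duplicate names in the merged row.

```haskell
import Data.Maybe (maybeToList)

-- | (column name, value) pairs; a stand-in for a Frames record.
type Row = [(String, String)]

-- | Prefix every column name except the join key.
prefixColumns :: String -> String -> Row -> Row
prefixColumns key prefix row =
  [ (if name == key then name else prefix ++ name, value)
  | (name, value) <- row ]

-- | Naive inner join on one key column (quadratic, illustration only).
-- The key column is kept once, taken from the left row.
innerJoinOn :: String -> [Row] -> [Row] -> [Row]
innerJoinOn key lefts rights =
  [ l ++ filter ((/= key) . fst) r
  | l <- lefts
  , lk <- maybeToList (lookup key l)
  , r <- rights
  , lookup key r == Just lk
  ]

main :: IO ()
main = do
  let sent     = [[("hash", "a1"), ("sourceIP", "10.0.0.1")]]
      received = [[("hash", "a1"), ("sourceIP", "10.0.0.1")]]
      merged   = innerJoinOn "hash"
                   (map (prefixColumns "hash" "sender_") sent)
                   (map (prefixColumns "hash" "receiver_") received)
  -- each side's source IP is now addressable by its own name
  print (map (lookup "sender_sourceIP") merged)
  print (map (lookup "receiver_sourceIP") merged)
```

In Frames itself the renaming would happen at the type level on the column declarations; the point here is only that distinct names on each side make selection after the join unambiguous.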