Frames

Join 2 frames of the same type

Open · teto opened this issue 3 years ago · 2 comments

I have 2 frames that contain network packet fields (source IP, destination IP and so on): one frame captured at the source, one at the destination. In my Python software (which I'm now rewriting in Haskell), I matched the 2 dataframes via a hash of the packet. In Haskell this gives:

```haskell
-- generate a column with a hash of other columns
addHash :: FrameFiltered Packet -> Frame (Record '[PacketHash])
addHash aframe = fmap addHash' frame
  where
    frame = fmap toHashablePacket (ffFrame aframe)
    -- one-column record holding the packet's hash
    addHash' row = Col (hashWithSalt 0 row) :& RNil

-- here frame1 and frame2 have the same type (FrameFiltered Packet)
mergeTcpConnectionsFromKnownStreams frame1 frame2 =
  mergedFrame
  where
    mergedFrame = innerJoin @'[PacketHash] hframe1 hframe2
    -- prepend each frame's own hash column to that frame's rows
    hframe1 = zipFrames (addHash frame1) (ffFrame frame1)
    hframe2 = zipFrames (addHash frame2) (ffFrame frame2)
```

It compiles and seems to run, but after the innerJoin the merged frame should contain several columns with the same name. Doesn't that break the API somewhat? For instance, how can I select one of the 2 source-IP columns present in the merged dataframe?

teto · Mar 21 '21 22:03

I finally managed to write down the types, and indeed the join concatenates columns with the same names (the following compiles):

```haskell
mergeTcpConnectionsFromKnownStreams ::
  FrameFiltered Packet -> FrameFiltered Packet
  -> [ Rec (Maybe :. ElField) ('[PacketHash] ++ ManColumnsTshark ++ ManColumnsTshark) ]
```

I then serialized the result with writeCsv. Because of https://github.com/acowley/Frames/issues/155 the output is garbled, so I can't interpret it yet, but I do see columns with the same names.

I also noticed a bug on my side: I was joining on the PacketHash column, but all my hashes were equal to 0 (now fixed). Yet all packets got paired 1 to 1 despite sharing the same hash. This behavior seems strange or wrong, so maybe we could add a function that performs some check, or specify the behavior when several rows are candidates for a merge?
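For reference, in the usual relational definition an inner join pairs every left row with every right row that shares its key, so with all hashes equal to 0, frames of n and m packets would produce n * m rows rather than a 1-to-1 pairing. The sketch below models this on plain Haskell lists with `Data.Map` (from `containers`) rather than Frames, and adds the kind of check suggested above. `duplicateKeys`, `innerJoinOn` and `innerJoinOneToOne` are hypothetical helpers, not part of the Frames API, and the sketch does not claim to describe what Frames' own `innerJoin` does with duplicate keys.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical helpers on plain lists; NOT part of the Frames API.

-- | Keys that occur more than once, paired with how often they occur.
duplicateKeys :: Ord k => (a -> k) -> [a] -> [(k, Int)]
duplicateKeys key rows = [ (k, n) | (k, n) <- Map.toList counts, n > 1 ]
  where
    counts = Map.fromListWith (+) [ (key r, 1 :: Int) | r <- rows ]

-- | Relational inner join: each left row is paired with every right row
-- whose key is equal, so a key seen n times on the left and m times on
-- the right produces n * m output rows.
innerJoinOn :: Ord k => (a -> k) -> (b -> k) -> [a] -> [b] -> [(a, b)]
innerJoinOn keyA keyB xs ys =
  [ (x, y) | x <- xs, y <- Map.findWithDefault [] (keyA x) index ]
  where
    -- right-hand rows grouped by key, in their original order
    index = Map.fromListWith (flip (++)) [ (keyB y, [y]) | y <- ys ]

-- | The same join, but refusing to run when either side has duplicate keys.
innerJoinOneToOne
  :: (Ord k, Show k)
  => (a -> k) -> (b -> k) -> [a] -> [b] -> Either String [(a, b)]
innerJoinOneToOne keyA keyB xs ys
  | not (null dupA) = Left ("duplicate keys on the left: " ++ show dupA)
  | not (null dupB) = Left ("duplicate keys on the right: " ++ show dupB)
  | otherwise       = Right (innerJoinOn keyA keyB xs ys)
  where
    dupA = duplicateKeys keyA xs
    dupB = duplicateKeys keyB ys

-- Toy rows (hash, label) where every hash is 0, as in the bug above.
srcRows, dstRows :: [(Int, String)]
srcRows = [(0, "src-a"), (0, "src-b"), (0, "src-c")]
dstRows = [(0, "dst-a"), (0, "dst-b"), (0, "dst-c")]

main :: IO ()
main = do
  -- 3 rows x 3 rows sharing one key: 9 pairs, not 3
  print (length (innerJoinOn fst fst srcRows dstRows))
  -- the checked variant reports the problem instead of joining
  putStrLn (either id (show . length) (innerJoinOneToOne fst fst srcRows dstRows))
```

Running `main` prints `9` followed by `duplicate keys on the left: [(0,3)]`.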

teto · Apr 04 '21 00:04

This seems related: https://github.com/acowley/Frames/issues/170#issuecomment-1523492615. I just rename columns and then do the inner join.
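A minimal sketch of the rename-then-join idea, assuming a row can be modelled as a list of `(column name, value)` pairs instead of a Frames record: every non-key column is prefixed per side before the join, so the merged row has one `src_srcIp` and one `dst_srcIp` instead of two `srcIp` columns. `renameCols`, `innerJoinOnCol`, the column names and the toy IP values are all hypothetical and not part of Frames; in Frames the column names live in the record's type, so the renaming would happen at the type level rather than on strings.

```haskell
import qualified Data.Map.Strict as Map

-- A row modelled as (column name, value) pairs: a stand-in for a Frames
-- record, NOT the Frames API. All helper names here are hypothetical.
type Row = [(String, String)]

-- | Prefix every column except the join key, so the two sides can no
-- longer share column names after the join.
renameCols :: String -> String -> Row -> Row
renameCols key prefix = map rename
  where
    rename (name, v)
      | name == key = (name, v)
      | otherwise   = (prefix ++ name, v)

-- | Inner join on the named key column; the key column is kept only once.
innerJoinOnCol :: String -> [Row] -> [Row] -> [Row]
innerJoinOnCol key lefts rights =
  [ l ++ filter ((/= key) . fst) r
  | l <- lefts
  , Just k <- [lookup key l]
  , r <- Map.findWithDefault [] k index
  ]
  where
    -- right-hand rows grouped by key value, in their original order
    index = Map.fromListWith (flip (++))
      [ (k, [r]) | r <- rights, Just k <- [lookup key r] ]

-- Toy frames captured at each end (values are made up for illustration).
srcFrame, dstFrame :: [Row]
srcFrame = [[("hash", "h1"), ("srcIp", "10.0.0.1"), ("dstIp", "10.0.0.2")]]
dstFrame = [[("hash", "h1"), ("srcIp", "192.168.1.5"), ("dstIp", "10.0.0.2")]]

merged :: [Row]
merged =
  innerJoinOnCol "hash"
    (map (renameCols "hash" "src_") srcFrame)
    (map (renameCols "hash" "dst_") dstFrame)

main :: IO ()
main =
  -- each side's source IP is now reachable under its own name
  mapM_ (\row -> print (lookup "src_srcIp" row, lookup "dst_srcIp" row)) merged
```

Running `main` prints `(Just "10.0.0.1",Just "192.168.1.5")`.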

idontgetoutmuch · May 01 '23 06:05