x-fmt/80 (Mac Pict) and fmt/1427 (Mac Draw 2) signatures overlap

Open Dclipsham opened this issue 3 years ago • 5 comments

Signatures for these two formats currently overlap.

x-fmt/80: 44525747(4D44|4432){516}1101

fmt/1427: 44525747(0000|4432)

So an x-fmt/80 file with 'D2' (0x4432) following the 'DRWG' header will necessarily also match fmt/1427. This is evident in the file 'DOODLE.PCT' found in the EDRM 1.0 dataset (https://edrm.net/resources/data-sets/#1598455996696-88a3bd82-aedf), which currently gets a dual identification outcome based on signature.
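To illustrate the overlap, here is a minimal Python sketch. The two regular expressions are hand-translated from the byte sequences quoted above and are assumed to be anchored at the beginning of the file; the `identify` helper and the sample bytes are hypothetical and synthetic, not taken from DOODLE.PCT.

```python
import re

# Hand-translated from the PRONOM byte sequences quoted in this issue.
# 44525747 = "DRWG", 4D44 = "MD", 4432 = "D2"; {516} is a 516-byte wildcard.
X_FMT_80 = re.compile(rb"DRWG(?:MD|D2).{516}\x11\x01", re.DOTALL)
FMT_1427 = re.compile(rb"DRWG(?:\x00\x00|D2)", re.DOTALL)


def identify(data: bytes) -> list:
    """Return every PUID whose (assumed) BOF signature matches the data."""
    hits = []
    if X_FMT_80.match(data):  # re.match anchors at offset 0
        hits.append("x-fmt/80")
    if FMT_1427.match(data):
        hits.append("fmt/1427")
    return hits


# Synthetic stand-in for a PICT file that has 'D2' after the 'DRWG' header.
sample = b"DRWGD2" + bytes(516) + b"\x11\x01"
print(identify(sample))  # ['x-fmt/80', 'fmt/1427']
```

Any file that takes the 'D2' branch of x-fmt/80 satisfies fmt/1427 as well, because the fmt/1427 sequence ends right after those two bytes.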

Not yet sure of the best resolution, so just raising an issue for now...

CC @thorsted

Dclipsham avatar Jul 22 '22 13:07 Dclipsham

Hmm, I went back and looked at my original submission, and there might have been a segment missed for fmt/1427.

<InternalSignature ID="3" Specificity="Specific">
  <ByteSequence Reference="BOFoffset">
    <SubSequence MinFragLength="0" Position="1" SubSeqMaxOffset="0" SubSeqMinOffset="0">
      <Sequence>44525747</Sequence>
      <DefaultShift>5</DefaultShift>
      <Shift Byte="44">4</Shift>
      <Shift Byte="52">3</Shift>
      <Shift Byte="57">2</Shift>
      <Shift Byte="47">1</Shift>
      <RightFragment MaxOffset="0" MinOffset="0" Position="1">0000</RightFragment>
      <RightFragment MaxOffset="0" MinOffset="0" Position="1">4432</RightFragment>
      <RightFragment MaxOffset="0" MinOffset="0" Position="2">0000</RightFragment>
    </SubSequence>
  </ByteSequence>
</InternalSignature>

The current signature doesn't have that second-position fragment. I believe I added it to protect against similarities with other versions, but this was one of my earlier signatures, so any advice would be appreciated. I remember @jayGattusoNLNZ also used the additional zeros on the signature he developed around the same time. Might warrant a second look.
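As a rough check of what the extra fragment would change, the sketch below compares the current fmt/1427 sequence with the one in the XML above, read as 44525747(0000|4432)0000. The regex translations and both sample headers are assumptions made for illustration; the thread does not say which bytes DOODLE.PCT actually has at offsets 0x06-0x07.

```python
import re

# Assumed regex equivalents of the two fmt/1427 variants discussed here.
CURRENT_1427 = re.compile(rb"DRWG(?:\x00\x00|D2)", re.DOTALL)
PROPOSED_1427 = re.compile(rb"DRWG(?:\x00\x00|D2)\x00\x00", re.DOTALL)

# Synthetic headers, not taken from real files.
zeros_after = b"DRWGD2\x00\x00\x01\x02"   # 'D2' followed by 0000
nonzero_after = b"DRWGD2\x12\x34"          # 'D2' followed by other bytes

for name, data in [("zeros_after", zeros_after), ("nonzero_after", nonzero_after)]:
    current = bool(CURRENT_1427.match(data))
    proposed = bool(PROPOSED_1427.match(data))
    print(name, "current:", current, "proposed:", proposed)
# zeros_after current: True proposed: True
# nonzero_after current: True proposed: False
```

So the second-position fragment would only remove the dual identification for x-fmt/80 files whose bytes after 'D2' are not 0000; files with zeros there would still match both.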

thorsted avatar Jul 22 '22 15:07 thorsted

@Dclipsham @thorsted should this be picked up by the skeleton suite? Do you have any insight into why it isn't?

ross-spencer avatar Jul 24 '22 12:07 ross-spencer

From a quick glance at the v97 suite (not the most recent, but readily available), the whole fmt/1427 file is '44 52 57 47 00 00', and the x-fmt/80 file begins '44 52 57 47 4D 44', so neither includes the bytes 0x44 0x32 at offsets 0x04-0x05.
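A small sketch of why those skeleton files would not surface the clash, assuming the same hand-translated regexes as the signatures quoted in this issue. Only the first six bytes of the x-fmt/80 skeleton are known from this thread, which is enough here because the fmt/1427 sequence is six bytes long; the `cross_matches` helper is hypothetical.

```python
import re

# Assumed BOF-anchored translations of the two signatures.
X_FMT_80 = re.compile(rb"DRWG(?:MD|D2).{516}\x11\x01", re.DOTALL)
FMT_1427 = re.compile(rb"DRWG(?:\x00\x00|D2)", re.DOTALL)

# Skeleton contents as described above (v97 suite).
skeleton_1427 = bytes.fromhex("445257470000")        # whole file
skeleton_x80_prefix = bytes.fromhex("445257474D44")  # first six bytes only


def cross_matches() -> dict:
    """Check each skeleton against the *other* format's signature."""
    return {
        "fmt/1427 skeleton vs x-fmt/80": bool(X_FMT_80.match(skeleton_1427)),
        "x-fmt/80 skeleton vs fmt/1427": bool(FMT_1427.match(skeleton_x80_prefix)),
    }


print(cross_matches())
# {'fmt/1427 skeleton vs x-fmt/80': False, 'x-fmt/80 skeleton vs fmt/1427': False}
```

Each skeleton exercises only the first alternative of its own signature, so the shared '4432' branch is never tested.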

Dclipsham avatar Jul 24 '22 12:07 Dclipsham

@Dclipsham @thorsted It's not an ideal solution, but as a stopgap in time for the next release, before a closer look can happen, should I prioritise x-fmt/80 over fmt/1427 for v.108? Unless this issue has already been picked up.

tnafrancesca avatar Aug 26 '22 13:08 tnafrancesca

I'm not hugely au fait with the formats yet so can't make that call right now I'm afraid.

Dclipsham avatar Aug 26 '22 13:08 Dclipsham