Update sectxt to 0.9.0

Open bwbroersma opened this issue 2 years ago • 6 comments

DigitalTrustCenter/sectxt released 0.9.0, which has quite a few parser improvements, especially on PGP.

The only one I'm not sure about is the stripping of the BOM (https://github.com/DigitalTrustCenter/sectxt/issues/57#issuecomment-1663592300). This is how I read RFC 9116, section "File Format Description and ABNF Grammar":

The file format of the "security.txt" file MUST be plain text (MIME type "text/plain") as defined in Section 4.1.3 of [RFC2046] and MUST be encoded using UTF-8 [RFC3629] in Net-Unicode form [RFC5198].

RFC 5198 states:

  1. Net-Unicode Definition

  The Network Unicode format (Net-Unicode) is defined as follows. Parts of this definition are deliberately informal, providing guidance for specific profiles or rules in the protocols that reference this one rather than firm rules that apply globally.

  …

  5. As suggested in Section 6 of RFC 3629, the Byte Order Mark ("BOM") signature MUST NOT appear at the beginning of these text strings.

Especially in combination with signing, maybe a :warning: warning or :information_source: notice should be shown. Although the BOM sits outside the PGP block, the file command on Linux no longer recognizes a file with a BOM as a PGP signed file.
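To make the BOM point concrete, here is a minimal Python sketch of what such a check could look like. It is not how sectxt implements it: has_utf8_bom and strip_utf8_bom are hypothetical helper names, and the sample content is made up. It only shows that a leading UTF-8 BOM (bytes EF BB BF) means the file no longer starts with the PGP armor header.

```python
import codecs

PGP_HEADER = b"-----BEGIN PGP SIGNED MESSAGE-----"


def has_utf8_bom(raw: bytes) -> bool:
    """Return True if the raw bytes start with the UTF-8 BOM (EF BB BF)."""
    return raw.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(raw: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    return raw[len(codecs.BOM_UTF8):] if has_utf8_bom(raw) else raw


# Made-up sample: a signed security.txt as it would look on the wire.
signed = PGP_HEADER + b"\nHash: SHA512\n\nContact: mailto:security@example.com\n"
with_bom = codecs.BOM_UTF8 + signed

# With the BOM in front, the file no longer begins with the PGP header,
# so a check could flag it (warning or notice) instead of silently stripping.
starts_with_header = with_bom.startswith(PGP_HEADER)
```

Checking the raw bytes before decoding matters here: decoding with utf-8-sig would silently drop the BOM, and the information needed for a warning would be lost.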

bwbroersma avatar Aug 16 '23 20:08 bwbroersma

I'll find out which new content labels we need.

mxsasha avatar Aug 22 '23 09:08 mxsasha

https://github.com/DigitalTrustCenter/sectxt/issues/65 is a blocker for this

mxsasha avatar Feb 15 '24 10:02 mxsasha

Content still needs to be checked: all labels in https://github.com/DigitalTrustCenter/sectxt/ readme need to be in our content too.

mxsasha avatar Apr 05 '24 09:04 mxsasha

Crappy one-liner check (formatted on 3 lines for readability :sweat_smile:):

$ diff \
   <(grep -oP '"\K[a-z0-9]+_[a-z0-9_]+(?=")' sectxt/sectxt/__init__.py | sort -u) \
   <(ls internet.nl_content/detail/tech/data/http-securitytxt/ | sed 's/_..\.md$//g' | sort -u)
1d0
< bom_in_file
5,6c4
< field_name
< invalid_cert
---
> expired
12c10
< invalid_uri_scheme
---
> location
26c24,25
< no_security_txt
---
> no_security_txt_404
> no_security_txt_other
31d29
< pgp_envelope
33a32,33
> requested-from
> retrieved-from
35a36
> utf8
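For anyone who prefers not to parse the one-liner, the same comparison can be sketched in Python. This is a rough equivalent under assumptions, not a tested replacement: the function names are hypothetical and the sample input is made up (example_label does not exist in either repository). It reuses the two patterns from the shell command above.

```python
import re

# Same patterns as the shell one-liner: quoted snake_case names in the
# Python source, and a two-letter language suffix on the content files.
LABEL_RE = re.compile(r'"([a-z0-9]+_[a-z0-9_]+)(?=")')
LANG_SUFFIX_RE = re.compile(r"_..\.md$")


def labels_in_source(source: str) -> set[str]:
    """Collect quoted names containing an underscore from source text."""
    return set(LABEL_RE.findall(source))


def labels_in_content(filenames) -> set[str]:
    """Strip the language suffix (e.g. _en.md) from content file names."""
    return {LANG_SUFFIX_RE.sub("", name) for name in filenames}


def compare(source: str, filenames):
    """Return (only in source, only in content), both sorted."""
    upstream = labels_in_source(source)
    content = labels_in_content(filenames)
    return sorted(upstream - content), sorted(content - upstream)


# Made-up sample input, for illustration only.
SAMPLE_SOURCE = '''
errors.append({"code": "bom_in_file"})
errors.append({"code": "no_security_txt"})
fields = ["field_name", "example_label"]
'''
SAMPLE_FILES = [
    "no_security_txt_404_en.md", "no_security_txt_404_nl.md",
    "example_label_en.md", "example_label_nl.md",
    "utf8_en.md", "utf8_nl.md",
]

only_upstream, only_content = compare(SAMPLE_SOURCE, SAMPLE_FILES)
```

Note that the pattern requires an underscore, so names without one (such as utf8) can only ever show up on the content side of the comparison, and any other quoted snake_case string in the source (such as field_name) shows up on the upstream side even if it is not a label.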

At the very least, these are currently missing:

  • bom_in_file
  • invalid_cert
  • invalid_uri_scheme
  • pgp_envelope

On manual inspection of sectxt, however, I see that invalid_uri_scheme and bom_in_file are in the SecurityTXT class, not in the Parser class that internet.nl uses. I don't see why bom_in_file is not checked in the Parser class. Created an issue upstream:

  • https://github.com/DigitalTrustCenter/sectxt/issues/69

bwbroersma avatar Apr 05 '24 09:04 bwbroersma

Upstream solved it in the 0.9.3 release.

bwbroersma avatar Apr 09 '24 14:04 bwbroersma