bigbang
collect_mail.py collects empty .mail files from IETF archives
Something is quite wrong with the IETF data collection process.
$ python bin/collect_mail.py -u https://www.ietf.org/mail-archive/text/dns-security/
['2008-05.mail',
'2008-06.mail',
'2008-07.mail',
'2008-08.mail',
'2008-09.mail',
'2008-10.mail',
'2008-11.mail',
'2008-12.mail']
So far so good, but opening one of the downloaded files shows that it is empty:
archives/dns-security/2008-09.mail (END)
So no data is getting collected.
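To check how widespread this is, a quick script can list every zero-byte file the collector left behind. This is a minimal sketch, not part of bigbang. It assumes the archive directory layout shown above (archives/<list-name>/), and the helper name empty_archive_files is hypothetical.

```python
from pathlib import Path


def empty_archive_files(archive_dir):
    """Return the sorted names of zero-byte files in archive_dir.

    Hypothetical helper: assumes collect_mail.py writes one file per
    month into archives/<list-name>/, as in the output above.
    """
    return sorted(
        p.name
        for p in Path(archive_dir).iterdir()
        # Only regular files whose size on disk is exactly 0 bytes.
        if p.is_file() and p.stat().st_size == 0
    )
```

For example, empty_archive_files("archives/dns-security") should return every .mail file name if none of the downloads contained data.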
The mail collection script is downloading all the .mail files from this page:
https://www.ietf.org/mail-archive/text/dns-security/
But these .mail files are empty; the data is actually in the .txt files, for example:
https://www.ietf.org/mail-archive/text/dns-security/dns-security.200003.txt
This is likely related to the fact that dns-security is a deprecated working group.
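One possible fix is to have the collector also pick up the .txt links from the archive index page. The sketch below is an illustration only, not the current collect_mail.py logic. It uses the standard library HTML parser, and it assumes that the index page links to the .txt archives with ordinary <a href> tags. The names _LinkCollector and txt_archive_urls are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def txt_archive_urls(index_html, base_url):
    """Return sorted absolute URLs of the .txt archive files linked from index_html.

    Hypothetical helper: relative links are resolved against base_url,
    and duplicates are removed.
    """
    parser = _LinkCollector()
    parser.feed(index_html)
    return sorted({urljoin(base_url, h) for h in parser.hrefs if h.endswith(".txt")})
```

In practice, index_html would come from something like urllib.request.urlopen("https://www.ietf.org/mail-archive/text/dns-security/").read().decode(). A collector could then fall back to these .txt files whenever the corresponding .mail files turn out to be empty.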