Import a forum thread into Cassandre

Open benel opened this issue 14 years ago • 0 comments

  • The original page URL could be saved into Cassandre as an alternative resource.
  • Neither phpBB nor Doct***o produces well-formed XHTML, so we cannot use XPath or XSLT. Instead, there could be a set of regular expressions for each kind of forum.
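
To illustrate the point about well-formedness, here is a minimal Python sketch. The markup fragment is hypothetical (it is not taken from either forum), and `parse_strict` and `author_by_regex` are illustrative names. It only shows that a strict XML parser rejects markup with an unclosed tag, while a regular expression can still pull out a value.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment: <br> is never closed, so this is not well-formed XML.
FRAGMENT = '<td><b class="s2">alice</b><br>Hello</td>'


def parse_strict(markup):
    """Return the parsed root element, or None if the markup is not well-formed."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None


def author_by_regex(markup):
    """Extract the author with a regular expression that ignores tree structure."""
    match = re.search(r'<b class="s2">(.*?)</b>', markup)
    return match.group(1) if match else None


strict_result = parse_strict(FRAGMENT)    # the strict parser gives up
regex_result = author_by_regex(FRAGMENT)  # the regex still finds the author
```

Without a parsed tree there is nothing for XPath or XSLT to work on, which is why a per-forum set of patterns looks like the pragmatic route.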

Matthieu implemented forum parsers that might be reused:

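    // Marks the start of each post
    $p_post_seperator = '/class="messagetable"/u',
    // Thread title ("Sujet" = subject)
    $p_thread_title = '/Sujet : <h3>(.*)<\/h3>/u',
    // Post date (dd-mm-yyyy) and time (hh:mm:ss)
    $p_date = '/le (\d\d-\d\d-\d\d\d\d).*(\d\d:\d\d:\d\d)/u',
    // Author name, post id and message body
    $p_author = '/<b class="s2">(.*?)<\/b>/u',
    $p_id = '/<a name="t(\d+)"><\/a>/u',
    $p_msg = '/<div id="para\d+">(.*)<\/div><\/td><\/tr>/u',
    // End of the thread on the current page
    $p_end_thread = '/<script language="javascript" type="text\/javascript">var listenumreponse/u',
    // Link to the next page ("Page Suivante" = next page)
    $p_nextpg = '/<div class="pagepresuiv"><a href="(.*)" class="cHeader" accesskey="x">Page Suivante<\/a>/u'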
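
As a rough sketch of how these patterns could drive an import, the Python below ports six of them to the `re` module (the PCRE delimiters and the `/u` flag are dropped, since Python 3 string patterns are Unicode by default). This is not Matthieu's parser: `parse_thread`, `first_group` and the `SAMPLE` page are assumptions made up for illustration, with the sample markup constructed only so that the patterns match. Pagination (`$p_nextpg`) and the end-of-thread marker are left out.

```python
import re

# Python ports of the patterns above (delimiters and the /u flag dropped).
POST_SEPARATOR = re.compile(r'class="messagetable"')
THREAD_TITLE = re.compile(r'Sujet : <h3>(.*)</h3>')
DATE = re.compile(r'le (\d\d-\d\d-\d\d\d\d).*(\d\d:\d\d:\d\d)')
AUTHOR = re.compile(r'<b class="s2">(.*?)</b>')
POST_ID = re.compile(r'<a name="t(\d+)"></a>')
MESSAGE = re.compile(r'<div id="para\d+">(.*)</div></td></tr>')


def first_group(pattern, text):
    """Return the first capture group of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


def parse_thread(html):
    """Split one page into posts and extract the fields of each post."""
    posts = []
    # Everything before the first separator is page header, not a post.
    for chunk in POST_SEPARATOR.split(html)[1:]:
        date = DATE.search(chunk)
        posts.append({
            "id": first_group(POST_ID, chunk),
            "author": first_group(AUTHOR, chunk),
            "date": date.group(1) if date else None,
            "time": date.group(2) if date else None,
            "message": first_group(MESSAGE, chunk),
        })
    return {"title": first_group(THREAD_TITLE, html), "posts": posts}


# Hypothetical page, written only to exercise the patterns.
SAMPLE = (
    'Sujet : <h3>Mal de dos</h3>\n'
    '<table class="messagetable"><tr><td><a name="t101"></a>'
    '<b class="s2">alice</b> le 16-02-2011 à 18:02:00</td></tr>\n'
    '<tr><td><div id="para101">Bonjour</div></td></tr></table>\n'
    '<table class="messagetable"><tr><td><a name="t102"></a>'
    '<b class="s2">bob</b> le 17-02-2011 à 09:15:30</td></tr>\n'
    '<tr><td><div id="para102">Merci</div></td></tr></table>\n'
)

thread = parse_thread(SAMPLE)
```

Note that `.` does not match a newline by default, so the greedy `(.*)` groups stay within a single line; a message body spanning several lines would need a different pattern or the `re.DOTALL` flag.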

benel, Feb 16 '11 18:02