Feature Request: Not printing recursively for stringValue
I would like a method similar to stringValue that doesn't recursively print all the text under the elements matched by an XPath query. Below are the full code, the input HTML, the output Ono produces, and the output I'd like to have.
My XPath query: //div[@class='thread']
Ono code:
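// file: NSData containing the input HTML
NSString *xPath = @"//div[@class='thread']";
NSError *error = nil;
ONOXMLDocument *document = [ONOXMLDocument HTMLDocumentWithData:file error:&error];
[document enumerateElementsWithXPath:xPath usingBlock:^(ONOXMLElement *element, NSUInteger idx, BOOL *stop) {
    // stringValue includes the text of every descendant, not just the div's own text
    NSLog(@"%@", [element stringValue]);
}];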
Which prints:
FirstName LastName, SecondNameFirst SecondNameLast
FirstName LastName
Wednesday, December 24, 2014 at 6:57pm UTC+01
This is a dummy text
SecondNameFirst SecondNameLast
Wednesday, December 24, 2014 at 6:56pm UTC+01
And a 2nd one just to show off
Another, User
Another
Monday, April 27, 2015 at 10:54pm UTC+02
Text: 2.1
User
Thursday, February 26, 2015 at 5:41pm UTC+01
Text: 2.2
Another
Thursday, February 26, 2015 at 4:25pm UTC+01
Text: 2.3
I would prefer output similar to hpple's, which is:
FirstName LastName, SecondNameFirst SecondNameLast
Another, User
hpple code:
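TFHpple *tutorialsParser = [TFHpple hppleWithHTMLData:file];
NSArray *tutorialsNodes = [tutorialsParser searchWithXPathQuery:xPath];
for (TFHppleElement *element in tutorialsNodes) {
    // firstChild is the div's own leading text node, so nested elements are skipped
    NSString *content = [[element firstChild] content];
    NSLog(@"%@", [content stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]);
}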
And I don't want to use hpple since it is too slow.
Here is my input HTML file:
<!DOCTYPE html>
<html>
<head><title/></head>
<body>
<div class="thread">FirstName LastName, SecondNameFirst SecondNameLast
<div class="message">
<div class="message_header">
<span class="user">FirstName LastName</span>
<span class="meta">Wednesday, December 24, 2014 at 6:57pm UTC+01 </span>
</div>
</div>
<p>This is a dummy text</p>
<div class="message">
<div class="message_header">
<span class="user">SecondNameFirst SecondNameLast</span>
<span class="meta">Wednesday, December 24, 2014 at 6:56pm UTC+01</span>
</div>
</div>
<p>And a 2nd one just to show off</p>
</div>
<div class="thread">Another, User
<div class="message">
<div class="message_header">
<span class="user">Another</span>
<span class="meta">Monday, April 27, 2015 at 10:54pm UTC+02</span>
</div>
</div>
<p>Text: 2.1</p>
<div class="message">
<div class="message_header">
<span class="user">User</span>
<span class="meta">Thursday, February 26, 2015 at 5:41pm UTC+01</span>
</div>
</div>
<p>Text: 2.2</p>
<div class="message">
<div class="message_header">
<span class="user">Another</span>
<span class="meta">Thursday, February 26, 2015 at 4:25pm UTC+01</span>
</div>
</div>
<p>Text: 2.3</p>
</div>
</body>
</html>
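For illustration, here is a minimal Swift sketch of the difference between the two outputs. The Node type and both functions are hypothetical and are not Ono's or hpple's API: recursiveText concatenates all descendant text (like the stringValue output above), while directText keeps only the text nodes that are direct children of the element, which is what hpple's firstChild happens to return for this HTML. In plain XPath, the same idea is expressed with a text() step, e.g. //div[@class='thread']/text(); whether Ono's enumeration returns such text nodes is not confirmed here.

```swift
import Foundation

// Hypothetical DOM model for illustration only; this is not Ono's or hpple's API.
enum Node {
    case text(String)
    case element(tag: String, children: [Node])
}

// Concatenates the text of a node and all of its descendants,
// similar to the recursive stringValue output shown in the issue.
func recursiveText(_ node: Node) -> String {
    switch node {
    case .text(let s):
        return s
    case .element(_, let children):
        return children.map(recursiveText).joined()
    }
}

// Concatenates only the text nodes that are direct children of the node,
// skipping nested elements entirely, then trims surrounding whitespace.
func directText(_ node: Node) -> String {
    switch node {
    case .text(let s):
        return s.trimmingCharacters(in: .whitespacesAndNewlines)
    case .element(_, let children):
        var own = ""
        for case .text(let s) in children {
            own += s
        }
        return own.trimmingCharacters(in: .whitespacesAndNewlines)
    }
}

// Hand-built approximation of the first thread <div> from the sample HTML
// (whitespace-only text nodes between tags are reduced to "\n").
let thread = Node.element(tag: "div", children: [
    .text("FirstName LastName, SecondNameFirst SecondNameLast\n"),
    .element(tag: "div", children: [
        .element(tag: "span", children: [.text("FirstName LastName")]),
    ]),
    .text("\n"),
    .element(tag: "p", children: [.text("This is a dummy text")]),
    .text("\n"),
])

print(recursiveText(thread))  // includes the nested user name and paragraph text
print(directText(thread))     // FirstName LastName, SecondNameFirst SecondNameLast
```

Applied to each matched thread div, directText yields exactly the two lines requested. Because the whitespace-only text nodes between tags are also direct children, the result is trimmed after joining rather than taking only the first text node.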
Sorry, I don't speak Objective-C, but you could use something like this in Swift:
import Foundation

extension String {
    // Removes leading and trailing whitespace and newlines
    func trim() -> String {
        return self.trimmingCharacters(in: .whitespacesAndNewlines)
    }
    // Collapses every run of whitespace (including newlines) into a single space
    func clean() -> String {
        return self.replacingOccurrences(of: "\\s+", with: " ", options: .regularExpression)
    }
}
Then use it in your code like this:
// remove leading and trailing whitespace
let trimmedValue = (element.childrenWithTag("td")[3] as! ONOXMLElement).stringValue().trim()
// collapse runs of whitespace into single spaces
let cleanedValue = (element.childrenWithTag("td")[3] as! ONOXMLElement).stringValue().clean()
// or chain them together
let extraCleanValue = (element.childrenWithTag("td")[3] as! ONOXMLElement).stringValue().clean().trim()
That wouldn't help, since stringValue would still return the text of every nested element; trimming only removes whitespace.
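A small Swift sketch of why whitespace cleanup is not enough, plus one fragile workaround. The sample string is an assumed approximation of what stringValue returns for the first thread div (not captured Ono output), and firstNonEmptyLine is a hypothetical helper. Collapsing and trimming whitespace keeps all the nested names, dates and message text; keeping only the first non-empty line recovers the thread title, but only under the assumption that the div's own text comes first and ends with a line break, as in the sample HTML.

```swift
import Foundation

// Assumed approximation of the recursive stringValue of the first thread <div>,
// including line breaks and indentation coming from the HTML source.
let threadStringValue = "FirstName LastName, SecondNameFirst SecondNameLast\n  \n    \n      FirstName LastName\n      Wednesday, December 24, 2014 at 6:57pm UTC+01 \n    \n  \n  This is a dummy text\n"

// Same effect as chaining clean() and trim(): whitespace runs become one space
// and the ends are trimmed, but all nested text is still present.
let cleaned = threadStringValue
    .replacingOccurrences(of: "\\s+", with: " ", options: .regularExpression)
    .trimmingCharacters(in: .whitespacesAndNewlines)

// Hypothetical workaround: keep only the first non-empty line. This depends on
// the thread's own text appearing first and being followed by a line break.
func firstNonEmptyLine(_ s: String) -> String? {
    return s.components(separatedBy: .newlines)
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .first(where: { !$0.isEmpty })
}

print(cleaned)
print(firstNonEmptyLine(threadStringValue) ?? "")
```

This line-based trick breaks as soon as the markup is reformatted (for example, if the title text wraps or the first child element sits on the same line), so selecting only the element's direct text nodes is the more robust fix.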