node-osmosis
Process JSON result
Hey,
While osmosis works well for scraping HTML or XML pages, I could not find a way to get or .set() data from a JSON response. From my understanding, osmosis always tries to parse the response of a .get() command as if it were some form of XML text. If that is the case, can I just get the raw response body of the .get() command? I attempted to do this via a .then() command, but it didn't work:
.then(function (context, data) {
    responseJson = context.text(); // or JSON.parse(context.text()); both fail with an error
})
Any idea how to get the JSON response? Or could osmosis automatically parse the body as JSON if the response header contains Content-Type: application/json?
If this is not implemented already, may I suggest adding this feature?
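To make the request concrete, here is a minimal sketch of the header check I have in mind. It is plain JavaScript and not osmosis code: parseByContentType and the shape of the object it returns are names I made up for illustration, and I am assuming the raw body is available as a string.

```javascript
// Hypothetical helper, not part of osmosis: choose how to treat a body
// based on the Content-Type header value.
function parseByContentType(contentType, rawBody) {
  // Header values may carry parameters, e.g. "application/json; charset=utf-8",
  // so only the media type before the first ";" is compared.
  const mediaType = String(contentType || '').split(';')[0].trim().toLowerCase();
  const isJson = mediaType === 'application/json' || mediaType.endsWith('+json');

  if (!isJson) {
    // Leave anything else for the existing HTML/XML parser.
    return { type: 'markup', value: rawBody };
  }

  try {
    return { type: 'json', value: JSON.parse(rawBody) };
  } catch (err) {
    // The header claimed JSON but the body did not parse; keep the raw text.
    return { type: 'error', value: rawBody, error: err };
  }
}
```

Falling back to the markup parser for every other media type would keep existing scrapers working as they do today.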
I am hoping to implement this in a future version. The challenge will be doing so without breaking the current API.
I'd like the context object to become more of a "response" object, providing the parsed JSON or HTML context, the raw data, HTTP query information, and any errors.
That would be great to support:
providing the parsed JSON or HTML context, the raw data, HTTP query information, and any errors
To avoid breaking the current API, you could expose needle's response as a sub-object of context.
So the following will be possible:
context.body // or context.originalResponse
Also, to avoid storing the original response object when it is not needed, you could provide an option that lets users explicitly choose whether they want access to the original response object in the .then() command.
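Roughly what I mean, as a sketch only: attachResponse and the keepRawBody option are hypothetical names, and the context and response here are plain objects standing in for the real osmosis and needle ones.

```javascript
// Hypothetical sketch, not the osmosis API: attach response details to the
// context as a sub-object, keeping the raw body only when asked to.
function attachResponse(context, response, options = {}) {
  const originalResponse = {
    statusCode: response.statusCode,
    headers: response.headers,
  };

  // Opt-in: skip the raw body unless the caller requested it, so large
  // responses are not retained for users who never read them.
  if (options.keepRawBody) {
    originalResponse.body = response.body;
  }

  context.originalResponse = originalResponse;
  return context;
}
```

Existing callbacks that only read the current context properties would be unaffected, since the new data lives under a single extra key.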
If this is not very convincing, I am sure you will find other smart ways to achieve this. I just hope that the future version that will support this will not be very far from now. :relieved:
For anyone else trying to get JSON output, I hacked it together like this:
const browser = osmosis.get('./foobar.json')
    .find('p')
    .set('json')
    .then(function(context, data) {
        const result = JSON.parse(data.json);
        console.log(result);
    });
@cobbweb Nice trick. I'm using it too.
But I have another problem: I can't get part of the document as raw data. I need to do something like this:
<div id="info">
<p><span>Number of:</span>2</p>
<p><a href="http://a.b.ru?ID=622956" title="title1">LOT OF RTR</a></p>
<p><a href="http://c.d.ru/e-auction.xhtml?parm=2" title="title2">LOT OF ERT</a></p>
<p><span>Place of</span><a href="/2219">RD HOUSE</a></p>
<p>Local RD HOUSE NUMBER - 123456</p>
</div>
.do(osmosis.find('//*[@id="info"]/p')
    // The element may or may not be present
    .contains('LOT OF RTR')
    .set({
        a: 'a @href',
        content: 'a.title'
    })
    .then(function(context, data){
        console.log('context', context);
        console.log('data', data);
    })
)
Getting the raw markup of this element would be preferable.