node-osmosis

Process JSON result

Open nashwaan opened this issue 7 years ago • 4 comments

Hey,

While osmosis works great for scraping HTML or XML pages, I could not find a way to get or .set() data from a JSON response. From my understanding, osmosis always tries to parse the response of a .get() command as some form of XML text. If that is the case, can I just get the raw response body of the .get() command? I attempted to do this via the .then() command, but it didn't work:

.then(function (context, data) {
    responseJson = context.text(); // or JSON.parse(context.text()); both throw an error
})

Any idea how to get the JSON response? Or could it be parsed as JSON automatically when the response header contains Content-Type: application/json?

If this is not implemented already, may I suggest adding this feature?
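
The Content-Type check suggested above could look roughly like this, independent of osmosis. This is only a minimal sketch: `parseBody` is a hypothetical helper, not part of osmosis or needle, and it assumes the header value and the body are already available as strings.

```javascript
// Hypothetical helper: not part of osmosis or needle.
// Parses the body as JSON only when the Content-Type header says so;
// otherwise returns the raw text untouched.
function parseBody(contentType, body) {
  // Strip parameters such as "; charset=utf-8" and normalise case.
  const mediaType = String(contentType || '').split(';')[0].trim().toLowerCase();
  if (mediaType === 'application/json' || mediaType.endsWith('+json')) {
    return JSON.parse(body);
  }
  return body;
}

const parsed = parseBody('application/json; charset=utf-8', '{"id": 1, "tags": ["a"]}');
console.log(parsed.tags[0]);
```

Anything that is not declared as JSON is passed through unchanged, so HTML and XML responses could still go to the existing parser.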

nashwaan avatar Aug 30 '16 12:08 nashwaan

I am hoping to implement this in a future version. The challenge will be doing so without breaking the current API.

I'd like the context object to become more of a "response" object, providing the parsed JSON or HTML context, the raw data, HTTP query information, and any errors.

rchipka avatar Aug 30 '16 14:08 rchipka

That would be great to support:

providing the parsed JSON or HTML context, the raw data, HTTP query information, and any errors

To avoid breaking the current API, you could expose the needle response as a sub-object of context, so that the following becomes possible:

context.body // or context.originalResponse

Also, to avoid storing the original response object when it is not needed, you might provide an option that lets the user explicitly choose whether she wants access to the original response object in the .then() command.

If this is not very convincing, I am sure you will find other smart ways to achieve this. I just hope that the future version that will support this will not be very far from now. :relieved:
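
To make the opt-in idea concrete, here is a rough sketch using plain objects. All names in it (`buildContext`, `keepResponse`, `originalResponse`) are illustrative assumptions and do not exist in osmosis; a plain object stands in for the real context.

```javascript
// Hypothetical sketch: `buildContext`, `keepResponse` and `originalResponse`
// are illustrative names, not existing osmosis options or properties.
function buildContext(parsedDocument, response, options = {}) {
  const context = { document: parsedDocument };
  // Only retain the full response when the caller explicitly asks for it.
  if (options.keepResponse === true) {
    context.originalResponse = response;
  }
  return context;
}

const fakeResponse = { statusCode: 200, body: '{"ok": true}' };
const lean = buildContext('<parsed document>', fakeResponse);
const full = buildContext('<parsed document>', fakeResponse, { keepResponse: true });
console.log('originalResponse' in lean, JSON.parse(full.originalResponse.body).ok);
```

With the option left off, the context carries no reference to the response, so nothing extra is kept in memory for users who never need it.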

nashwaan avatar Aug 31 '16 03:08 nashwaan

For anyone else trying to get JSON output, I hacked it together like this:

const browser = osmosis.get('./foobar.json')
  .find('p')
  .set('json')
  .then(function(context, data) {
    const result = JSON.parse(data.json);
    console.log(result);
  });
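
Since the workaround above reads the JSON back out of a parsed document, the text may not always survive intact, so a guarded parse can be safer than a bare JSON.parse. A minimal sketch, where `safeParse` is a hypothetical helper and the `data` object only imitates what .set('json') would provide:

```javascript
// Hypothetical helper: returns the parsed value, or null when the text
// is not valid JSON (for example, if the HTML parser altered it).
function safeParse(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return null;
  }
}

// Inside .then(), `data.json` would be the text captured by .set('json').
const data = { json: '{"name": "foobar", "count": 2}' };
const result = safeParse(data.json);
console.log(result === null ? 'not valid JSON' : result.count);
```

Note that a body consisting of the literal `null` also yields null here, so this sketch cannot tell that case apart from a parse failure.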

cobbweb avatar Feb 06 '17 09:02 cobbweb

@cobbweb Nice trick. I'm using it too.

But there is another problem: I can't get part of the document as raw data. I need to do something like this:

<div id="info">
  <p><span>Number of:</span>2</p>
  <p><a href="http://a.b.ru?ID=622956" title="title1">LOT OF RTR</a></p>
  <p><a href="http://c.d.ru/e-auction.xhtml?parm=2" title="title2">LOT OF ERT</a></p>
  <p><span>Place of</span><a href="/2219">RD HOUSE</a></p>
  <p>Local RD HOUSE NUMBER - 123456</p>
</div>

  .do(osmosis.find('//*[@id="info"]/p')
    // The element may or may not be present
    .contains('LOT OF RTR')
    .set({
      a: 'a @href',
      content: 'a.title'
    })
    .then(function(context, data){
      console.log('context', context);
      console.log('data', data);
    })
  )

Getting the raw markup of this part would be preferable.

oshliaer avatar Feb 10 '17 10:02 oshliaer