Receiving multiple done events on single GET

Open James-Matthew-Watson opened this issue 11 years ago • 17 comments

I'm using oboe and appreciate the work that went into something so essential. I'm having a very good experience with it overall but I think I'm seeing behavior that should not be occurring. I could be wrong but I think I am using it properly.

Here's my oboe.js code:

oboe({
  url: "data/" + startTime + "/" + endtime,
  method: "GET",
  headers: {contentType: "application/json"},
  cache: false
}).node('data.*', function(element) {
  // BLAH BLAH BLAH
  $("#status").text("retrieving: " + data.length + " records " + visible.length + " displayed");
}).done(function(data){
  $("#status").text(data.length + " records retrieved " + visible.length + " displayed");
});

The idea is that I want a different message to display after all the data is retrieved than when I am in the middle of the stream. The response from the RESTful service can be a bit jerky (data, pause, data, etc.) and I've had issues with both Chrome and Firefox "giving up" on really long responses part way through, so I want to be able to tell when the stream has truly ended.

I thought the above would work, but the message flips back and forth many times (dozens) before the end of the stream is reached. Am I misunderstanding what triggers the done event?

thanks,

-Matt

James-Matthew-Watson avatar Jul 09 '14 13:07 James-Matthew-Watson

Is it something I said?

James-Matthew-Watson avatar Sep 30 '14 01:09 James-Matthew-Watson

Ok, here's what I think is happening regarding the 'done' events...

Your JSON stream is actually many JSON objects concatenated into a file. I.e., it couldn't be read by a standard parser.

Oboe is designed to read a standard JSON resource as a stream. This means that any resource read by Oboe could also be read by standard tools.

'Multi-JSON' streams are something that I've got on the radar to add support for.
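The distinction between a standard JSON resource and a 'Multi-JSON' stream can be illustrated with the built-in `JSON.parse`. This is only a sketch: the two sample strings and the helper name `isSingleJsonDocument` are made up for illustration and are not taken from the reporter's actual response.

```javascript
// One standard JSON document: a single root value.
const single = '{"data": [{"delay": "13"}, {"delay": "34"}]}';

// "Multi-JSON": several root values concatenated (here newline-separated).
const multi = '{"delay": "13"}\n{"delay": "34"}';

// Returns true when a standard parser accepts the whole text as one document.
function isSingleJsonDocument(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isSingleJsonDocument(single)); // true
console.log(isSingleJsonDocument(multi));  // false
```

Running the same check against a captured copy of the real response is a quick way to tell which of the two cases applies.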

As for the pauses, I've also been thinking about this. On a fast network (or with the server running locally) and with gzipped resources the XHR can get all the content in one 'js turn'. This means that a pure js parser has a lot to parse all at once and can occupy the CPU for a noticeable amount of time.

I think the best solution is throttling so that Oboe takes many, short turns on the CPU rather than a single, longer one. I could implement this.
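The "many short turns" idea can be sketched in general terms as follows. This is not Oboe's implementation, only an illustration of the technique; `processInTurns`, `batchSize`, and `handleItem` are hypothetical names.

```javascript
// Process `items` in small batches, yielding to the event loop between
// batches so that rendering and input handling are not starved.
function processInTurns(items, batchSize, handleItem) {
  return new Promise((resolve) => {
    let index = 0;

    function turn() {
      const end = Math.min(index + batchSize, items.length);
      for (; index < end; index++) {
        handleItem(items[index]);
      }
      if (index < items.length) {
        setTimeout(turn, 0); // schedule the next short turn
      } else {
        resolve(index); // number of items processed
      }
    }

    turn(); // the first batch runs synchronously
  });
}

const seen = [];
const finished = processInTurns([1, 2, 3, 4, 5], 2, (x) => seen.push(x));
```

A smaller `batchSize` keeps each turn shorter at the cost of more scheduling overhead, so the value would need tuning against the real workload.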

jimhigson avatar Oct 04 '14 09:10 jimhigson

I'm actually generating the data in question and I can change it but I'm not sure it fits your description. Here's an example. I've removed some of the fields from the internal objects for brevity:

{"data": [
{"queuedtimestamp": "2014-10-07T07:34:16.660Z", "delay": "13", "end": "2014-10-07T07:34:16.673Z"}
{"queuedtimestamp": "2014-10-07T07:34:17.200Z", "delay": "34", "end": "2014-10-07T07:34:17.411Z"}
]}

My assumption was that the pauses are actually on the source side, but it's possible that there are client-side factors. The reason I say this is that I can see that the root source (a 3rd-party tool) takes a long time to return the entire root data set (network latency is not a concern here).

A few other pieces might help you understand the situation. Initially I tried pulling the entire root dataset down and aggregating on the client. While this was OK for small sets of data, when I tried the real use cases I wanted to support, the browser would fall over and die (I've tried several; they provide no specific error messages). I then moved the aggregation to an intermediate server that collects the data and does the aggregation server side. This improves things but I still see issues in the browser.

It could be my lack of experience in browser-side work, but I have a feeling I am pushing the browser beyond its capabilities. I am capturing as many as 500,000 events and creating, say, 250,000 individual SVG elements from a single request.

So if it's not the structure of the data, is there perhaps something in oboe that is timing out because the stream appears to have stopped producing data? I'm not getting done events on every object. For example if I pull down 100,000 JSON objects in the stream, I might see 50-100 'done' events generated from oboe and they seem to align with the pauses.

I appreciate your help with this and can provide more detail as needed.

thanks,

-Matt

James-Matthew-Watson avatar Oct 07 '14 14:10 James-Matthew-Watson

So when I request a large swath of data from the REST endpoint directly in the browser, I see pauses. So it's either the browser itself or the server that is the source of the pauses. It doesn't appear to be oboe.

James-Matthew-Watson avatar Oct 07 '14 14:10 James-Matthew-Watson

So I still have an issue here. Is there something I can provide to show that this is not a multiple item issue?

James-Matthew-Watson avatar Jan 14 '15 20:01 James-Matthew-Watson

@James-Matthew-Watson @jimhigson Having the same issue with multiple done's for one GET while streaming from nodejs. The response looks like this:

JSON

It seems that done is called for every line of this JSON.

artworkad avatar Mar 09 '15 09:03 artworkad

+1 I'm actually running into the same docker problem as @ArtworkAD. Support for "Multi-JSON" would be awesome. Here's a dump of the type of stuff I'm trying to parse https://gist.github.com/robertsheehy-wf/0bb14c45393c94f7c976.

robertsheehy-wf avatar Mar 19 '15 21:03 robertsheehy-wf

@ArtworkAD

The done method is called whenever you receive a complete object, so it's called many times. Redesign your API response to avoid this.

This is a feature, not a bug :)

nhducit avatar May 18 '15 05:05 nhducit

I'd appreciate the feature to handle multi-json responses, redesigning the API isn't an option for everyone.

kevana avatar May 18 '15 06:05 kevana

I'm hitting the same issue reading from a file with multiple JSON objects concatenated in it; I understand the logic behind expecting a valid one-object JSON blob, but think that expanding Oboe to handle the multi-object case has more pros than cons.

In case it's useful, I made a simple repo demonstrating this behavior, mostly for my own understanding: https://github.com/ryan-williams/oboe-test.

Also FWIW, the multi-object "JSON" files I'm consuming are Spark's event log files.

Oboe clearly already understands that the read stream remains open after the first object is finished, and it correctly handles subsequent top-level objects, so the question seems to be whether the semantics of done should be "a top-level object is complete" vs. "the read stream is consumed".

ryan-williams avatar Nov 06 '15 16:11 ryan-williams

@ryan-williams I'm also confused when using oboe with multi-object JSON. How can I know when the request is finished?

nhducit avatar Jan 16 '16 02:01 nhducit

Unfortunately I think I worked around this by adding some caller code (outside of Oboe) that wrapped my JSON objects in a JSON array (and added commas between them); of course, this loses the streaming capabilities of Oboe :(
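A minimal sketch of that kind of wrapping, assuming the input has exactly one JSON object per line with no newlines inside an object (the original caller code is not shown in the thread, and `wrapAsJsonArray` is a hypothetical name):

```javascript
// Turn newline-delimited JSON text into the text of one JSON array by
// joining the non-blank lines with commas and adding brackets.
function wrapAsJsonArray(ndjsonText) {
  const lines = ndjsonText
    .split('\n')
    .filter((line) => line.trim() !== ''); // drop blank lines
  return '[' + lines.join(',') + ']';
}

const wrapped = wrapAsJsonArray('{"a": 1}\n{"a": 2}\n');
console.log(JSON.parse(wrapped).length); // 2
```

Because the whole text has to be available before it can be wrapped, nothing can be handed to the parser until the input is complete, which is the loss of streaming mentioned above.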

ryan-williams avatar Jan 16 '16 03:01 ryan-williams

If you are able to modify your API a simple solution for this could be sending something like null just before the stream is drained and then catch it in oboe:

// back-end
myReadableStream.push( 'null' );
myReadableStream.push( null );

// front-end (`url` stands for your streaming endpoint)
oboe( url ).node( '!', result => {
  if ( result === null ) {
    // stream is drained
  } else {
    // stream is still alive
  }
});

glortho avatar Aug 16 '16 19:08 glortho

Hey guys, great effort on the lib. I've run into what I think is a related issue, so I will explain what I see. I am returning an array of elements from my server. I made this array by concatenating a few files. So let's say I have an array with 40K objects. I would expect a single done event to fire when the array is completed. Instead, the done event fires multiple times. I noticed that, because of my concatenation, I had a newline character every time I added more objects to the array. I removed the newline characters, and now I only see one done event, raised once the array completes. So I think that the newlines caused a done event to be raised.

Again I don't have multiple objects sent over without a root object, I have an array of objects that should only raise 1 done event.

dcastellanos-r7 avatar Oct 28 '16 03:10 dcastellanos-r7

I still experience this here. Has this been solved yet?

binarykitchen avatar Apr 23 '17 07:04 binarykitchen

For an ndjson stream that has many JSON objects separated by newlines but not contained in a top-level object, the following pattern worked for me:

	}).node("!", function(data) {
		all.push(data);
		itemCallback(data);
	}).on("end", function(data) {
		completeCallback(all);
	})

tailuge avatar May 27 '18 07:05 tailuge

If your data is multi-JSON (also seen as ndjson or jsonl), why would you use Oboe? Wouldn't it be better to just use readline and then JSON.parse() it?
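A minimal sketch of that line-by-line approach, written as a small buffer-based reader so it works the same in a browser or in Node (it does not use Node's `readline` module). It assumes one JSON value per line; `createNdjsonReader`, `write`, and `end` are hypothetical names.

```javascript
// Minimal newline-delimited JSON reader: feed it text chunks as they arrive.
function createNdjsonReader(onObject) {
  let buffer = '';
  let count = 0;

  function parseLine(line) {
    if (line.trim() === '') return; // skip blank lines
    onObject(JSON.parse(line));
    count++;
  }

  return {
    // Call for every chunk received; a chunk may end in the middle of a line.
    write(chunk) {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // keep the trailing partial line for the next chunk
      lines.forEach(parseLine);
    },
    // Call exactly once, when the underlying stream has ended.
    end() {
      parseLine(buffer);
      buffer = '';
      return count;
    },
  };
}

const records = [];
const reader = createNdjsonReader((obj) => records.push(obj));
reader.write('{"id": 1}\n{"id"');
reader.write(': 2}\n{"id": 3}');
const total = reader.end();
console.log(total, records); // total is 3; records holds the three parsed objects
```

Here the per-object callback and the end-of-stream call are separate, so "an object is complete" and "the stream is consumed" cannot be confused.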

chris-heathwood-uoy avatar Oct 09 '20 10:10 chris-heathwood-uoy