
Sparklens for streaming

Open • dominikabasaj opened this issue 7 years ago • 4 comments

Hi,

Are there any plans to adapt Sparklens for stream processing? I assume that right now it is suitable only for batch jobs.

Best, Dominika

dominikabasaj • Jun 06 '18 06:06

Thanks for bringing this up @dominikabasaj. This is definitely on our radar, and we will be adding support for streaming. I would encourage you to put on a PM hat and help us define the requirements and use cases for this feature. That will help us validate our thinking and make sure you get what you are looking for. CC: @itsvikramagr

iamrohit • Jun 06 '18 08:06

@dominikabasaj

Here is one way to get it working with a streaming job. I haven't tried it with streaming yet, so let me know if this serves your purpose.

1. Start your application with --packages qubole:sparklens:0.1.2-s_2.11, but don't set the spark.extraListeners config.
2. As part of your application, do the following:

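import com.qubole.sparklens.QuboleNotebookListener
// sc is your application's SparkContext
val QNL = new QuboleNotebookListener(sc.getConf)
// Register the listener so it receives the application's Spark events
sc.addSparkListener(QNL)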

Basically, create a listener (note that this is QuboleNotebookListener, not QuboleJobListener) and register it.
3. Within your streaming function (whatever is called repeatedly), wrap your code as follows:

QNL.profileIt {
    //Your code here
}

Alternatively, if you need more control:

// Drop accumulated job/stage data if the listener has grown too large
if (QNL.estimateSize() > QNL.getMaxDataSize()) {
  QNL.purgeJobsAndStages()
}
val startTime = System.currentTimeMillis
// Your Scala code here
val endTime = System.currentTimeMillis
// Wait for some time so that all events accumulate
// (Thread.sleep takes milliseconds; check the unit getWaiTimeInSeconds() returns)
Thread.sleep(QNL.getWaiTimeInSeconds())
println(QNL.getStats(startTime, endTime))
4. Check out https://github.com/qubole/sparklens/blob/master/src/main/scala/com/qubole/sparklens/QuboleNotebookListener.scala for more information.
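To make the wrapping in step 3 more concrete, here is a toy sketch in plain Scala with no Spark dependency. `ToyProfiler` and its `profileIt(label)(block)` method are hypothetical names, not the Sparklens API. The sketch only illustrates the by-name parameter pattern that lets a method accept a code block and run it between two timestamps, plus a purge-when-too-large guard similar in spirit to the `estimateSize()`/`purgeJobsAndStages()` check. Measuring size as an entry count is an assumption made for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a profiling listener; NOT the Sparklens API.
// `block: => R` is a by-name parameter, so the caller's code runs inside
// profileIt, between the start and end timestamps.
final class ToyProfiler(maxEntries: Int) {
  require(maxEntries > 0, "maxEntries must be positive")

  // (label, start ms, end ms) for each profiled block
  private val windows = ArrayBuffer.empty[(String, Long, Long)]

  def profileIt[R](label: String)(block: => R): R = {
    // Assumed purge policy: clear everything once the buffer is full,
    // so memory stays bounded in a long-running streaming job
    if (windows.size >= maxEntries) windows.clear()
    val start = System.currentTimeMillis()
    try block
    finally {
      windows += ((label, start, System.currentTimeMillis()))
    }
  }

  def recorded: Seq[(String, Long, Long)] = windows.toList
}

object ToyProfilerDemo {
  def main(args: Array[String]): Unit = {
    val profiler = new ToyProfiler(maxEntries = 3)
    // Simulate a streaming function that is called once per micro-batch
    val results = (1 to 5).map { batch =>
      profiler.profileIt(s"batch-$batch") { (1 to batch * 1000).sum }
    }
    println(results)
    println(profiler.recorded.map(_._1))
  }
}
```

With `maxEntries = 3`, the fourth call triggers a purge, so only the windows for batches 4 and 5 remain recorded afterwards.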

thanks!

iamrohit • Jun 13 '18 14:06

Sorry for the duplication, but this is also related to streaming, so I thought I'd post an update here.

We have tried using QuboleJobListener for Structured Streaming, but it only produces a report after the streaming query terminates, and that report covers all jobs together rather than batch by batch.

Since Structured Streaming applications run continuously, users and developers will generally want to see stats every few batches.
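Batch-wise reporting of this kind could be sketched as a small accumulator that emits a summary every N batches instead of a single report when the query terminates. `BatchStats`, `RollingReporter`, and the fields they track are hypothetical; a real version would populate them from Spark listener events, which this plain-Scala sketch does not do.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical per-batch record; a real implementation would fill these
// fields from Spark listener events.
final case class BatchStats(batchId: Long, durationMs: Long, numTasks: Int)

// Collects per-batch stats and emits a summary every `everyN` batches.
final class RollingReporter(everyN: Int) {
  require(everyN > 0, "everyN must be positive")
  private val pending = ArrayBuffer.empty[BatchStats]

  /** Returns Some(summary) when a window of `everyN` batches completes. */
  def onBatchCompleted(s: BatchStats): Option[String] = {
    pending += s
    if (pending.size < everyN) None
    else {
      val first = pending.head.batchId
      val last = pending.last.batchId
      val totalMs = pending.map(_.durationMs).sum
      val tasks = pending.map(_.numTasks).sum
      pending.clear() // start a fresh window
      Some(s"batches $first-$last: avg ${totalMs / everyN} ms, $tasks tasks")
    }
  }
}
```

For example, with `everyN = 2`, feeding batches 0, 1, and 2 yields one summary (for batches 0-1) and holds batch 2 until batch 3 arrives.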

A detailed proposal is attached below. Please review it and share your input.

Structured_streaming_sparklens.pdf

akumarb2010 • Jun 14 '18 21:06

@dominikabasaj @akumarb2010 You can check out our new project Streaminglens if you plan to use Sparklens for Streaming applications.

abhishekd0907 • Jan 27 '20 03:01