
facets dive too slow on Databricks

Open leandrohmvieira opened this issue 6 years ago • 5 comments

Just tested the example code here running on a Databricks environment and the result is very slow. It works well on Colab on the same machine.

Any ideas on how to improve Dive performance in a Databricks notebook?

leandrohmvieira avatar Apr 09 '19 19:04 leandrohmvieira

I don't think any of us here have tried Databricks before, so no immediate thoughts. I can look into creating a Databricks account and testing it there to investigate.

jameswex avatar Apr 10 '19 13:04 jameswex

@leandrohmvieira how did you get Dive to display in Databricks? I just tried the first two cells from the example code you linked to in a Databricks environment, and the Dive cell outputs "<IPython.core.display.HTML at 0x7f9b4db9b470>" instead of the actual visualization.

jameswex avatar Apr 17 '19 18:04 jameswex

Databricks has its own display method, displayHTML. Just change the second cell's code to:


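# Display the Dive visualization for the training data.
# On Databricks, call displayHTML() instead of IPython's display(HTML(...)).
# from IPython.core.display import display, HTML
jsonstr = train_data.to_json(orient='records')  # train_data comes from the first cell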
HTML_TEMPLATE = """<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/master/facets-dist/facets-jupyter.html">
        <facets-dive id="elem" height="600"></facets-dive>
        <script>
          var data = {jsonstr};
          document.querySelector("#elem").data = data;
        </script>"""
html = HTML_TEMPLATE.format(jsonstr=jsonstr)
#display(HTML(html))
displayHTML(html)
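
The cell above could also be wrapped in a small helper. This is only a minimal sketch: the names `build_dive_html`, `max_rows`, and `seed` are hypothetical and not part of Facets or Databricks. The optional downsampling assumes that the size of the JSON embedded in the page contributes to the slowness, which has not been confirmed here.

```python
import json
import random

# URL taken from the cell above.
FACETS_IMPORT = "https://raw.githubusercontent.com/PAIR-code/facets/master/facets-dist/facets-jupyter.html"

HTML_TEMPLATE = """<link rel="import" href="{href}">
<facets-dive id="elem" height="{height}"></facets-dive>
<script>
  var data = {jsonstr};
  document.querySelector("#elem").data = data;
</script>"""


def build_dive_html(records, max_rows=None, height=600, seed=0):
    """Return the Dive HTML for a list of row dicts (hypothetical helper).

    If max_rows is set and there are more rows than that, a reproducible
    random sample of max_rows rows is embedded instead of the full data.
    """
    if max_rows is not None and len(records) > max_rows:
        records = random.Random(seed).sample(records, max_rows)
    # Escape "</" so a string value cannot close the <script> tag early.
    jsonstr = json.dumps(records).replace("</", "<\\/")
    return HTML_TEMPLATE.format(href=FACETS_IMPORT, height=height, jsonstr=jsonstr)
```

On Databricks this might be called as `displayHTML(build_dive_html(json.loads(train_data.to_json(orient='records')), max_rows=5000))`, where 5000 is an arbitrary value to experiment with.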

leandrohmvieira avatar Apr 18 '19 11:04 leandrohmvieira

Thanks @leandrohmvieira

When running that cell, instead of Dive I see the error "Uncaught TypeError: Cannot read property '' of undefined". This is a JS error inside Dive, but it is expected in the current version of Dive, is not fatal, and in other contexts doesn't stop Dive from displaying or working. Yet it seems to stop Databricks from displaying Dive. Do you see this as well?

jameswex avatar Apr 18 '19 13:04 jameswex

@jameswex yeah, but if I press "Clear State & Results" and try a couple more times, I'm able to see Dive working.

Just to be sure we are on the same environment, I'm working with:

  • Databricks on Azure
  • High concurrency cluster mode
  • Databricks Runtime Version 5.2 ML Beta (includes Apache Spark 2.4.0, Scala 2.11)
  • Python 3
  • Driver type Standard_DS4_v2 28.0 GB Memory, 8 Cores, 1.5 DBU
  • 2 Workers Standard_DS3_v2 14.0 GB Memory, 4 Cores, 0.75 DBU

leandrohmvieira avatar Apr 18 '19 13:04 leandrohmvieira