zero323


```
gendiscrete(10, 1:4, rep(0.25, 4))
```

bug

[spark-deep-learning](https://github.com/databricks/spark-deep-learning) is actively developed and has some nice code under the covers.

`dlt` (https://dlt.zero323.net) is a SparkR API for Delta Lake.

to-consider

It seems like [pyjanitor](https://github.com/ericmjl/pyjanitor)

> pyjanitor is a Python implementation of the R package janitor, and provides a clean API for cleaning data.

provides some Spark utilities:

- https://github.com/ericmjl/pyjanitor/tree/dev/janitor/spark
-...

to-consider

Dagster is a

> data orchestrator for machine learning, analytics, and ETL

which provides, among others, a [PySpark API](https://docs.dagster.io/_apidocs/libraries/dagster_pyspark).

to-consider

If we're going to merge #173, should we also consider adding RapidMiner [Radoop](https://rapidminer.com/products/radoop/) and friends? These expose at least as much of the Spark API, if not more.

As far as I am aware, there are at least three active certification programs:

- [CCA Spark and Hadoop Developer Exam (CCA175)](https://www.cloudera.com/more/training/certification/cca-spark.html)
- [Databricks Certified Developer: Apache Spark™](https://databricks.com/training/certified-spark-developer)
-...

question

https://databricks.com/blog/category/engineering

I am not strongly advocating for it, but there are some pretty neat posts there from time to time, if you filter out the marketing stuff.

later
to-consider