Rob Miller issues


Move EMR deps to internal hosting

Our EMR scripts currently depend on externally hosted resources (PyPI, RPMs, Scala/SBT, GitHub). We'd like to bring these in house so that we don't get bootstrap failures we can't control.
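As a starting point for the move, here is a minimal sketch that lists which external hosts a bootstrap script still contacts. It assumes the script's dependencies appear as literal http(s) URLs. `INTERNAL_HOSTS` and `mirror.internal.example` are hypothetical placeholders. Installs that rely on a tool's default index (e.g. a bare `pip install` with no URL) won't be caught.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist of in-house hosts; replace with the real mirror hostnames.
INTERNAL_HOSTS = {"mirror.internal.example"}

# Matches http(s) URLs up to whitespace or a quote character.
URL_RE = re.compile(r"https?://[^\s'\"]+")


def external_hosts(script_text, internal_hosts=INTERNAL_HOSTS):
    """Return the sorted hostnames referenced in script_text that are not internal."""
    hosts = set()
    for url in URL_RE.findall(script_text):
        host = urlparse(url).hostname  # lowercased, port stripped
        if host and host not in internal_hosts:
            hosts.add(host)
    return sorted(hosts)


if __name__ == "__main__":
    sample = """
    pip install --index-url https://pypi.org/simple/ boto3
    wget https://github.com/example/repo/archive/v1.tar.gz
    curl -O https://mirror.internal.example/sbt/sbt.rpm
    """
    print(external_hosts(sample))
```

Running it over each bootstrap script yields a checklist of hosts that still need an internal mirror.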

No module named pyspark

From https://bugzilla.mozilla.org/show_bug.cgi?id=1373631:

    Py4JJavaError Traceback (most recent call last)
    in ()
    ----> 1 serialized_beta_full[1].count()

    /usr/lib/spark/python/pyspark/rdd.py in count(self)
       1006 3
       1007 """
    -> 1008 return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
       1009...
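The driver side of this traceback is already running inside /usr/lib/spark/python/pyspark, so one possibility (an assumption, not established in the bug) is that a different Python interpreter, such as the one executors use, can't import pyspark. A minimal stdlib check that can be run under each candidate interpreter (for example the one named by `PYSPARK_PYTHON`, if that variable is set) is sketched below; `check_module` is a hypothetical helper.

```python
import importlib.util
import sys


def check_module(name):
    """Report whether top-level module `name` is importable by this interpreter."""
    spec = importlib.util.find_spec(name)  # None if the module can't be found
    return {
        "python": sys.executable,
        "module": name,
        "found": spec is not None,
        "origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    print(check_module("pyspark"))
```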

Label: ready

Audit EMR to check for dependency pinning
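One piece of such an audit can be sketched for Python dependencies, assuming they are listed in pip requirements-style lines (the issue doesn't say how EMR installs them, and RPM or SBT dependencies aren't covered). The sketch treats only an exact `==` version as pinned, so ranges and wildcards are reported; `unpinned_requirements` is a hypothetical helper.

```python
import re

# "name[extras]==version" with an optional "; marker" counts as pinned.
# Wildcards ("==1.2.*") and extra clauses (",<2") do not match.
PINNED_RE = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s,;*]+\s*(;.*)?$"
)


def unpinned_requirements(lines):
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options like -r
            continue
        if not PINNED_RE.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    sample = ["boto3==1.4.4", "requests>=2.0", "numpy", "# comment", "-r other.txt"]
    print(unpinned_requirements(sample))
```

A file can be passed directly, since iterating over an open file yields its lines.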

saveAsTextFile() + -getmerge returning empty file

From https://bugzilla.mozilla.org/show_bug.cgi?id=1373633

Sequence of events:

In Spark:

    serialized_beta_full[1].saveAsTextFile("s3://net-mozaws-prod-us-west-2-pipeline-analysis/ekr/serialized-beta-full.out")

In Hadoop:

    hadoop fs -getmerge s3://net-mozaws-prod-us-west-2-pipeline-analysis/ekr/serialized-beta-full.out serialized-beta-full.out

This claims to copy a lot of files, but the result is 0-length.
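To tell whether the part files themselves are empty or only the merge step is failing, one option is to copy the output directory locally first (for example with `hadoop fs -get`) and concatenate it by hand, mimicking what `-getmerge` does. The sketch below assumes Spark's usual `part-*` naming for the files `saveAsTextFile` writes; `merge_parts` is a hypothetical helper.

```python
import glob
import os
import shutil


def merge_parts(output_dir, dest_path):
    """Concatenate output_dir/part-* into dest_path in sorted name order.

    Returns (part_name, size_in_bytes) pairs so empty parts are easy to spot.
    Marker files such as _SUCCESS are skipped because they don't match part-*.
    """
    parts = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    sizes = []
    with open(dest_path, "wb") as dest:
        for part in parts:
            sizes.append((os.path.basename(part), os.path.getsize(part)))
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dest)  # stream rather than read into memory
    return sizes
```

For example, `merge_parts("local-copy-dir", "merged.out")`: if every reported size is 0, the problem lies upstream of the merge (in the data written), not in `-getmerge`.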

Label: ready


Investigate Sentry / DataDog integration

Add support for fetching probes for a specific version

Add ability to fetch probes for a product and all dependency components in a single request

Add pagination support to probe fetch API

Support liblo-0.32