medium-spark-k8s
Mounting does not work
After following all the steps, I am getting the following exception:
Path does not exist: file:/opt/data-in/movies.csv;
I can confirm that the volumes are correctly mounted into minikube. When I inspected the driver pod's YAML in Kubernetes, I don't see a volumes entry there. Is the Helm chart up to date?
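For anyone who wants to repeat that check, here is a minimal sketch of how the driver pod's spec could be inspected. The function names are hypothetical helpers, and the pod name has to be supplied by the caller; only `kubectl get pod` with a `jsonpath` output is assumed.

```shell
# Hypothetical helpers for inspecting a pod spec; nothing here is part of the repo.

# Print the names of all volumes declared in the pod spec.
# Usage: list_pod_volumes NAMESPACE POD_NAME
list_pod_volumes() {
  kubectl get pod "$2" --namespace "$1" \
    -o jsonpath='{.spec.volumes[*].name}'
}

# Print the mount paths of the first container in the pod.
# Usage: list_pod_mounts NAMESPACE POD_NAME
list_pod_mounts() {
  kubectl get pod "$2" --namespace "$1" \
    -o jsonpath='{.spec.containers[0].volumeMounts[*].mountPath}'
}
```

For example, `list_pod_volumes spark-apps <driver-pod-name>`. A pod usually lists a few volumes regardless, so what matters is whether the data volumes from the chart values show up among them.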
Same problem.
The volumes are mounted into the minikube VM as hostPath, but the Pod does not mount the volume from that hostPath.
I guess the problem is related to the CRD: if I make a sample mount with a k8s object of kind "Pod" using the nginx image, it works fine, but when the object is created through the "CRD" the mount does not seem to work (the same thing happened while applying gaffer-hdfs).
Could you please advise if there is a way to solve the problem?
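The plain "Pod" check described above can be sketched roughly as follows. This is only an illustration: the function name, the pod name `hostpath-test`, the volume name `data-in`, and both path arguments are assumptions, not values taken from the chart.

```shell
# Hypothetical helper: print a throwaway Pod manifest that mounts a hostPath directory.
# Usage: print_hostpath_test_pod HOST_PATH MOUNT_PATH
print_hostpath_test_pod() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-test
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: data-in
      mountPath: $2
  volumes:
  - name: data-in
    hostPath:
      path: $1
      type: Directory
EOF
}
```

The manifest can be piped into the cluster with `print_hostpath_test_pod <path-inside-minikube-vm> /mnt/data-in | kubectl apply -n spark-apps -f -`, and the mount checked with `kubectl exec -n spark-apps hostpath-test -- ls /mnt/data-in`. If the files are visible here but not in the driver pod, the hostPath itself is fine and the volume definition is being lost somewhere between the chart and the driver pod spec.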
I face the same issue
21:14:25,039 WARN SparkContext:66 - The jar local:///opt/spark/jars/graphiq-transform-movie-ratings.jar has been added already. Overwriting of add
Reading data from /mnt/data-in/
Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: file:/mnt/data-in/movies.csv;
    at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$checkAndGlobPathIfNecessary$1(DataSource.scala:558)
chartmuseum is not working even after I made the following changes:
name="transform-movie-ratings"
rm -rf output/${name}
mkdir -p output/${name}
cp -r helm/ output/${name}/
cat helm/values-minikube.yaml >> output/${name}/values.yaml
cat helm/Chart.yaml >> output/${name}/Chart.yaml
cd output
export HELM_REPO_USE_HTTP="true"
helm repo add chartmuseum http://$(minikube ip):8080
helm cm-push ${name}/ chartmuseum
Then
./scripts/10-publish-chart.sh
"chartmuseum" has been added to your repositories
Pushing graphiq-transform-movie-ratings-0.1.tgz to chartmuseum...
Done.
After the repo update, I can see that the chart was pushed to the registry:
curl $(minikube -p test ip):8080/index.yaml
apiVersion: v1
entries:
  graphiq-transform-movie-ratings:
  - apiVersion: v1
    appVersion: "0.1"
    created: "2022-07-18T20:28:10.307675301Z"
    description: Sample ETL Job for Medium Post
    digest: 6ff53b72b09c6fb518004284dfc06c4fdb640fd1f70eab3b6957ec4861db3b14
    home: http://bit.ly/spark-k8s
    maintainers:
    - email: [email protected]
      name: Tom Lous
      url: https://lous.info
    name: graphiq-transform-movie-ratings
    sources:
    - https://github.com/TomLous/medium-spark-k8s
    urls:
    - charts/graphiq-transform-movie-ratings-0.1.tgz
    version: "0.1"
generated: "2022-07-18T20:28:19Z"
serverInfo: {}
After running:
helm upgrade movie-ratings-transform \
chartmuseum/graphiq-transform-movie-ratings \
--namespace=spark-apps \
--install \
--force
Release "movie-ratings-transform" does not exist. Installing it now.
NAME: movie-ratings-transform
LAST DEPLOYED: Mon Jul 18 22:37:45 2022
NAMESPACE: spark-apps
STATUS: deployed
REVISION: 1
TEST SUITE: None
It says deployed, but it actually is not.
So I used:
helm upgrade movie-ratings-transform ./helm -f ./helm/values-minikube.yaml -n spark-apps --install --force
Release "movie-ratings-transform" has been upgraded. Happy Helming!
NAME: movie-ratings-transform
LAST DEPLOYED: Mon Jul 18 23:14:16 2022
NAMESPACE: spark-apps
STATUS: deployed
REVISION: 7
TEST SUITE: None
With this, it creates the pods, but the driver crashes with the following:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: file:/mnt/data-in/movies.csv;
Any ideas?
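One way to narrow this down might be to render the chart locally and check whether any volume definitions reach the generated manifest at all. The sketch below is an assumption about how to check, not a confirmed fix; the function name is hypothetical, and it relies only on `helm template` and `grep`.

```shell
# Hypothetical helper: render the chart without installing it and show
# every line that mentions volumes, with a few lines of trailing context.
# Usage: show_rendered_volumes CHART_DIR VALUES_FILE
show_rendered_volumes() {
  helm template "$1" -f "$2" | grep -n -i -A 6 'volume' \
    || echo "no volume entries found in the rendered chart"
}
```

With the paths used above, this would be `show_rendered_volumes ./helm ./helm/values-minikube.yaml`. If the volumes appear in the rendered output but not in the driver pod, the chart is probably fine and the volumes are being dropped later, when the driver pod is created.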