medium-spark-k8s

Mounting does not work

Open kpritam opened this issue 4 years ago • 3 comments

After following all the steps, I get the following exception: Path does not exist: file:/opt/data-in/movies.csv;

I can confirm that the volumes are correctly mounted into minikube. However, when I inspected the driver pod's YAML in Kubernetes, I did not see a volumes entry there. Is the Helm chart up to date?
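One way to check this from the command line is to search the driver pod's manifest for the expected mountPath. The following is a minimal sketch, not part of the repository: `check_mount` is a hypothetical helper, and the pod name in the usage comment is a placeholder for the actual driver pod name.

```shell
# Hypothetical helper: read a pod manifest (YAML) on stdin and report whether
# any container declares a volumeMount at the given path.
check_mount() {
  path="${1%/}"   # ignore a trailing slash on the expected path
  if grep -Eq "mountPath:[[:space:]]*[\"']?${path}/?[\"']?[[:space:]]*\$"; then
    echo "mounted: ${path}"
  else
    echo "not mounted: ${path}"
    return 1
  fi
}

# Usage (placeholder pod name; namespace taken from this thread):
#   kubectl get pod <driver-pod-name> -n spark-apps -o yaml | check_mount /opt/data-in
```

If the helper reports "not mounted", the mount never reached the pod spec, which points at the chart or the operator rather than at minikube.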

kpritam, Jan 18 '21 14:01

Same problem.

The volumes are mounted into the minikube VM at the hostPath, but the Pod does not mount the volume from that hostPath.

I suspect the problem is related to the CRD. If I create a sample mount using a plain k8s object of kind "Pod" with an nginx image, it works fine, but when the object kind is a CRD, the mount does not seem to work (the same thing happened when applying gaffer-hdfs).

Could you please advise whether there is a way to solve this problem?

jyyoo0530, Feb 26 '21 08:02

I am facing the same issue:

21:14:25,039  WARN SparkContext:66 - The jar local:///opt/spark/jars/graphiq-transform-movie-ratings.jar has been added already. Overwriting of add
Reading data from /mnt/data-in/
Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: file:/mnt/data-in/movies.csv;
    at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$checkAndGlobPathIfNecessary$1(DataSource.scala:558)

faaizshah, Jul 18 '22 21:07

chartmuseum is not working even after I made the following changes:

name="transform-movie-ratings"

rm -rf output/${name}
mkdir -p output/${name}
cp -r helm/ output/${name}/
cat helm/values-minikube.yaml >> output/${name}/values.yaml
cat helm/Chart.yaml >> output/${name}/Chart.yaml
cd output

export HELM_REPO_USE_HTTP="true"
helm repo add chartmuseum http://$(minikube ip):8080
helm cm-push ${name}/ chartmuseum

Then I ran the publish script:

./scripts/10-publish-chart.sh                               
"chartmuseum" has been added to your repositories
Pushing graphiq-transform-movie-ratings-0.1.tgz to chartmuseum...
Done.

After the repo update, I can see that the chart was pushed to the registry:

curl $(minikube -p test ip):8080/index.yaml                                                                                             
apiVersion: v1
entries:
  graphiq-transform-movie-ratings:
  - apiVersion: v1
    appVersion: "0.1"
    created: "2022-07-18T20:28:10.307675301Z"
    description: Sample ETL Job for Medium Post
    digest: 6ff53b72b09c6fb518004284dfc06c4fdb640fd1f70eab3b6957ec4861db3b14
    home: http://bit.ly/spark-k8s
    maintainers:
    - email: [email protected]
      name: Tom Lous
      url: https://lous.info
    name: graphiq-transform-movie-ratings
    sources:
    - https://github.com/TomLous/medium-spark-k8s
    urls:
    - charts/graphiq-transform-movie-ratings-0.1.tgz
    version: "0.1"
generated: "2022-07-18T20:28:19Z"
serverInfo: {}

After that, I ran:

helm upgrade movie-ratings-transform \
 chartmuseum/graphiq-transform-movie-ratings \
 --namespace=spark-apps \
 --install \
 --force
Release "movie-ratings-transform" does not exist. Installing it now.
NAME: movie-ratings-transform
LAST DEPLOYED: Mon Jul 18 22:37:45 2022
NAMESPACE: spark-apps
STATUS: deployed
REVISION: 1
TEST SUITE: None

It says deployed, but it actually is not.

So I used:
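One way to see what such a release actually contains is to list the resource kinds in its rendered manifest. This is a minimal sketch under assumptions: `list_kinds` is a hypothetical helper, not part of the repository, and it only looks at top-level `kind:` lines in a multi-document YAML stream.

```shell
# Hypothetical helper: read a (multi-document) YAML manifest on stdin and
# print each distinct top-level "kind:" value, one per line, sorted.
list_kinds() {
  sed -n 's/^kind:[[:space:]]*//p' | sort -u
}

# Usage (release name and namespace taken from this thread):
#   helm get manifest movie-ratings-transform -n spark-apps | list_kinds
```

If the output is empty, or lacks the resource kind the chart is expected to create, the release was recorded by Helm without rendering the expected objects.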

helm upgrade movie-ratings-transform ./helm -f ./helm/values-minikube.yaml -n spark-apps --install --force

Release "movie-ratings-transform" has been upgraded. Happy Helming!
NAME: movie-ratings-transform
LAST DEPLOYED: Mon Jul 18 23:14:16 2022
NAMESPACE: spark-apps
STATUS: deployed
REVISION: 7
TEST SUITE: None

With this, the pods are created, but the driver crashes with the following:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: file:/mnt/data-in/movies.csv;

Any ideas?

faaizshah, Jul 18 '22 21:07