Spark Offline Store -- how do I configure a SparkSource?
As I understand it, the configuration should look like this:

```yaml
project: feast_spark_project
registry: data/registry.db
provider: local
offline_store:
  type: spark
  spark_conf:
    spark.master: yarn
    spark.ui.enabled: "true"
    spark.eventLog.enabled: "true"
    spark.sql.catalogImplementation: "hive"
    spark.sql.parser.quotedRegexColumnNames: "true"
    spark.sql.session.timeZone: "UTC"
```
Then I define the SparkSource with the following code:

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

driver_hourly_stats = SparkSource(
    name="driver_hourly_stats",
    query=(
        "SELECT event_timestamp AS ts, created_timestamp AS created, conv_rate "
        "FROM emr_feature_store.driver_hourly_stats"
    ),
    event_timestamp_column="ts",
    created_timestamp_column="created",
)
```
Then I define the FeatureView.
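The post doesn't show the FeatureView definition itself, so here is a minimal sketch of what it might look like for the source above (the entity `driver`, its join key `driver_id`, and the `Field`/`schema=` style are assumptions; older Feast releases use `Feature`/`features=` instead, so match your installed version):

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Float32

# Hypothetical entity; its join key must also appear in the source query.
driver = Entity(name="driver", join_keys=["driver_id"])

driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[Field(name="conv_rate", dtype=Float32)],
    source=driver_hourly_stats,
)
```

With the FeatureView applied to the registry, the post then retrieves historical features: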
```python
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_new_stats:new_conv_rate",
        "driver_new2_stats:new_conv_rate",
    ],
    full_feature_names=True,
).to_df()
```
My question is: when get_historical_features runs, how does it connect to the Spark cluster and submit the job?
Today I submit all of my Spark jobs via spark-submit, so I don't understand how a configured Spark Offline Store actually submits work to the cluster.
From this document: https://docs.feast.dev/reference/offline-stores/spark
you can set "spark.master" to your Spark cluster's IP and port:
```yaml
project: my_project
registry: data/registry.db
provider: local
offline_store:
  type: spark
  spark_conf:
    spark.master: "local[*]"
    spark.ui.enabled: "false"
    spark.eventLog.enabled: "false"
    spark.sql.catalogImplementation: "hive"
    spark.sql.parser.quotedRegexColumnNames: "true"
    spark.sql.session.timeZone: "UTC"
online_store:
  path: data/online_store.db
```
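For example, to target a standalone cluster instead of local mode, only the master URL needs to change (the host and port below are placeholders):

```yaml
offline_store:
  type: spark
  spark_conf:
    spark.master: "spark://<cluster-host>:7077"  # or "yarn" on a YARN cluster
```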
You can read the code: https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/offline_stores/contrib/spark_offline_store/spark_source.py
to understand how Feast talks with Spark.
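In short: Feast does not shell out to spark-submit at all. When get_historical_features runs, the Spark offline store creates (or reuses) a SparkSession inside your own Python process, applying every key under spark_conf, so the job is driven in client mode by that in-process driver (at the time of writing, a helper named get_spark_session_or_start_new_with_repoconfig in the same contrib package does this). A rough sketch of the idea, not the exact Feast code:

```python
from pyspark.sql import SparkSession


def start_session(spark_conf: dict) -> SparkSession:
    # Apply every key from the feature_store.yaml spark_conf section, then
    # create (or reuse) a SparkSession inside this Python process.
    builder = SparkSession.builder
    for key, value in spark_conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


spark = start_session({
    "spark.master": "yarn",  # the driver runs in-process, in client mode
    "spark.sql.catalogImplementation": "hive",
})
# Feast then executes the point-in-time join as Spark SQL on this session.
```

Because the connection is just a SparkSession, pointing spark.master at yarn, a spark://host:port standalone master, or local[*] is all it takes to choose where the work runs.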
@shuchu
If our existing Spark cluster only supports submitting jobs via spark-submit, how can we use Feast's functionality?
I would say "spark-submit" will not work with the current implementation.
Feast does not currently support spark-submit; that would require someone writing a new Spark offline store built on top of spark-submit.
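For completeness, such a store would subclass Feast's OfflineStore base class and implement its retrieval methods by launching spark-submit and reading the result back. A heavily simplified, hypothetical sketch of the core helper (the script name and its --output flag are made up for illustration; the Feast integration glue is omitted because the OfflineStore method signatures vary across releases):

```python
import subprocess

import pandas as pd


def run_join_via_spark_submit(job_script: str, output_dir: str) -> pd.DataFrame:
    # Hypothetical: launch a PySpark script that performs the point-in-time
    # join and writes the result as Parquet, then read the result back.
    subprocess.run(
        ["spark-submit", "--master", "yarn", job_script, "--output", output_dir],
        check=True,
    )
    return pd.read_parquet(output_dir)
```

A real implementation would also need to ship the entity DataFrame and feature view metadata to the job script, and wrap the returned DataFrame in Feast's RetrievalJob interface.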