presto-yarn
Unable to add a custom dependency to plugin (connector)
Go here to see more details: https://groups.google.com/forum/#!msg/presto-users/7DytyTsRG0Y/-mcXA3PkBgAJ
Hello, I've recently stood up presto-yarn. I've run into a problem while querying a table that uses a custom serde (in this case com.proofpoint.hive.serde.JsonSerde).
I've followed the documentation to add the jar (hive-serde-1.0.jar) to the Hive connector (hive-cdh5) and have configured appConfig.json accordingly (see below). It all checks out: after submitting the application to YARN via Slider, I can see that the jar lives on the worker nodes. However, I get a "deserializer does not exist" error when I query the table.
Any help you can offer is appreciated.
Thanks,
Rob
Error when trying to query table:
presto> use hive.default;
presto:default> select count(*) from log where ds='2016-11-03' and ts = '08-00';
Query 20170123_220008_00006_ae3ej, FAILED, 4 nodes
Splits: 31 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20170123_220008_00006_ae3ej failed: deserializer does not exist: com.proofpoint.hive.serde.JsonSerde
See jar on worker node:
[(test) [email protected] plugin]# ls -al /home/presto/data/plugin/hive-cdh5/ | grep -i serde
-r-x------ 1 presto yarn 2894794 Jan 23 14:44 hive-serde-1.0.jar
The class reported as missing is present in the jar:
[(test) [email protected] plugin]# pwd
/home/presto/data/plugin
[(test) [email protected] plugin]# grep -ir com.proofpoint.hive.serde.JsonSerde ./
Binary file ./hive-cdh5/hive-serde-1.0.jar matches
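A grep match only shows that the string occurs somewhere in the jar's bytes. A slightly stricter check is to confirm that the jar holds an actual `.class` entry for the serde. The following is a minimal sketch of that check; the helper name `jar_has_class` is hypothetical and not part of presto-yarn, and it says nothing about whether Presto's plugin classloader actually picks the jar up.

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the jar contains a .class entry for class_name.

    A jar is a zip archive, so a fully qualified class name such as
    com.proofpoint.hive.serde.JsonSerde maps to the entry
    com/proofpoint/hive/serde/JsonSerde.class.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

On a worker node this could be called as `jar_has_class("/home/presto/data/plugin/hive-cdh5/hive-serde-1.0.jar", "com.proofpoint.hive.serde.JsonSerde")`, run as the same user that runs the Presto process so that file permissions are exercised too.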
See appConfig.json:
{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": {
  },
  "global": {
    "site.global.app_user": "yarn",
    "site.global.user_group": "hadoop",
    "site.global.data_dir": "/home/presto/data",
    "site.global.config_dir": "/home/presto/lib/presto/etc",
    "site.global.app_name": "presto-server-0.152",
    "site.global.app_pkg_plugin": "${AGENT_WORK_ROOT}/app/definition/package/plugins/",
    "site.global.singlenode": "true",
    "site.global.coordinator_host": "${COORDINATOR_HOST}",
    "site.global.presto_query_max_memory": "50GB",
    "site.global.presto_query_max_memory_per_node": "600MB",
    "site.global.presto_server_port": "8080",
    "site.global.plugin": "{'hive-cdh5': ['hive-serde-1.0.jar']}",
    "site.global.catalog": "{'tpch': ['connector.name=tpch'], 'hive': ['connector.name=hive-cdh5','hive.metastore.uri=thrift://host.hadoop.test.com:9083','hive.metastore.authentication.type=KERBEROS','hive.metastore.service.principal=hive/[email protected]','[email protected]','hive.metastore.client.keytab=/etc/keytabs/presto.keytab','hive.hdfs.authentication.type=KERBEROS','[email protected]','hive.hdfs.presto.keytab=/etc/keytabs/presto.keytab','hive.config.resources=/opt/cloudera/hadoop-conf/core-site.xml,/opt/cloudera/hadoop-conf/hdfs-site.xml,/opt/cloudera/hadoop-conf/hive-site.xml']}",
    "site.global.jvm_args": "['-server', '-Xmx1024M', '-XX:+UseG1GC', '-XX:G1HeapRegionSize=32M', '-XX:+UseGCOverheadLimit', '-XX:+ExplicitGCInvokesConcurrent', '-XX:+HeapDumpOnOutOfMemoryError', '-XX:OnOutOfMemoryError=kill -9 %p']",
    "application.def": ".slider/package/PRESTO/presto-yarn-package-1.4-SNAPSHOT-0.152.zip",
    "java_home": "/usr/lib/jvm/java"
  },
  "components": {
    "slider-appmaster": {
      "jvm.heapsize": "128M"
    }
  },
  "coordinator": {
    "http.server.authentication.enabled" : "true"
  }
}
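For cross-checking the config against what lands on disk, here is a rough sketch that reads the `site.global.plugin` value and lists where each jar would be expected. It assumes the value can be read as a Python-style dict literal and that jars end up under `<data_dir>/plugin/<connector>/`, which matches the `ls` output above; how presto-yarn itself parses and copies these is not shown in this thread, and `expected_plugin_jars` is a hypothetical helper.

```python
import ast
import posixpath


def expected_plugin_jars(global_cfg):
    """List the jar paths implied by site.global.plugin.

    Assumption: jars are placed under <data_dir>/plugin/<connector>/,
    as seen on the worker node in this report.
    """
    # The value is a string holding a dict literal, e.g.
    # "{'hive-cdh5': ['hive-serde-1.0.jar']}".
    plugins = ast.literal_eval(global_cfg["site.global.plugin"])
    data_dir = global_cfg["site.global.data_dir"]
    return [
        posixpath.join(data_dir, "plugin", connector, jar)
        for connector, jars in plugins.items()
        for jar in jars
    ]
```

Passing in the `global` section of the appConfig.json above would give a list that can be compared with the files actually present on each worker.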
Do you have any ETA on a fix? I'd like to push presto-yarn to production, but need this to work first. I'd be happy to help test.
Thanks,
Rob
There is a test case for a similar scenario here - https://github.com/prestodb/presto-yarn/blob/master/presto-yarn-test/src/main/java/com/teradata/presto/yarn/test/PrestoClusterTest.java#L149 . @kokosing how do you think this test case differs from the hive-serde plugin?
I do not see anything that could make a difference. It is hard to tell anything from the information above. The best approach would be to try to reproduce it, though I don't have much time to work on it. I will raise it with the team to see whether it can be taken into a sprint.
Thanks @kokosing. Were you able to hand this off to another team?
Not yet, it is not my responsibility to assign tasks to teams.
CC: @ilfrin @mattsfuller what should we do with this issue?
We're not going to be doing much additional work in the near future. We're focused on other aspects of Presto at the moment. We welcome contributions though.
Does that mean you're not doing much more work on presto-yarn in general? If this is a dying project, I'll just focus on the TD fork for my enterprise.
Hi @rja1, it's not a dying project, and we do welcome outside contributions. It's just not our focus right now.
Given our priorities, we are focusing more on core Presto right now, so we currently aren't advancing this in terms of features. It's more in maintenance mode: smallish stuff, or fixes if something is terribly broken. Eventually we will start investing more effort here, just not right now.
Got it, thanks @mattsfuller .