爪哇蜂 comments

Results: 5 comments by 爪哇蜂

CSV file fields contain newline characters: how can they be read, written, and stored into a hive table correctly?

The first problem is fairly easy to solve.

Method 1: use spark

```scala
ides> spark.read.option("multiline",true).option("header", true).csv("file:///Users/sgr/test").show
```

![image](https://user-images.githubusercontent.com/20768390/100715683-80be7600-33f2-11eb-9311-487718a7994b.png)

Method 2: use ides

```scala
ides> load csv.`file:///Users/sgr/test` where multiline='true' and header='true' as tb;
    | tb.show
```

![image](https://user-images.githubusercontent.com/20768390/100715832-bfecc700-33f2-11eb-9247-9571e4fbec7d.png)

> Both methods solve the problem of reading fields that contain newlines; the key is specifying the parameter `multiline='true'`.
> The ides syntax can save the csv data as a table with `as tb` for later use.

...
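To illustrate why a multiline-aware reader is needed, here is a minimal sketch using Python's standard `csv` module instead of spark or ides. The sample data is made up for this example. It compares a naive split on line breaks with a quote-aware parse of the same text.

```python
import csv
import io

# Hypothetical sample: a header plus two records whose "text" field
# contains embedded newlines and is therefore quoted.
raw = 'id,text\n1,"line one\nline two"\n2,"a\n\nb"\n'

# Naive splitting treats every physical line as a record.
physical_lines = raw.splitlines()

# A quote-aware parser keeps each quoted field, newlines included, in one row.
rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(physical_lines))  # number of physical lines
print(len(rows))            # number of logical rows (header included)
print(rows[1])
```

The naive split breaks each record into several pieces, while the quote-aware parse returns one row per record. This is roughly the distinction that `multiline='true'` switches on in the commands above.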

CSV file fields contain newline characters: how can they be read, written, and stored into a hive table correctly?

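## 💥 This problem is solved nicely in ides:

### First, we mock up a table whose field contains newline characters: `multiline_csv_data`

The two `text` values below read "the text contains one newline" and "the text contains multiple newlines".

```sql
select "1" as id, "文本存在一个换行符'\n'" as text
union all
select "2" as id, "文本存在多个换行符'\n\n'" as text
as multiline_csv_data;
```

![image](https://user-images.githubusercontent.com/20768390/102007086-f1558300-3d60-11eb-8b00-63f74c818b97.png)

### Save it to the hive table `test.multiline_csv_data`

```sql
...
```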

Scripts should support multi-language (python, sql) development

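The requirement above means supporting development in multiple languages within a single scripting language. We can [follow the approach of `Zeppelin`](http://zeppelin.apache.org/docs/0.8.0/usage/interpreter/overview.html) and use a different `Interpreter` for each language, roughly like this:

```python
%python
a=1
print(1)
%
```

For sql over the jdbc protocol (`mysql`/`kylin`), like this:

```sql
%sql(test)
select 1 from test;
%
> output
```

**Worth noting**

We differ slightly from `Zeppelin`:

1. The code must end with a `%` marker.
2. If an executor needs special input, it can be referenced with `()`. For example, when python needs to process table a: `%python(a)`; when sql needs to use the test connection: `%sql(test)`. This is similar to the [`Kylin` interpreter](http://zeppelin.apache.org/docs/0.8.0/interpreter/kylin.html) of `Zeppelin`.
3. If the code should produce output after it runs, specify it with `> output`.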
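The block syntax described above can be sketched as a small splitter. This is only an illustration under assumptions, not the ides implementation: the names `Block`, `BLOCK_RE`, and `split_blocks` are hypothetical, and the sketch assumes the closing `%` sits alone on its own line and that each block has a non-empty body.

```python
import re
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    lang: str               # interpreter name, e.g. "python" or "sql"
    arg: Optional[str]      # value inside "(...)", if any
    body: str               # code between the header and the closing "%"
    output: Optional[str]   # name given after ">", if any


# Header "%lang" or "%lang(arg)", then the body up to a line holding only "%",
# optionally followed by a "> name" line.
BLOCK_RE = re.compile(
    r"^%(?P<lang>\w+)(?:\((?P<arg>[^)]*)\))?[ \t]*\n"
    r"(?P<body>.*?)\n"
    r"^%[ \t]*$"
    r"(?:\n>[ \t]*(?P<output>\w+))?",
    re.MULTILINE | re.DOTALL,
)


def split_blocks(script: str) -> List[Block]:
    """Split a script into per-language blocks (hypothetical helper)."""
    return [
        Block(m["lang"], m["arg"], m["body"], m["output"])
        for m in BLOCK_RE.finditer(script)
    ]


script = """%python
a=1
print(1)
%
%sql(test)
select 1 from test;
%
> output
"""

blocks = split_blocks(script)
for block in blocks:
    print(block)
```

A regular expression is enough for a sketch like this, but it cannot tell a closing `%` from a `%` line inside a multi-line string in the embedded code, which is one reason a real lexer is a better fit.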

Scripts should support multi-language (python, sql) development

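We found a solution in ANTLR lexical modes. ANTLR's lexical modes allow a single file to contain multiple languages: special "sentinel" character sequences switch the lexer between modes ([reference documentation](https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md)). The official repository provides an example that implements `lexical modes` for the `xml` language ([reference code](https://github.com/antlr/grammars-v4/blob/master/xml/XMLLexer.g4)). Our implementation looks roughly like this. Define the `IdesLexer.g4` file:

```java
lexer grammar IdesLexer;
...
PY_MODE : '%python' -> pushMode(PYTHON_LAN);
SQL_MODE : '%sql' -> pushMode(SQL_LAN);
...
mode PYTHON_LAN;
EXIT_PY : '%' -> popMode;
...
```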
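The push/pop behaviour of lexical modes can be imitated with an explicit mode stack. The sketch below is a simplified, line-based stand-in for what ANTLR does character by character, not the actual ides lexer. `PY_MODE`, `SQL_MODE`, `EXIT_PY`, `PYTHON_LAN`, and `SQL_LAN` come from the grammar fragment above; `EXIT_SQL`, the `*_TEXT` token names, and the `tokenize` function are assumptions made for illustration.

```python
from typing import List, Tuple

# Sentinel -> (token name, mode to push). Names follow the grammar fragment.
ENTER = {
    "%python": ("PY_MODE", "PYTHON_LAN"),
    "%sql": ("SQL_MODE", "SQL_LAN"),
}
# Mode -> token emitted when "%" pops it. EXIT_SQL is an assumed name.
EXIT = {"PYTHON_LAN": "EXIT_PY", "SQL_LAN": "EXIT_SQL"}


def tokenize(script: str) -> List[Tuple[str, str]]:
    """Line-based imitation of pushMode/popMode (hypothetical helper)."""
    tokens: List[Tuple[str, str]] = []
    modes = ["DEFAULT"]  # the mode stack; the last entry is the current mode
    for line in script.splitlines():
        stripped = line.strip()
        mode = modes[-1]
        if mode == "DEFAULT" and stripped in ENTER:
            name, target = ENTER[stripped]
            tokens.append((name, stripped))
            modes.append(target)  # corresponds to pushMode(target)
        elif mode != "DEFAULT" and stripped == "%":
            tokens.append((EXIT[mode], stripped))
            modes.pop()  # corresponds to popMode
        elif stripped:
            # Any other non-blank line is plain text of the current mode.
            tokens.append((mode + "_TEXT", line))
    return tokens


tokens = tokenize("%python\nprint(1)\n%\n%sql\nselect 1;\n%\n")
for token in tokens:
    print(token)
```

Because the sentinel is only interpreted relative to the current mode, the same `%` character can close a python block in one place and a sql block in another, which is the property the comment relies on.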

CSV file fields contain newline characters: how can they be read, written, and stored into a hive table correctly?

What are you using to process it? Do you have a sample file?