seatunnel
[Improve][HTTP Connector] Add specified field function for all HTTP connector
Search before asking
- [X] I had searched in the feature and found no similar feature requirement.
Description
So far, some HTTP requests return data that cannot be parsed, such as nested array<Object> data.
We need a way to specify a field, such as users in the figure above, so that we can configure the schema for users.
Usage Scenario
No response
Related issues
No response
Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Please describe your requirements in detail.
How is the field specified? What is the parameter name? Can you describe the scope of influence of this design? What modifications do we need to make?
I have some questions:
- As you have shown, a new parameter tells the connector to take part of the upstream data. But this only works for the JSON format; how would the feature handle the text format?
- If the user wants more than one part of the upstream data, how does this feature work?
- If the part the user wants is only a string, an integer, a double, etc. rather than a list, or if every item in the list also contains an array of objects, how does the schema apply to the data?
I understand the problem you pointed out, but there is no good plan at present. I suggest that we discuss it together at the next weekly meeting.
- I think most SaaS APIs return data in JSON format. The text format is rare. If it appears, we will deal with it separately.
- In fact, each content field to be read is equivalent to a table. In this case, user can be seen as a table to read, and we can configure the schema for it. SeaTunnel currently supports reading only one table per connector, so we only support configuring one content field at a time. If reading multiple tables in one connector is supported in the future, we can also support defining multiple content field entries in one connector.
- Users can use schema to define the schema of content field. If the user did not configure a schema, we can assume they only want to read basic data such as string/integer/long, with the column name the same as content field.
I suggest we only support basic array types in this case. This is an example.
{
  "xxx": "xxx",
  "users": [
    {
      "id": 1,
      "name": "n1",
      "int_list": [1, 2, 3],
      "json_list": [{"n1": "v1", "n2": "v2"}]
    }
  ]
}
schema: {
  id: int,
  name: string,
  int_list: array<int>,
  json_list: array<string>
}
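To make the proposal concrete, here is a minimal Python sketch of how a connector might apply it to the example above. The helper name to_rows and its arguments are hypothetical, not SeaTunnel APIs. It assumes the content field is a top-level key holding a list of objects, and that nested objects inside an array are kept as JSON text, which is one possible reading of array<string> for json_list.

```python
import json

# Response body copied from the example above.
RESPONSE = """
{
  "xxx": "xxx",
  "users": [
    {
      "id": 1,
      "name": "n1",
      "int_list": [1, 2, 3],
      "json_list": [{"n1": "v1", "n2": "v2"}]
    }
  ]
}
"""


def to_rows(body, content_field, columns):
    """Pick one top-level field and project each element onto the columns.

    Hypothetical helper: objects or arrays nested inside an array are
    serialized back to JSON text so the column can stay array<string>.
    """
    records = json.loads(body)[content_field]
    rows = []
    for record in records:
        row = {}
        for name in columns:
            value = record[name]
            if isinstance(value, list):
                value = [
                    json.dumps(item) if isinstance(item, (dict, list)) else item
                    for item in value
                ]
            row[name] = value
        rows.append(row)
    return rows


rows = to_rows(RESPONSE, "users", ["id", "name", "int_list", "json_list"])
print(rows)
```

Everything outside users (the xxx key here) is simply ignored, so the schema only has to describe one record of the selected field.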
Hi, I think we should focus on the data we need. json-path can extract the data we need and filter out the rest, which reduces the work of configuring the schema. Please see PR #3510, @TaoZex.
Thanks.
json-path is a good way to read irregular JSON nodes from a JSON document. However, json-path requires users to understand regular expressions and write a lot of configuration. If there are many JSON nodes to read, this method is not friendly. For those who only need to read the data of one JSON node and its child nodes, the content-field method is friendlier and simpler.
1. Regarding the difficulty of use, we can use publicly available tools such as https://jsonpath.com/ to help with parsing.
2. Regarding reading only some JSON nodes, I think this solution can do that too. We can configure the expression for the node.
Regarding the second point, it can be completed in the next step.
Thanks, jsonpath is a good tool and we can use it. Another question: how does the connector know whether $.phoneNumbers should be read as a string or as a table with the columns type and number?
This needs to be parsed through the schema. json-path does not need to care about the returned type; it is only responsible for simplifying the returned data. In addition, we can get the fields in phoneNumbers. For example:
source {
  Http {
    url = "http://mockserver:1080/jsonpath/mock"
    method = "GET"
    format = "json"
    # Paths are quoted because $, [, ] and * are not allowed in unquoted HOCON strings.
    json_field = {
      type = "$.phoneNumbers[*].type"
      number = "$.phoneNumbers[*].number"
    }
    schema = {
      fields {
        type = string
        number = string
      }
    }
  }
}
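As a rough illustration of how such a json_field mapping could turn into rows, the Python sketch below resolves each path to a list of values and then zips the lists together by position. The resolve function is a hypothetical, deliberately tiny JSONPath subset (only .name, [*] and .*), not the library used in the PR, and the response body is made-up sample data because the mock server response is not shown in this thread.

```python
import json
import re

# Made-up sample response; the real mock payload is not part of this thread.
BODY = """
{
  "firstName": "John",
  "phoneNumbers": [
    {"type": "mobile", "number": "555-0100"},
    {"type": "home", "number": "555-0101"}
  ]
}
"""


def resolve(path, doc):
    """Resolve a tiny JSONPath subset: '$', '.name', '[*]' and '.*'."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    nodes = [doc]
    for name, wildcard in re.findall(r"\.(\w+)|(\[\*\]|\.\*)", path[1:]):
        if wildcard:
            # Fan out over list items or object values.
            nodes = [
                child
                for node in nodes
                for child in (node if isinstance(node, list) else node.values())
            ]
        else:
            nodes = [node[name] for node in nodes]
    return nodes


json_field = {
    "type": "$.phoneNumbers[*].type",
    "number": "$.phoneNumbers[*].number",
}

doc = json.loads(BODY)
# One list of values per configured column.
columns = {name: resolve(path, doc) for name, path in json_field.items()}
# Pair the values up by position to form rows.
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]
print(rows)
```

Note that pairing by position assumes every path yields the same number of values. If one element lacked a number, the columns would no longer line up, which is one reason the schema and the paths have to be configured together.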
Configure the type by schema.
$.phoneNumbers
How can we configure it when we want to read $.phoneNumbers as a table?
json_field= {
$.phoneNumbers
}
Hi @liugddx, can you sync the result of this discussion here? Syncing it to the mailing list as well would be better.
After discussion, there are now two solutions:
- Provide flexible json-path placeholder configuration.
json_field [Config]
This parameter helps you configure the schema, so it must be used together with schema. If your data looks something like this:
{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  },
  "expensive": 10
}
You can get the contents of 'book' by configuring the task as follows:
source {
  Http {
    url = "http://mockserver:1080/jsonpath/mock"
    method = "GET"
    format = "json"
    json_field = {
      category = "$.store.book[*].category"
      author = "$.store.book[*].author"
      title = "$.store.book[*].title"
      price = "$.store.book[*].price"
    }
    schema = {
      fields {
        category = string
        author = string
        title = string
        price = string
      }
    }
  }
}
- Provide the ability to get partial JSON.
content_json [String]
This parameter can get part of the JSON data. If you only need the data in the 'book' section, configure content_field = "$.store.book.*".
If your returned data looks something like this:
{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  },
  "expensive": 10
}
You can configure content_field = "$.store.book.*" and the result returned looks like this:
[
  {
    "category": "reference",
    "author": "Nigel Rees",
    "title": "Sayings of the Century",
    "price": 8.95
  },
  {
    "category": "fiction",
    "author": "Evelyn Waugh",
    "title": "Sword of Honour",
    "price": 12.99
  }
]
Then you can get the desired result with a simpler schema, like this:
Http {
  url = "http://mockserver:1080/contentjson/mock"
  method = "GET"
  format = "json"
  content_field = "$.store.book.*"
  schema = {
    fields {
      category = string
      author = string
      title = string
      price = string
    }
  }
}
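For comparison with the json_field approach, the following Python sketch shows what the partial-JSON option might do with the book example: select the sub-document first, then apply the schema to each record. The select function is a hypothetical helper that only understands dotted names and a trailing or intermediate *, which is enough for "$.store.book.*"; casting every column with str mirrors the all-string schema above and is an assumption about how the connector converts values.

```python
import json

# Trimmed copy of the store example above.
BODY = """
{
  "store": {
    "book": [
      {"category": "reference", "author": "Nigel Rees",
       "title": "Sayings of the Century", "price": 8.95},
      {"category": "fiction", "author": "Evelyn Waugh",
       "title": "Sword of Honour", "price": 12.99}
    ],
    "bicycle": {"color": "red", "price": 19.95}
  },
  "expensive": 10
}
"""


def select(doc, path):
    """Walk a dotted path such as '$.store.book.*' (hypothetical subset)."""
    nodes = [doc]
    for segment in path.split(".")[1:]:  # skip the leading '$'
        if segment == "*":
            # Fan out over list items or object values.
            nodes = [
                child
                for node in nodes
                for child in (node if isinstance(node, list) else node.values())
            ]
        else:
            nodes = [node[segment] for node in nodes]
    return nodes


schema = ["category", "author", "title", "price"]  # every column is a string

records = select(json.loads(BODY), "$.store.book.*")
# Apply the schema: keep only the declared columns and cast them to string.
rows = [{name: str(record[name]) for name in schema} for record in records]
print(rows)
```

Because the selection already yields one object per row, the schema only names the columns and no per-column path is needed, which is the simplification this option is aiming for.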
Closed by https://github.com/apache/incubator-seatunnel/issues/3500