steampipe-postgres-fdw
steampipe-postgres-fdw copied to clipboard
JSON data with \u0000 errors in Postgres with "unsupported Unicode escape sequence"
Describe the bug
Running this query
select jsonb_pretty(attributes) from ldap_user where object_sid = 'S-1-5-21-4049728555-799940961-2503284476-7632';
Seems to return some JSON data with \u0000, leading to the error:
select jsonb_pretty(attributes) from ldap_user where object_sid = 'S-1-5-21-4049728555-799940961-2503284476-7632';
Error: unsupported Unicode escape sequence
Error: syntax error at or near "Error"
Steampipe version (steampipe -v)
Example: v0.9.1
logs
2021-11-17 16:14:56.341 GMT [2646] ERROR: unsupported Unicode escape sequence
2021-11-17 16:14:56.341 GMT [2646] DETAIL: \u0000 cannot be converted to text.
2021-11-17 16:14:56.341 GMT [2646] CONTEXT: JSON data, line 1: ...on","organizationalPerson","user"],"objectGUID":[...
2021-11-17 16:14:56.341 GMT [2646] STATEMENT: select cn, object_sid from ldap_user where object_sid = 'S-1-5-21-4049728555-799940961-2503284476-7632'
Can refer here Data which causing this error
Expected behavior
I'd expect the JSON column handler to either strip the null field \u0000 out to fix the parsing or to replace the JSON with error content.
Additional context
N/A
branch to reproduce issue https://github.com/turbot/steampipe-plugin-chaos/compare/json_issue?diff=unified
> select * from chaos_json_breaking
Error: unsupported Unicode escape sequence
@binaek Any updates on this? I believe we're hitting up against it still in the LDAP plugin, e.g., https://github.com/turbot/steampipe-plugin-ldap/issues/3
Hmm we'll need to dig into this - there was a fix which covered at least some cases but not all it seems
@kaidaguerre Any update on this? Thanks!
Accidentally closed, re-opening
There is code in the FDW which attempts to strip out non-escaped unicode nulls, but to leave in place escaped unicode nulls
The original code was:
func jsonValueString(val interface{}, valueString string) (string, error) {
bytes, err := json.Marshal(val)
if err != nil {
return "", err
}
// remove invalid unicode characters
valueString = strings.Replace(string(bytes), `\u0000`, "", -1)
return valueString, nil
}
However in response to this issue this code was changed to try to leave nulls which were escaped, as follows:
func jsonValueString(val interface{}) (string, error) {
jsonBytes, err := json.Marshal(val)
if err != nil {
return "", err
}
valueString := string(jsonBytes)
// remove unicode null char "\u0000", UNLESS escaped, i.e."\\u0000"
if strings.Contains(valueString, `\u0000`) {
re := regexp.MustCompile(`(?:(\\\\u0000)|([^\\]?)\\u0000)`)
valueString = re.ReplaceAllString(valueString, "$1$2")
}
return valueString, nil
}
This error is caused by this JSON field:
"objectGUID":[
"��g��\\\u0000K���\u0001m:�"
],
Current version of code (trying to handle escaped null) modifies this to:
"objectGUID":[
"��g��\\\u0000K���\u0001m:�"
],
which fails.
The old code (replacing all unicode nulls) keeps this as:
"objectGUID":[
"��g��\\K���\u0001m:�"
],
which works.
So it looks like the regex needs updating to handle this case
@kaidaguerre - A user, christianherweg0807, is reporting he's still seeing issues - https://github.com/turbot/steampipe-plugin-ldap/issues/3#issuecomment-1161821468. The LDAP plugin's latest version is built off of Steampipe SDK v3.1.0.
Is there something else we should look into from the plugin side, or is more information needed to debug this issue? Thanks!
Hi, sorry for the delay...
$ steampipe -v
steampipe version 0.14.6
Here an example with latest ldap plugin running against an Microsoft AD LDAP:
> select mail from ldap_user where department like 'MY-DE-PA-RT-MENT';
**Error: unsupported Unicode escape sequence (SQLSTATE 22P05)**
> select mail from ldap_user where department = 'MY-DE-PA-RT-MENT';
+-----------------------------+
| mail |
+-----------------------------+
| [email protected] |
Still seeing this. LDAP server is AD.
> select dn from ldap_user;
Error: unsupported Unicode escape sequence (SQLSTATE 22P05)
(about a dozen results are returned, omitted them)
CentOS 7.9
$ steampipe plugin list
+--------------------------------------------------+---------+-------------+
| Installed Plugin | Version | Connections |
+--------------------------------------------------+---------+-------------+
| hub.steampipe.io/plugins/turbot/ldap@latest | 0.2.0 | ldap |
| hub.steampipe.io/plugins/turbot/steampipe@latest | 0.6.0 | steampipe |
+--------------------------------------------------+---------+-------------+
$ steampipe -v
steampipe version 0.17.4
There is a link to an old slack thread in the #3 but it appears to be outside the retention window for messages.
I faced the same issue when using the LDAP plugin, and came up with the following changes to the FDW component:
diff --git a/helpers.go b/helpers.go
index 4287e5f..cfb8d32 100644
--- a/helpers.go
+++ b/helpers.go
@@ -13,7 +13,6 @@ import (
"fmt"
"log"
"os"
- "regexp"
"strings"
"time"
"unsafe"
@@ -194,10 +193,9 @@ func jsonValueString(val interface{}) (string, error) {
valueString := string(jsonBytes)
// remove unicode null char "\u0000", UNLESS escaped, i.e."\\u0000"
- if strings.Contains(valueString, `\u0000`) {
- log.Printf("[TRACE] null unicode character detected in JSON value - removing if not escaped")
- re := regexp.MustCompile(`((?:^|[^\\])(?:\\\\)*)(?:\\u0000)+`)
- valueString = re.ReplaceAllString(valueString, "$1")
+ if strings.Contains(valueString, "\u0000") {
+ log.Printf("[TRACE] non escaped null unicode character detected in JSON value - removing")
+ valueString = strings.ReplaceAll(valueString, "\u0000", "$1")
}
return valueString, nil
Edit: we probably want to keep the regex replace, I only gave attention to the comment that said removing, and did not realize it was intended to escape the nulls. An alternative may be to base64 encode the whole string if it contains nulls as does ldapsearch by default.
There's no need to check whether the null unicode character is escaped or not as the following:
package main
import (
"fmt"
"strings"
)
func main() {
fmt.Println(strings.Contains("Hello \u0000", "\u0000"))
fmt.Println(strings.Contains("Hello \u0000", `\u0000`))
fmt.Println(strings.Contains("Hello \\u0000", "\u0000"))
fmt.Println(strings.Contains("Hello \\u0000", `\u0000`))
}
returns:
true
false
false
true
https://go.dev/play/p/8RjKYbSUNOC
And I also applied these changes to the LDAP plugin to drop values for attributes like jpegPhoto, thumbnailPhoto, etc. that have low interest from steampipe point of view IMO and frequently contain null unicode character:
diff --git a/ldap/utils.go b/ldap/utils.go
index 2689f4a..18c3d3b 100644
--- a/ldap/utils.go
+++ b/ldap/utils.go
@@ -5,6 +5,7 @@ import (
"crypto/tls"
"errors"
"fmt"
+ "log"
"strings"
"time"
@@ -274,7 +275,19 @@ func convertToTimestamp(ctx context.Context, str string) *time.Time {
func transformAttributes(ctx context.Context, attributes []*ldap.EntryAttribute) map[string][]string {
var data = make(map[string][]string)
for _, attribute := range attributes {
- data[attribute.Name] = attribute.Values
+ found_null := false
+ for _, value := range attribute.Values {
+ if strings.Contains(value, "\u0000") {
+ found_null = true
+ break
+ }
+ }
+ if found_null {
+ log.Printf("[TRACE] dropping %q attribute value as it contains unicode nulls", attribute.Name)
+ data[attribute.Name] = []string{"dropped"}
+ } else {
+ data[attribute.Name] = attribute.Values
+ }
}
return data
}
@kaidaguerre would you accept PRs proposing such changes?
any updates on this, I met the same issue?
Sorry for the radio silence on this one
@pdecat @kaidaguerre would you accept PRs proposing such changes?
Yes, certainly open to pull requests!
hello, any news?
I still have the error (I have just udated steampipe and ldap plugin to teh latest releases)
select a.mail
from ldap_user a
where dn in (
select
jsonb_array_elements_text(attributes -> 'manager')
from adgroup.ldap_user where sam_account_name='matteo');
Error: unsupported Unicode escape sequence (SQLSTATE 22P05)
Bumping for updates. Still a problem with the very latest (as of this morning) Steampipe and ldap plugin.
> select u.cn, u.dn, u.display_name from ldap_user u order by u.display_name;
Error: unsupported Unicode escape sequence (SQLSTATE 22P05)
+----+----+--------------+
| cn | dn | display_name |
+----+----+--------------+
+----+----+--------------+