gojay
Is there a way to get the raw bytes with the decoder + unmarshal interface?
I have a complex data structure, for which I have all the interfaces defined to satisfy the UnmarshalerJSONObject interface. There is one object within this complex structure which is currently handled by the standard lib UnmarshalJSON interface, where I mutate/normalize the bytes before unmarshaling. I cannot use UnmarshalerJSONObject for this object because UnmarshalJSONObject(dec *gojay.Decoder, k string) does not expose the bytes.
I need a way to see whether the value for k is a string or a []string; if it's a string, I need to change the byte structure to be a []string.
Example of the UnmarshalJSON I use, which does not work well with UnmarshalerJSONObject:
func (a *Foo) UnmarshalJSON(b []byte) error {
	var rawData map[string]interface{}
	err := json.Unmarshal(b, &rawData)
	if err != nil {
		return nil
	}
	writeJSON := func(buf *bytes.Buffer, key string, value []interface{}) {
		buf.WriteString(`"` + key + `":[`)
		for i, v := range value {
			buf.WriteString(`"` + v.(string) + `"`)
			if i+1 < len(value) {
				buf.WriteByte(',')
			}
		}
		buf.WriteString(`]`)
	}
	// allocate the buffer upfront
	buf := bytes.NewBuffer(make([]byte, 0, len(b)))
	buf.WriteByte('{')
	i, keysN := 1, len(rawData)
	for key, value := range rawData {
		switch rawData[key].(type) {
		case []interface{}:
			writeJSON(buf, key, value.([]interface{}))
		case string:
			// handle the case where the SDK sends comma-separated values
			parts := strings.Split(value.(string), ",")
			if len(parts) == 1 && len(parts[0]) == 0 {
				parts = []string{}
			}
			// create an interface slice for the method, for the most part this will always be a slice of 1
			slice := make([]interface{}, len(parts))
			for i := 0; i < len(parts); i++ {
				slice[i] = parts[i]
			}
			writeJSON(buf, key, slice)
		}
		if i < keysN {
			buf.WriteByte(',')
			i++
		}
	}
	buf.WriteByte('}')
	// avoid infinite recursion, create a type alias
	type temp Foo
	var tempFoo temp
	err = json.Unmarshal(buf.Bytes(), &tempFoo)
	if err != nil {
		return nil
	}
	// mutate a
	*a = Foo(tempFoo)
	return nil
}
^ oh as an FYI on this example, nil error returns are on purpose. If this object fails to unmarshal, it shouldn't break all the other objects in the complex structure this belongs to.
Within the UnmarshalerJSONObject I use dec.Array, as the data structure Foo contains fields that are all of type []string.
However, the value of the data can be a single string, a comma-separated string, or an array. My custom unmarshaler handles all those permutations and ensures everything is of type []string, to avoid a structure where the value is of type interface{}.
Within the context of (s *Foo) UnmarshalJSONObject(dec *gojay.Decoder, k string), each value is defined to spec:
switch k {
case "bar":
	var aSlice = Strings{}
	err := dec.Array(&aSlice)
	if err == nil && len(aSlice) > 0 {
		s.Bar = []string(aSlice)
	}
	return err
....
}
However, dec.Array(&aSlice) doesn't allow for the possibility that the data is of type string.
I've tried calling dec.String() first and then falling back to dec.Array() if err != nil, but calling String() moves the reader forward and skips the "bad" data, so the subsequent dec.Array() call fails. Calling dec.Array() on a string value also fails with a non-catchable error, invalidUnmarshalErrorMsg, which is not bubbled up to err := dec.Array(&aSlice), so one can't simply call dec.String() afterwards. And because I haven't found a way to work with the bytes or call UnmarshalJSON within UnmarshalJSONObject, I can't get the performance boost from calling
decoder := gojay.BorrowDecoder(reader)
defer decoder.Release()
err = decoder.DecodeObject(&v)
because the data will be invalid for object Foo as a result of not being able to handle string types.
That also means I don't gain a real performance boost when calling
decoder := gojay.BorrowDecoder(reader)
defer decoder.Release()
err = decoder.Decode(&v)
which uses the std lib, as the code always hits the case
case *interface{}:
err = dec.decodeInterface(vt)
which uses the std lib underneath
if err = json.Unmarshal(object, i); err != nil {
	return err
}
Any thoughts?
I was just going to post the same issue when I saw yours. I am trying to migrate my code from JSON Iterator to GoJay, except in my case all of the objects are sent in the way you describe. My objects often have a "type" field and a "content" field. I unmarshal only the "type" field, decide which further processing is required, and then send the bytes of the "content" field to the code that handles this type of message. I don't understand how to do this in GoJay without sending the entire original JSON data to each function.
Hey, sorry for the latency!
So in the end, what you want to do is to unmarshal a value that could be either a JSON array of strings or a comma-separated string into a Go []string.
With the current state of gojay, I only see one solution: first unmarshal that value to a gojay.EmbeddedJSON, then check whether the first char is [ or " and do the unmarshaling accordingly.
Example:
func (f *Foo) UnmarshalJSONObject(dec *gojay.Decoder, k string) error {
	switch k {
	case "yourkey":
		eb := make(gojay.EmbeddedJSON, 0, 128)
		if err := dec.EmbeddedJSON(&eb); err != nil {
			return err
		}
		// nothing captured, nothing to decode
		if len(eb) == 0 {
			return nil
		}
		switch eb[0] {
		case '"':
			// decode string, then split it
			var s string
			if err := gojay.Unmarshal(eb, &s); err != nil {
				return err
			}
			f.V = strings.Split(s, ",")
		case '[':
			// decode array
			var aSlice = Strings{}
			err := gojay.Unmarshal(eb, &aSlice)
			if err == nil && len(aSlice) > 0 {
				f.V = []string(aSlice)
			}
			return err
		}
	}
	return nil
}
We could also add some methods to the decoder to tell what's the next data in the buffer. Something like:
switch dec.NextToken() {
case gojay.TokenArray:
case gojay.TokenString:
}
Let me know what you think
What I actually want is not to unmarshal a field but leave it as bytes (or other internal type). Then send only this field to another decoder separately.
Here is my current code (it has another issue, namely that I unmarshal twice; I am going to fix that):
header := DataLayer.MessageHeader{}
var data map[string]jsoniter.RawMessage
err := Json.JsonPaser.Unmarshal(message, &data)
if err != nil {
	Log.Error(err)
	return nil, nil
}
err = Json.JsonPaser.Unmarshal(message, &header)
if err != nil {
	Log.Error(err)
	return nil, nil
}
return &header, data["content"]
The data is something like:
{
	"messageId": 1,
	"sender": "sender name",
	"type": "order",
	"content": {
		//order details
	}
}
I can unmarshal just the first 3 fields and according to the type I can send the content bytes to the order handling function that will unmarshal it separately.
This is how I do it... Not saying this is the best way but I cannot easily switch to GoJay because I cannot do this with GoJay.
For my situation, this may work very well! Going to give it a try today. I just needed a way to normalize the data types to [], which you saw I was doing with raw bytes before.
(Sent as an email reply to Francois Parquet's comment of Wed, Jul 17, 2019, 12:47 AM.)
@francoispqt given that the data can be very dynamic in terms of size, the first solution can be very hard to use
eb := make(gojay.EmbeddedJSON, 0, 128)
// 128 best guess?
If the string or array has an unknown size, it is very easy to under- or over-allocate. I really like the idea of adding
switch dec.NextToken() {
case gojay.TokenArray:
case gojay.TokenString:
}
It keeps things very simple without adding too much complexity; example below. What do you think?
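// NextToken, TokenArray and TokenString are the proposed API, not existing gojay calls
switch dec.NextToken() {
case gojay.TokenArray:
	var aSlice = Strings{}
	if err := dec.Array(&aSlice); err != nil {
		return err
	}
	s.Bar = []string(aSlice)
case gojay.TokenString:
	// str, not s, so the receiver is not shadowed
	var str string
	if err := dec.String(&str); err != nil {
		return err
	}
	s.Bar = strings.Split(str, ",")
}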
because then we could just use the underlying logic of
dec.Array(&aSlice)
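On the capacity concern raised above (the 128 in make(gojay.EmbeddedJSON, 0, 128)): in Go, the third argument to make is only a starting capacity. If the decoder fills the buffer with append, the slice grows as needed, so a low guess costs some reallocation rather than truncating data. The sketch below shows that growth with a stand-in type. `rawJSON` and `capture` are hypothetical, and that gojay fills EmbeddedJSON via append is an assumption here, not something confirmed in this thread.

```go
package main

import "fmt"

// rawJSON stands in for a []byte-backed type such as gojay.EmbeddedJSON.
type rawJSON []byte

// capture appends src to *dst, the way a decoder filling a caller-supplied
// buffer might. This mechanism is an assumption, not gojay's actual code.
func capture(dst *rawJSON, src []byte) {
	*dst = append(*dst, src...)
}

func main() {
	eb := make(rawJSON, 0, 128)   // 128 is a capacity hint, not a limit
	payload := make([]byte, 1000) // far larger than the hint
	capture(&eb, payload)
	fmt.Println(len(eb), cap(eb) >= 1000) // prints "1000 true"
}
```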
@francoispqt Was wondering if any more thought was given to
switch dec.NextToken() {
case gojay.TokenArray:
case gojay.TokenString:
}
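To make the proposal a bit more concrete, here is a minimal sketch of how a non-consuming "what comes next" check could work on a byte buffer: skip whitespace, then classify the value by its first byte. None of this is gojay code. `peekToken`, `TokenKind` and the constants are hypothetical names that only mirror the ones suggested in the thread, and a real implementation would need to work against the decoder's internal buffer and cursor.

```go
package main

import "fmt"

// TokenKind is a hypothetical classification of the next JSON value.
type TokenKind int

const (
	TokenInvalid TokenKind = iota
	TokenString
	TokenArray
	TokenObject
	TokenNumber
	TokenBool
	TokenNull
)

// peekToken reports the kind of the next JSON value in data, starting at
// cursor, without consuming any input.
func peekToken(data []byte, cursor int) TokenKind {
	for i := cursor; i < len(data); i++ {
		c := data[i]
		switch {
		case c == ' ' || c == '\t' || c == '\n' || c == '\r':
			continue // skip insignificant whitespace
		case c == '"':
			return TokenString
		case c == '[':
			return TokenArray
		case c == '{':
			return TokenObject
		case c == 't' || c == 'f':
			return TokenBool
		case c == 'n':
			return TokenNull
		case c == '-' || (c >= '0' && c <= '9'):
			return TokenNumber
		default:
			return TokenInvalid
		}
	}
	return TokenInvalid // ran out of input
}

func main() {
	fmt.Println(peekToken([]byte(`  ["a","b"]`), 0) == TokenArray)
	fmt.Println(peekToken([]byte(`"a,b"`), 0) == TokenString)
}
```

Because the check only reads one byte past any whitespace and leaves the cursor alone, the caller can still hand the value to dec.Array or dec.String afterwards, which is the behaviour the switch in the proposal relies on.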