
[BUG]: UDF returning json as string changing the decimal format inside json

Open: guruvonline opened this issue 3 years ago • 1 comment

Describe the bug: A UDF returns JSON as a string, but when the result is read in the main function, the decimal value inside the JSON is changed (the decimal part is truncated).

sample json returned by UDF

{
  "Organization":"Org",
  "Employee": 
         {
            "Id":  2345678945.1234512,    // this is a decimal in the JSON, but when it is read back from the UDF result the decimal part does not come through correctly
             "name" : "somename"
         }
}
// UDF with signature
string myFunc(string input)
{
   return json.serialize(myObject);
}
// myObject is a complex object and looks like
class myObject
{
   string Organization;
   Employee Emp;
}

class Employee
{
  decimal Id;   // if I change this to string, I get the full value
  string name;
}

In my main function I call the UDF as:

df = df.Select(myFunc(Col("col1")).As("jsonResult"));

// json schema
schema = {StructField(Organization, StringType),
                  StructField(Employee, StringType)}    // reading the nested JSON as StringType

// expand the JSON into columns
df = df.Select(FromJson(Col("jsonResult"), schema).As("Result"));

df = df.Select("Result.*");

df.Select("Employee")  // reading the JSON attribute as a string, but the decimal is truncated

If I change the decimal to a string value in the JSON, I get the full decimal value. This means the UDF is returning the correct value, but the decimal is truncated when the result is read in the main function (even though I am reading it as a string).
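
One way to apply the string workaround described above without changing the type of `Id` is to have the serializer emit the decimal as a quoted string. The sketch below assumes the UDF uses `System.Text.Json` (the post only shows `json.serialize`, so that is a guess), and `DecimalAsStringConverter` and `DecimalJsonDemo` are hypothetical names, not part of .NET for Apache Spark. `SurvivesDouble` illustrates one possible reason for the lost digits, namely that the value is re-parsed as a double somewhere on the read path; nothing in this thread confirms that this is what happens.

```csharp
using System;
using System.Globalization;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical converter: writes decimal values as JSON strings so that no
// JSON reader along the way can re-parse them as binary floating point.
public sealed class DecimalAsStringConverter : JsonConverter<decimal>
{
    public override decimal Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        // Accept both the quoted form written below and plain JSON numbers.
        return reader.TokenType == JsonTokenType.String
            ? decimal.Parse(reader.GetString()!, CultureInfo.InvariantCulture)
            : reader.GetDecimal();
    }

    public override void Write(Utf8JsonWriter writer, decimal value, JsonSerializerOptions options)
    {
        // Invariant culture keeps '.' as the decimal separator on every machine.
        writer.WriteStringValue(value.ToString(CultureInfo.InvariantCulture));
    }
}

// Mirrors the Employee class from the post, using properties because
// System.Text.Json does not serialize fields by default.
public sealed class Employee
{
    [JsonConverter(typeof(DecimalAsStringConverter))]
    public decimal Id { get; set; }

    public string name { get; set; } = "";
}

public static class DecimalJsonDemo
{
    // Serializes an Employee with Id written as a quoted string.
    public static string Serialize(Employee e) => JsonSerializer.Serialize(e);

    // Returns true only if the value is unchanged after passing through a double.
    public static bool SurvivesDouble(decimal d) => (decimal)(double)d == d;
}
```

With this converter, an `Employee` whose `Id` is `2345678945.1234512m` is serialized with `Id` as a quoted string, which matches the case reported above where the full value comes through. `SurvivesDouble(2345678945.1234512m)` returns false, since a double cannot hold all 17 significant digits of that value.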

Thanks

guruvonline, Oct 22 '21 17:10

It might be worth re-testing this with this PR: https://github.com/dotnet/spark/pull/982 as it includes decimal support

GoEddie, Nov 09 '21 22:11