cobrix
Retaining string for 9(...) picture
Sometimes pictures like 9(8) do not actually refer to integer values. For example, many mainframe files use 9(8) to represent a date, so the value should not be converted to an integer automatically. I'd like to have a feature where all 9(...) fields are not automatically converted to integers but are treated as strings.
It's an interesting idea. I'd ask a couple of clarification questions first.
Do I understand correctly that you'd like an option to retain values as strings for specific pics?
E.g. .option("retain_string_for_pic", "9(8)").
Is it possible that some PICs that look like that do need to be converted to integers and some don't?
We'd like to understand the motivation behind the requested feature a bit more, to gauge its significance and usability.
- All integer values are converted to Spark DataFrame fields without any loss of information, so these fields can always be converted [back] to strings using a Spark transformation (e.g. df.withColumn("columnB", $"columnA".cast(StringType))). Do I understand correctly that this is not a very good option because it requires a lot of manual select transformations and the case is very common?
- Alternatively, the same effect could be achieved by tweaking the copybook, replacing '9(8)' with 'X(8)' for the specified fields. Do I understand correctly that tweaking a copybook is an even worse idea than a post-conversion transformation, because it adds an error-prone manual step for each copybook?
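One detail worth checking in the round-trip argument: converting a DISPLAY field to an integer preserves the value, but a plain cast back to string drops leading zeros, so the original digits only come back after padding to the PIC width. The sketch below is plain Scala (no Spark) using a hypothetical MMDDYYYY date; LeadingZeros and its methods are illustrative names, not Cobrix code. In Spark, padding the cast result to the field width, for example with lpad from org.apache.spark.sql.functions, would play the role of restore.

```scala
object LeadingZeros {
  // Decode an unsigned DISPLAY field (e.g. PIC 9(8)) the way an integer conversion would.
  def asInt(raw: String): Int = raw.trim.toInt

  // A plain integer-to-string cast keeps the value but not the leading zeros.
  def plainCast(value: Int): String = value.toString

  // Left-padding with zeros to the PIC width restores the original digits
  // (valid for non-negative values only).
  def restore(value: Int, width: Int): String = {
    val digits = value.toString
    "0" * (width - digits.length) + digits
  }

  def main(args: Array[String]): Unit = {
    val raw = "01152023"            // hypothetical MMDDYYYY date in a PIC 9(8) field
    val n = asInt(raw)
    println(n)                      // 1152023
    println(plainCast(n))           // 1152023
    println(restore(n, raw.length)) // 01152023
  }
}
```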
Performance-wise, there is little to no gain if Cobrix outputs such fields as strings.
Would your use case be better solved by one of these options:
a) .option("retain_string_for_pic", "9(8)"),
or
b) .option("retain_string_for_fields", "date_created,date_modified,other_date_field")?
I suppose we can add an option like this:
.option("retain_string_for_display_pic", true)
It would retain string types for all numbers that have DISPLAY format. We can easily implement this.
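To make the difference between the proposed options concrete, here is a minimal sketch of the field-selection rule each one implies. These options are proposals in this thread, not confirmed Cobrix options; FieldSpec and RetainStringRules are hypothetical names, and the "numeric PIC" check is deliberately rough. Whether DISPLAY decimals such as 9(4)V99 should fall under rule (c) is the open question raised later in the thread.

```scala
// Hypothetical, simplified view of a copybook field (not Cobrix's internal model).
final case class FieldSpec(name: String, pic: String, usage: String)

object RetainStringRules {
  // (a) retain_string_for_pic: keep as string when the PIC text matches, e.g. "9(8)".
  def byPic(pics: Set[String])(f: FieldSpec): Boolean =
    pics.map(_.toUpperCase).contains(f.pic.toUpperCase)

  // (b) retain_string_for_fields: keep as string when the field is named in a CSV list.
  def byFieldNames(csv: String)(f: FieldSpec): Boolean =
    csv.split(",").map(_.trim.toLowerCase).contains(f.name.toLowerCase)

  // (c) retain_string_for_display_pic: keep every numeric DISPLAY field as string.
  // "Numeric" here is a rough check: the PIC starts with 9 or S9.
  def byDisplayUsage(f: FieldSpec): Boolean =
    f.pic.toUpperCase.matches("S?9.*") && f.usage.equalsIgnoreCase("DISPLAY")
}

object RetainStringDemo {
  def main(args: Array[String]): Unit = {
    val fields = List(
      FieldSpec("date_created", "9(8)", "DISPLAY"),
      FieldSpec("amount", "S9(7)V99", "COMP-3"),
      FieldSpec("branch_id", "9(4)", "DISPLAY"))
    println(fields.filter(RetainStringRules.byPic(Set("9(8)"))).map(_.name))                      // List(date_created)
    println(fields.filter(RetainStringRules.byFieldNames("date_created,date_modified")).map(_.name)) // List(date_created)
    println(fields.filter(RetainStringRules.byDisplayUsage).map(_.name))                          // List(date_created, branch_id)
  }
}
```

Rule (c) needs no per-field configuration, which is its appeal, but it also catches fields like branch_id that may genuinely be integers; rules (a) and (b) trade that for explicit configuration.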
@pelatimtt, what do you think?
Hi @yruslan,
are there any plans to implement .option("retain_string_for_display_pic", true)?
There are some open questions regarding this feature. For integers like PIC 9(10) it makes sense, but it is not clear what should be done for decimals like PIC 9(4)V99. Should Cobrix add the decimal point or leave the original number as is? Different use cases might have different preferences.
Currently, just replacing 9(...) with X(...) in the copybook achieves the same result, so I'm not 100% sure this would be very helpful as part of Cobrix.
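The replacement does not have to be a manual edit of the copybook file: the copybook text can be rewritten in memory at read time, leaving the original untouched. The sketch below is a hypothetical helper (CopybookRewrite is not part of Cobrix) that only rewrites unsigned integral DISPLAY pictures written as PIC 9(n) and immediately terminated by a period, so signed (S9), implied-decimal (V99), COMP, and fields with further clauses (VALUE, OCCURS, USAGE) are deliberately left alone. The rewritten text could then be supplied to Cobrix instead of the file, for example through the copybook_contents option, assuming your Cobrix version supports it.

```scala
object CopybookRewrite {
  // Matches "PIC 9(n)" only when the clause ends right after it (the next token is
  // the terminating period), so S9, V99 and COMP fields are not matched.
  private val IntegralDisplayPic = """(?i)\bPIC\s+9\((\d+)\)(?=\s*\.)""".r

  // Rewrites unsigned integral DISPLAY pictures to alphanumeric ones of the same width.
  // An unsigned DISPLAY 9(n) and an X(n) both occupy n bytes, so the record layout
  // does not change.
  def integralPicsToAlphanumeric(copybook: String): String =
    IntegralDisplayPic.replaceAllIn(copybook, m => s"PIC X(${m.group(1)})")

  def main(args: Array[String]): Unit = {
    val copybook =
      """       01  RECORD.
        |           05  DATE-CREATED  PIC 9(8).
        |           05  AMOUNT        PIC 9(4)V99.
        |           05  COUNTER       PIC 9(4) COMP.
        |""".stripMargin
    // Only DATE-CREATED becomes PIC X(8); the other two lines are unchanged.
    print(integralPicsToAlphanumeric(copybook))
  }
}
```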
Hi @yruslan,
as the copybooks (and data) come from a third party, we don't generally change the copybook. Adding this option would help us and other users.
Also, about "for decimals PIC 9(4)V99. Should Cobrix add the decimal point or leave the original number as is?": how about supporting both possibilities, controlled by an option such as ("add_assumed_decimal", "true/false")?
There is a 3rd possibility - leave decimals as natural Spark decimals, but make integrals strings. The more possibilities there are, the more complicated Cobrix becomes to use, and the more work it is to implement. And all of this can be achieved by just changing the copybook.
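The three ways of handling a field like PIC 9(4)V99 discussed in this thread, including the 3rd possibility, can be sketched on the field's raw digits as follows. This is plain Scala, not Cobrix code: DecimalOptions and its methods are hypothetical, sign handling is omitted, and add_assumed_decimal is only the option name proposed in the thread. Under the 3rd possibility, a PIC 9(4)V99 field would presumably surface as a Spark DecimalType(6, 2) column, while a PIC 9(8) date would stay a string.

```scala
object DecimalOptions {
  // raw:   the digit characters of an unsigned DISPLAY field, e.g. "012345" for PIC 9(4)V99
  // scale: number of digits after the implied decimal point V (0 for integral PICs)

  // Possibility 1: leave the digits exactly as stored.
  def rawDigits(raw: String): String = raw

  // Possibility 2 (add_assumed_decimal = true): insert the implied decimal point.
  def withAssumedDecimal(raw: String, scale: Int): String =
    if (scale == 0) raw else raw.dropRight(scale) + "." + raw.takeRight(scale)

  // Possibility 3: integral PICs stay strings (Left), decimal PICs become
  // numeric decimals (Right).
  def thirdPossibility(raw: String, scale: Int): Either[String, BigDecimal] =
    if (scale == 0) Left(raw) else Right(BigDecimal(BigInt(raw), scale))

  def main(args: Array[String]): Unit = {
    println(rawDigits("012345"))             // 012345
    println(withAssumedDecimal("012345", 2)) // 0123.45
    println(thirdPossibility("01152023", 0)) // Left(01152023)
    println(thirdPossibility("012345", 2))   // Right(123.45)
  }
}
```

The 3rd possibility avoids the decimal-point question entirely: only integral fields, where the string form is unambiguous, change type.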
I agree that changing the copybook can solve the problem, but we cannot do that in our case, as the copybooks (and data) come from a third party and we have no control over them.
Can you give an example or elaborate on this: "There is a 3rd possibility - leave decimals as natural Spark decimals, but make integrals strings."?