string to varchar with length
Question
We want to write a DataFrame to SQL Server. The DataFrame has string columns, and we want to change their type to varchar with the correct length. Is there a way to get the field name, data type, and length from the copybook?
Hi, you can get lengths and other parameters from an AST generated by parsing a copybook using CopybookParser.parseSimple(copyBookContents).
Example: https://github.com/AbsaOSS/cobrix#spark-sql-schema-extraction
When invoking parseSimple() you get an AST that you can traverse and read field lengths and other field properties.
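For illustration, here is a minimal sketch of such a traversal. The AST node types and properties used below (Group, Primitive, binaryProperties.dataSize) come from the Cobrix parser model; check them against the version you are using, and note the copybook contents are just an example:

```scala
import za.co.absa.cobrix.cobol.parser.CopybookParser
import za.co.absa.cobrix.cobol.parser.ast.{Group, Primitive, Statement}

// Example copybook (illustrative only)
val copyBookContents: String =
  """       01  PRODUCT-REC.
    |           05  PRODUCT-ID    PIC 9(4).
    |           05  PRODUCT-NAME  PIC X(100).
    |""".stripMargin

val copybook = CopybookParser.parseSimple(copyBookContents)

// Recursively walk the AST and print each primitive field's
// name, COBOL data type, and size in bytes.
def walk(node: Statement): Unit = node match {
  case g: Group     => g.children.foreach(walk)
  case p: Primitive =>
    println(s"${p.name}, ${p.dataType}, ${p.binaryProperties.dataSize} bytes")
}

copybook.ast.children.foreach(walk)
```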
OK, thanks, let me try.
I'm also thinking of adding a metadata field to the generated Spark schema that will contain the maximum lengths of string fields, so I'm converting this question to a feature request.
Thanks, Ruslan, the same idea came to my mind as well. Our use case is to load the data into an RDBMS; currently, all strings default to the maximum length (nvarchar). If we have lengths available, we can add an option like this: df.write.format("jdbc").option("createTableColumnTypes", "ProductID INT, ProductName NVARCHAR(100)")
The new metadata field ('maxLength') for each Spark schema column is now available in the 'master' branch. Here are details on this: https://github.com/AbsaOSS/cobrix#spark-schema-metadata You can try it out by cloning master and building from source, or you can wait for the release of Cobrix 2.6.0, which should be soon.
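For reference, a sketch of how the new metadata could feed the createTableColumnTypes option. It assumes the 'maxLength' metadata key described in the linked README section; the JDBC URL and table name are placeholders:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StringType

// Build "col1 NVARCHAR(n), col2 NVARCHAR(m), ..." from the 'maxLength'
// metadata that Cobrix attaches to string columns. Columns without the
// metadata are left out, so JDBC uses its defaults for them.
def varcharTypes(df: DataFrame): String =
  df.schema.fields.collect {
    case f if f.dataType == StringType && f.metadata.contains("maxLength") =>
      s"${f.name} NVARCHAR(${f.metadata.getLong("maxLength")})"
  }.mkString(", ")

// Hypothetical JDBC write using the generated column types.
df.write
  .format("jdbc")
  .option("createTableColumnTypes", varcharTypes(df))
  .option("url", "jdbc:sqlserver://...") // placeholder URL
  .option("dbtable", "Products")         // hypothetical table
  .save()
```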
Thanks for the quick turnaround. Will check it out.
Hi Ruslan, another question: we have a data file with record length x (x > 90), but I want to parse only the first 90 bytes of each record. Is that possible with the current approach? I tried the record_length option, but it did not work. Please let me know your thoughts.
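For context, a sketch of the attempt described above, using the record_length reader option from the Cobrix README; the paths are hypothetical and the copybook describes more than 90 bytes:

```scala
// Attempt to parse only the first 90 bytes of each fixed-length record.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/record.cpy") // hypothetical path
  .option("record_format", "F")              // fixed-length records
  .option("record_length", "90")             // only the first 90 bytes
  .load("/path/to/data")                     // hypothetical path
```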