string to varchar with length

Open anilpanicker opened this issue 3 years ago • 1 comments

Question

We want to write a DataFrame to SQL Server. The DataFrame has string columns, and we want to change their type to varchar with the correct length. Is there a way to get the field name, data type, and length from the copybook?

anilpanicker avatar Sep 22 '22 23:09 anilpanicker

Hi, you can get lengths and other parameters from an AST generated by parsing a copybook using CopybookParser.parseSimple(copyBookContents).

Example: https://github.com/AbsaOSS/cobrix#spark-sql-schema-extraction

parseSimple() returns an AST that you can traverse to read field lengths and other field properties.

yruslan avatar Sep 23 '22 13:09 yruslan

ok, thanks let me try

anilpanicker avatar Sep 24 '22 12:09 anilpanicker

I'm also thinking of adding a metadata field to the generated Spark schema that will contain maximum lengths of string fields, so converting this question to a feature request.

yruslan avatar Sep 26 '22 07:09 yruslan

Thanks, Ruslan, the same idea came to my mind as well. Our use case is to load the data into an RDBMS; currently, all strings default to the maximum length (nvarchar). If the lengths are available, we can add an option like this: df.write.format("JDBC").option("createTableColumnTypes", "ProductID Int, ProductName nvarchar(100)")

anilpanicker avatar Sep 26 '22 11:09 anilpanicker

The new metadata field ('maxLength') for each Spark schema column is now available in the 'master' branch. Here are details on this: https://github.com/AbsaOSS/cobrix#spark-schema-metadata You can try it out by cloning master and building from source, or you can wait for the release of Cobrix 2.6.0, which should be soon.
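
As a rough illustration of how the 'maxLength' metadata could feed the createTableColumnTypes option discussed above, here is a minimal Python sketch. It is not part of Cobrix: the helper name varchar_column_types and the example schema are hypothetical, and the input is assumed to have the dictionary shape that PySpark's df.schema.jsonValue() produces. It emits VARCHAR(n) on the assumption that the option expects Spark SQL type names.

```python
def varchar_column_types(schema_json):
    """Build a createTableColumnTypes string from a Spark schema in JSON form.

    Hypothetical helper: only top-level string columns that carry a
    'maxLength' metadata entry are listed; all other columns are left
    to the writer's default type mapping.
    """
    parts = []
    for field in schema_json["fields"]:
        max_length = field.get("metadata", {}).get("maxLength")
        # Nested types are dicts rather than the string "string", so they are skipped.
        if field["type"] == "string" and max_length is not None:
            parts.append(f"{field['name']} VARCHAR({int(max_length)})")
    return ", ".join(parts)


# Hypothetical schema, shaped like the output of df.schema.jsonValue().
example_schema = {
    "type": "struct",
    "fields": [
        {"name": "ProductID", "type": "integer", "nullable": True, "metadata": {}},
        {"name": "ProductName", "type": "string", "nullable": True,
         "metadata": {"maxLength": 100}},
    ],
}

column_types = varchar_column_types(example_schema)
print(column_types)
```

With a real DataFrame, the resulting string would presumably be passed as df.write.format("jdbc").option("createTableColumnTypes", varchar_column_types(df.schema.jsonValue())), once a Cobrix build that includes the metadata field is in use.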

yruslan avatar Oct 10 '22 07:10 yruslan

Thanks for the quick turnaround. Will check it out.

anilpanicker avatar Oct 10 '22 10:10 anilpanicker

Hi Ruslan, another question: we have a data file with length x (x > 90), but I want to parse only the first 90 bytes. Is that possible with the current approach? I tried the record_length option, but it did not work. Please let me know your thoughts.

anilpanicker avatar Oct 14 '22 23:10 anilpanicker