python-swat icon indicating copy to clipboard operation
python-swat copied to clipboard

swat.functions.concat truncates char columns with different length

Open gygabyte017 opened this issue 4 years ago • 4 comments

I have for example two CAS Tables T1 T2, which have the exact same columns. I do this:

df1 = CASTable('T1')
df1["Name"] = "AB"
df1.partition(casout=dict(name='T1', replace=True))
df1 = CASTable('T1')

df2 = CASTable('T2')
df2["Name"] = "CDE"
df2.partition(casout=dict(name='T2', replace=True))
df2 = CASTable('T2')

swat.functions.concat([df1,df2], casout=dict(name='T3'))

The T3 table has then a column "Name" which contains "AB" and "CD" i.e. it has a length of 2 and "CDE" is truncated to "CD".

How can I specify an a-priori length for the "Name" column? Like you do in sas code:

length Name $3.;

gygabyte017 avatar Apr 23 '20 14:04 gygabyte017

For something like this, you'll have to use the append_computed_columns method of CASTable. It allows you to specify whatever code you want and the variables that get exported. In your case, it would look like:

df1.append_computed_columns('Name', 'length Name $3.; Name = "AB";')
df2.append_computed_columns('Name', 'length Name $3.; Name = "CDE";')

kesmit13 avatar Apr 23 '20 16:04 kesmit13

SWAT probably should have made character columns like this be varchars instead of fixed with character columns.

kesmit13 avatar Apr 23 '20 18:04 kesmit13

Thank you, very interesting the append_computed_columns method (is it missing in the docs? sometimes I feel like there are some things undocumented).

However I see that if I assing a string constant in that way, the code generated is this:

computedvars=['Name']
computedvarsprogram='Name= "AB"; '

What will be the columns type on CAS in this case?

gygabyte017 avatar Apr 24 '20 10:04 gygabyte017

That might not be in the doc, now that you mention it. It was used as an internal-only method for a while, but then was made public later.

As far as the data type goes, character columns will be fixed-width columns by default. Varchar and varbinary have to be explicitly specified.

kesmit13 avatar Apr 24 '20 15:04 kesmit13