cjworkbench Feature request: add "no inference mode" to add from URL

Hi, when I add a file from a URL, workbenchdata infers the field types. It's a great feature, but it sometimes gives wrong results.

In this example (https://app.workbenchdata.com/workflows/17120) I import an XLS file and the field "CODISTAT" is mapped as a number. That's a problem, because in the source XLS file it's a text field: in workbenchdata the value "001801" becomes "1801", and the leading zeros are lost.

It would be great to have a "no inference" option in the module, so that all fields are imported as text.

Thank you

May 05 '19 08:05 aborruso

Moreover, if I apply the Python function CODISTAT.rjust(6,"0") (it's in tab2), I get the wrong result: once again "1801" and not "001801", because the output field type is a number.
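To make the behaviour concrete, here is a minimal sketch of the difference between inferred types and a "no inference" read. It uses plain pandas on a small in-memory CSV as a stand-in for the XLS file; the sample data and the function names are assumptions for illustration, not Workbench's actual import code.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real XLS file.
CSV_TEXT = "CODISTAT,NAME\n001801,Alpha\n082053,Beta\n"


def read_with_inference(text):
    # Default behaviour: pandas guesses each column's type.
    return pd.read_csv(io.StringIO(text))


def read_as_text(text):
    # "No inference": every column is kept as text.
    return pd.read_csv(io.StringIO(text), dtype=str)


inferred = read_with_inference(CSV_TEXT)
as_text = read_as_text(CSV_TEXT)

print(inferred["CODISTAT"].tolist())  # leading zeros are gone
print(as_text["CODISTAT"].tolist())   # leading zeros are preserved
```

With inference the codes come back as integers, so any later text operation starts from "1801"; with `dtype=str` the original "001801" survives.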

May 05 '19 09:05 aborruso

Hi Andrea, thanks for reporting the issue; it's on our roadmap. We'll make sure to let you know when it's fixed. You can also report bugs through Intercom if that's easier. Thank you!

- Pierre Conti, Co-founder & CEO, Workbench, @pierreconti

May 06 '19 03:05 pierreconti

Hi @pierreconti, I think it's more useful to report feature requests and bugs here, because that avoids duplication. I'll wait with great interest, because in some cases this behaviour is quite inconvenient.

Thank you

May 06 '19 06:05 aborruso

@aborruso I've seen something similar before. My workaround, using the Python module:

def process(table):
    # Convert the column to text, then left-pad with zeros to a fixed width of 5
    table['Zip'] = table['Zip'].astype(str).str.zfill(5)
    return table
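For the CODISTAT case in this issue, the same idea might look like the sketch below. The helper name `pad_code_column` and the sample table are hypothetical, and the width of 6 is taken from the "001801" example above. It assumes the column was inferred as whole numbers with no empty cells.

```python
import pandas as pd


def pad_code_column(table, column, width):
    # Work on a copy so the caller's table is left untouched.
    table = table.copy()
    # Convert numbers to text, then left-pad with zeros to a fixed width.
    table[column] = table[column].astype(str).str.zfill(width)
    return table


# Hypothetical sample: what the column looks like after type inference.
sample = pd.DataFrame({"CODISTAT": [1801, 82053]})
padded = pad_code_column(sample, "CODISTAT", 6)
print(padded["CODISTAT"].tolist())
```

This only restores the zeros when every code has the same known width; if the original values had varying lengths, the lost zeros cannot be recovered after inference.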

May 16 '19 21:05 adamhooper

@adamhooper I'll use it while waiting for an official "solution".

Thank you

May 16 '19 21:05 aborruso

I deployed new fetch logic that stores raw files. And our new CSV parser backend has this option ... but we don't expose it to users.

Now, the missing pieces are:

- New XLS/XLSX parsers in https://github.com/CJWorkbench/arrow-tools. I envision a "strict-types mode" in which the only way we interpret a column as Number/Date is if all values are of that type. (pd.read_excel() is a lost cause.)
- As mentioned in #98, we need a UI so users can set this new option without forcing a fetch.
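As a rough illustration of the "strict-types mode" idea, the rule could be sketched like this in Python. This is not the arrow-tools implementation: the function name `infer_strict_type`, the type labels, and the choice to skip empty cells are all assumptions made for the example.

```python
import datetime


def infer_strict_type(values):
    """Return "number" or "date" only if every non-empty cell has that type."""
    # Assumption: empty cells (None) do not count against a type.
    present = [v for v in values if v is not None]
    if not present:
        return "text"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "number"
    if all(isinstance(v, (datetime.date, datetime.datetime)) for v in present):
        return "date"
    # Any mix of types, or any text cell, makes the whole column text.
    return "text"


print(infer_strict_type([1, 2.5, None]))
print(infer_strict_type(["001801", 82053]))
print(infer_strict_type([datetime.date(2019, 5, 5)]))
```

Under this rule a column like CODISTAT, whose cells are stored as text in the spreadsheet, would stay text instead of being converted to numbers.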

Dec 13 '19 16:12 adamhooper

@adamhooper thank you, it's a good thing!

Dec 13 '19 17:12 aborruso
