cjworkbench
Feature request: add "no inference mode" to add from URL
Hi, when I add a file from a URL, workbenchdata infers the field types. It's a great feature, but it sometimes gives wrong results.
In this example (https://app.workbenchdata.com/workflows/17120) I import an XLS file and the field "CODISTAT" is mapped as a number. That is a problem, because in the source XLS file it is a text field: in workbenchdata the value "001801" becomes "1801".
It would be great to have a "no inference" option in the module that imports all fields as text.
Thank you
Moreover, if I apply the Python function CODISTAT.rjust(6,"0") (it's in tab2), I get the wrong result again: "1801" and not "001801", because the output field type is a number.
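The behavior described above can be sketched in plain Python. This is only an illustration: `infer_column` is a hypothetical stand-in for type inference, not Workbench's actual logic, and the second value in `raw` is made up for the example.

```python
def infer_column(values):
    """Hypothetical inference: treat the column as numbers if every value parses as an int."""
    try:
        return [int(v) for v in values]
    except ValueError:
        return list(values)  # at least one non-numeric value: keep the text as-is


raw = ["001801", "001802"]  # text cells, as stored in the source file

inferred = infer_column(raw)  # leading zeros are lost here
padded = [str(n).rjust(6, "0") for n in inferred]  # padding restores them, as text
reinferred = infer_column(padded)  # a number-typed output column drops them again

print(inferred)    # [1801, 1802]
print(padded)      # ['001801', '001802']
print(reinferred)  # [1801, 1802]
```

Under these assumptions, padding only helps if the result stays a text column, which is why a "no inference" option would avoid the problem at the source.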
Hi Andrea, thanks for reporting the issue. It's on our roadmap, and we'll make sure to let you know when it's fixed. You can also report bugs through Intercom if that's easier. Thank you!
On Sun, May 5, 2019 9:17 AM, Andrea Borruso wrote: Moreover if (it's in tab2 ) I apply CODISTAT.rjust(6,"0") python function, I have wrong result: once again "1801" and not "001801", because the output field type is a number.
- Pierre Conti, Co-founder & CEO, Workbench (@pierreconti)
Hi @pierreconti, I think it is more useful to write about feature requests and bugs here, because it avoids duplication. I'll wait for the fix with great interest, because in some cases this behavior is inconvenient.
Thank you
@aborruso I've seen something similar before. My workaround, using the Python module:
def process(table):
    # Convert the column to text, then left-pad with zeros to 5 characters
    table['Zip'] = table['Zip'].astype(str).str.zfill(5)
    return table
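As a rough usage sketch, the same workaround can be adapted to the "CODISTAT" column from the original report, padded to 6 characters. The sample values below are made up, and the example assumes the column arrives as whole numbers (a column containing empty cells may be read as floats, in which case `astype(str)` would produce values like "1801.0").

```python
import pandas as pd


def process(table):
    # Convert the number column to text, then left-pad with zeros to 6 characters.
    table["CODISTAT"] = table["CODISTAT"].astype(str).str.zfill(6)
    return table


# Hypothetical sample data standing in for the imported XLS file.
table = pd.DataFrame({"CODISTAT": [1801, 15146]})
result = process(table)
print(result["CODISTAT"].tolist())  # ['001801', '015146']
```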
@adamhooper I'll use it while waiting for an official "solution".
Thank you
I deployed new fetch logic that stores raw files. And our new CSV parser backend has this option ... but we don't expose it to users.
Now, the missing pieces are:
- New XLS/XLSX parsers in https://github.com/CJWorkbench/arrow-tools. I envision a "strict-types mode" in which the only way we interpret a column as Number/Date is if all values are of that type. (pd.read_excel() is a lost cause.)
- As mentioned in #98, we need a UI so users can set this new option without forcing a fetch.
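A minimal sketch of what such a "strict-types mode" could look like, assuming each spreadsheet cell has already been read as a typed Python value. The function name `strict_column_type`, the returned labels, and the choice to ignore empty cells are all assumptions for illustration, not the arrow-tools design.

```python
from datetime import date, datetime


def strict_column_type(cells):
    """Return "number", "date" or "text" for a column of already-typed cells.

    A column is Number or Date only if every non-empty cell has that type.
    Empty cells (None) are ignored; that is an assumption of this sketch.
    """
    non_empty = [c for c in cells if c is not None]
    if not non_empty:
        return "text"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(c, (int, float)) and not isinstance(c, bool) for c in non_empty):
        return "number"
    if all(isinstance(c, (date, datetime)) for c in non_empty):
        return "date"
    return "text"


print(strict_column_type(["001801", "001802"]))  # text
print(strict_column_type([1, 2.5, None]))        # number
print(strict_column_type([1, "n/a"]))            # text
```

With this rule, a column whose cells are stored as text in the source file, such as "CODISTAT", would stay text and keep its leading zeros.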
@adamhooper thank you, it's a good thing!