Data validation issue - typing text in an integer column
Problem description
When I type text on a numeric cell (integer) content typed disappears, and ODE shows NaN within the cell.
If you then click SAVE, content in the cell is removed:
How this editing process should work
- Upload this csv
- Go to the the column CANTI_GENERO_F
- Add Pepe in row 2, position 4 (first cell of the CANTI_GENERO_F column)
- Click SAVE
- The ODE should show Pepe and add the error to the errors report.
@romicolman we are debating if this is still important given that it will be a difficult change to do. Let's re-sync on this when you are back.
cc @roll @guergana
OK, the tabular data in the client has three sources:
- database
- validation report
- not saved changes (history)
Obviously, in the database, it's only possible to store the data that adheres to column types. It's not possible to store a string value in the integer column.
So what the ODE does with anomalies it takes bad cells from the validation report (produced by frictionless.index) and merge it to the table in the client (+history of changes are merge too).
Theoretically, "text in an integer column" might go to the report as a new validation error (it also needs to be a report patch actually because a new report will be re-created on an index so this data needs to be persisted separately). It will include dealing with _row_valid column manually etc. So it will be a very complex logic on top of the existent already complex logic.
@roll @pdelboca: based on what we discussed yesterday, here are my thoughts:
I understand that Frictionless will try to "force" the user to type data according to the data type. So at least, what I think we should do is this:
- The user enters text content instead of an interger.
- The ODE shows a warning sign (to be defined with @Faithkenny).
- When the user clicks SAVE to update content, data is not updated if entered against rules, but the cell should not be empty.
Do you think this is possible?
Again, I think the proper solution here is to allow the user to enter the data against rules and reflect the new error on the errors report. Since this is not feasible for the stable version we need to make sure to let the user know that the ODE will keep the original content.
If you are ok. I can move this issue to the next sprint so we can work on it with Faith.
cc @guergana
@roll @pdelboca: based on what we discussed yesterday, here are my thoughts:
I understand that Frictionless will try to "force" the user to type data according to the data type. So at least, what I think we should do is this:
- The user enters text content instead of an interger.
- The ODE shows a warning sign (to be defined with @Faithkenny).
- When the user clicks SAVE to update content, data is not updated if entered against rules, but the cell should not be empty.
Do you think this is possible?
Again, I think the proper solution here is to allow the user to enter the data against rules and reflect the new error on the errors report. Since this is not feasible for the stable version we need to make sure to let the user know that the ODE will keep the original content.
If you are ok. I can move this issue to the next sprint so we can work on it with Faith.
cc @guergana
I also agree that the best way is to let the user enter what they prefer and then show it in the report.
I updated the text of the issue and added this:
How the editing process should work
- Upload this csv
- Go to the the column CANTI_GENERO_F
- Add Pepe in row 2, position 4 (first cell of the CANTI_GENERO_F column)
- Click SAVE
- The ODE should show Pepe and add the error to the errors report.
Hi all! For the pre-release we will do this:
- The user uploads this csv
- Goes to the the column CANTI_GENERO_F
- Adds Pepe in row 2, position 4 (first cell of the CANTI_GENERO_F column)
- Once the user finishes editing content and clicks save or in the datagrid blnkspace, the ODE will display a warning message:
"Please, enter the correct data format" or "Data format error"
@romicolman i have been digging in our codebase. Here are some findings (this is an FYI, no action needed on your side):
General Application
We need to define if ODE will:
- Apply validation on Read actions: Reads whatever there is in the database and writes a report on it. Users can write whatever they want.
- Apply validation on Save action: We will use the
schemato apply restrictions on what the user can write and save. And we will show errors. (Note: Save action is when the user clicks on Save, not when editing the grid) - Apply validation both on Read/Write actions: Do both things.
How is it working today
- When the user interacts with the datagrid there is a small validation that if the user is trying to write a text in a number column it will convert it to
NaN. (This causes the behaviour describing in this ticket, this validation is isolated and doesn't make any sense, so we should remove it.) - When the user clicks on
Save, we are usingfrictionless-pyframework to read/convert the user input into a valid value. Since the column is defined as a number, and the user is entering a word. Thenfrictionless-pyreturns an empty value that's being written in the database.
Some Technical Notes
- SQLite (the database we are using) has dynamic typing. So, from a database perspectice, we should be able to write a word in a column defined as numbers: https://www.sqlite.org/faq.html#q3
- I spend a few minutes with our logic and I couldn't make it work. Our
/table/patchendpoint always writes a null if I send a word, no matter. Maybe it is a SQLAlchemy thing? Anyhow, this should be possible, it will just require some time. - Enforcing validation at write time does not makes too much sense to me since, for every error in the user input we will need to generate a message error that will be the same as the report. (So why we don't show the report directly?)
- Enforcing validation when the user is editing the grid will be a nightmare. We will need to integrate a
frictionlessjavascript library to make sure that the validations we apply are the same that we will apply when doing the report. I could expect some headaches here.
Some more Technical Notes
In our current workflow (using frictionless-py to enforce they type of the value being save) we can create some error messages to notify the user that an empty value has been saved when types miss match. I think it is a good way to explain to the user what's happening. Even when it does not solve the experience main issue.
Some extra notes
When the user clicks on Save after editing the grid, ODE calls /table/patch and several things happen.
Related to this issue there are two important ones:
- We use
frictionless-pyto validate if the value matches the schema. If it doesn't it sets it toNone - We update the ODE SQLite table with the new values
- Then we call a bunch of methods to write the values in the file using
resource.write_tablefrom frictionless. a. This will not write in the csv file if the input from the user does not match the schema. - Then something happens with the
reset_record()that deletes our SQLite table. a. I think this is a bug. It doesn't make too much sense to update the SQLite database in steup 2 if now we are deleting it. - At the end of
/table/patchwe have: a. An updated CSV File (if it matches the schema) b. A non-existing SQLite Table
Clearly the initial idea was to write and read using frictionless-py, also /table/patch is more than just a Patch, there is no clear separation of concerns and the method should be refactored.
+1 bumping this, I mentioned this being quite a big problem in the testing session with @romicolman