XLSX ingestion: integers are converted to decimal numbers
Bug description
This follows a discussion in the Dataverse community Google Group:
https://groups.google.com/g/dataverse-community/c/75OO7gDNpsw/
When ingesting an XLSX file, any non-zero numeric value is always interpreted as a decimal number.
This results in integers being automatically converted into decimals:
2 → 2.0
-2 → -2.0
2.5 → 2.5
This behavior can be problematic in some use cases (e.g., phone numbers, street numbers).
Notes:
- This issue does not occur when ingesting a CSV file, where integers remain integers.
- Reproducible example dataset: test dataset.
Steps to reproduce
- Create an XLSX file containing a column with integer values.
- Ingest into Dataverse.
- Observe that all integers are converted into decimals.
Expected behavior
- Keep integers as integers during ingestion unless a decimal value is explicitly present.
2 → 2
-2 → -2
2.5 → 2.5
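The expected mapping above could be sketched roughly as follows. This is only an illustration of the rule "keep whole numbers as integers unless a decimal value is present", not the actual Dataverse ingest code: the class `NumericCellFormatter` and its methods are hypothetical names, and the sketch assumes the reader has already parsed each numeric cell into a Java `double`.

```java
import java.math.BigDecimal;

// Hypothetical helper, NOT part of Dataverse: illustrates the expected behavior only.
final class NumericCellFormatter {

    private NumericCellFormatter() {}

    /** True when the value is finite and has no fractional part. */
    static boolean isWholeNumber(double value) {
        return !Double.isNaN(value) && !Double.isInfinite(value) && value == Math.rint(value);
    }

    /** A column can be typed as integer only if every value in it is whole. */
    static boolean isIntegerColumn(double[] values) {
        for (double v : values) {
            if (!isWholeNumber(v)) {
                return false;
            }
        }
        return true;
    }

    /** Renders whole numbers without a trailing ".0"; leaves other values as decimals. */
    static String format(double value) {
        if (isWholeNumber(value)) {
            // BigDecimal avoids scientific notation and long overflow for large magnitudes.
            return new BigDecimal(value).toBigInteger().toString();
        }
        return Double.toString(value);
    }

    public static void main(String[] args) {
        for (double v : new double[] {2.0, -2.0, 2.5}) {
            System.out.println(v + " -> " + format(v));
        }
    }
}
```

With this rule, a column containing only 2 and -2 would be treated as an integer column, while a column that also contains 2.5 would remain decimal.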
Impact
- Incorrect data representation in business contexts.
- Inconsistency between XLSX ingestion and CSV ingestion.
Affected versions
- All versions of Dataverse
Help is always welcome. Is this bug something you or your organization plan to fix?
- This is not currently planned by the team.
Hi @stevenferey, I’m interested in working on this issue. Could you please assign it to me? Cheers.
Hello @pizofreude, I don't have the rights to assign the issue to you, but you can definitely work on it! Thanks
@pizofreude assigned! Please go for it! If you have any questions, you can ask here or in #dev at https://dataverse.zulipchat.com
@pizofreude hi! Just checking in. Do you need any help getting started?
@pdurbin hi, I'm finishing a DE course right now, which took more time than I had initially planned, so I had to put my open source contribution on hold for now. Will get back to it once I'm done with the course. Cheers and sorry for the wait!