Support uploading Parquet files
Brief overview
AS A user of a dataframe library like Pandas/Polars/etc
I WANT to be able to upload my Parquet dataset
SO THAT I can skip all the nonsense around csv inference
Additional details
To support this, we need an analogous type for each type in the Parquet format. Some notable ones are currently missing:
- [ ] Datetime
- [ ] Bytes/bytearrays - necessary for UUID as well
- [ ] Enum (we're allowed to parse as strings if necessary)
- [ ] Time
- [ ] Interval
Perhaps we could get away without supporting everything to start with, but without at least datetime, and probably bytes, there would be no real benefit in claiming any kind of support.
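As a rough illustration of the checklist above, here is a minimal sketch of checking a dataset's column types against what is supported so far. The type names and the `ALREADY_SUPPORTED` set are assumptions made for this example, not csvbase's actual type registry, and `unsupported_columns` and `meets_minimum` are hypothetical helpers.

```python
# Assumed, illustrative type names; not csvbase's real type registry.
ALREADY_SUPPORTED = {"text", "integer", "float", "boolean", "date"}

# The two types argued above to be the minimum for support to be worthwhile.
MINIMUM_USEFUL = {"datetime", "bytes"}


def unsupported_columns(schema, supported):
    """Return {column: type} for columns whose type is not supported yet."""
    return {col: typ for col, typ in schema.items() if typ not in supported}


def meets_minimum(supported):
    """True once both datetime and bytes are in the supported set."""
    return MINIMUM_USEFUL <= set(supported)


# A made-up schema: column name -> logical type name.
example_schema = {
    "id": "bytes",
    "created": "datetime",
    "name": "text",
    "qty": "integer",
}

print(unsupported_columns(example_schema, ALREADY_SUPPORTED))
print(meets_minimum(ALREADY_SUPPORTED | {"datetime"}))
```

A check like this could run at upload time to decide whether a file can be accepted as-is or needs special handling for some columns.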
Dupe of #99 ?
It absolutely is, yes - my mistake. :)
I've closed the other issue as this has slightly more detail on the types csvbase is missing.
What about converting unsupported datatypes to STRING, until they are all supported?
That's actually not a bad idea. We could mark it as experimental or something meanwhile.
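To make the fallback idea concrete, here is a minimal sketch, assuming a hypothetical set of supported type names. `plan_column` and `to_text` are invented for illustration and are not part of csvbase; the hex and ISO 8601 encodings are just one possible choice.

```python
import datetime

# Assumed, illustrative type names; not csvbase's real type registry.
SUPPORTED = {"text", "integer", "float", "boolean", "date"}


def plan_column(type_name):
    """Return (target_type, experimental) for a source column type.

    Unsupported types fall back to "text" and are flagged experimental.
    """
    if type_name in SUPPORTED:
        return type_name, False
    return "text", True


def to_text(value):
    """Render a value from an unsupported column as a string."""
    if value is None:
        return None  # keep nulls as nulls
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).hex()  # one possible encoding for raw bytes
    if isinstance(value, (datetime.datetime, datetime.date, datetime.time)):
        return value.isoformat()  # ISO 8601 keeps the value sortable
    return str(value)


print(plan_column("interval"))
print(to_text(datetime.datetime(2024, 1, 2, 3, 4, 5)))
```

Flagging the fallback per column would let the string conversion be dropped later, one type at a time, as proper support lands.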
Would that help you use csvbase for your use case?
Sure, as my data source converts everything to strings anyway :P
OK, I think this can be moved up then. I'll try to have a go next week.
Would you consider it a good first issue for new contributors?
If I do not need to know too much about the codebase to make the change I would gladly have a shot at it.
Hmm, probably not, as it requires a fair amount of knowledge of the codebase and also involves making a load of design decisions.
Probably the best first changes are ones related to getting it working locally for you. Many people use Docker (but I don't, so I don't discover the problems there). Does the Docker container work for you? Can you think of any ways to improve it? Can you remove tini and thereby possibly resolve https://github.com/calpaterson/csvbase/issues/126?