csvapi
Kafka integration
- [x] Kafka Integration (only consumer)
- [x] Read messages from udata-analysis-service
- [x] Parse the file (it could be read from minio instead of downloading the resource again)
- [x] Add csv-detective type detection to help agate store the resource in sqlite
- [x] Add a (minimal) pandas profiling analysis and generate a JSON report
- [x] Store the new information in sqlite, in new tables:
  - general_infos: basic info on the resource
  - column_infos: basic info on each column of the resource
  - categorical_infos: categorical values for each column (limited to 10)
  - top_infos: top values for each column (limited to 10)
  - numeric_infos: basic statistics on each numeric column of the resource (mean, std, min, max)
  - numeric_plot_infos: distribution of the values of each numeric column, for plotting
- [x] Update the API to list this new information when it is available
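
To illustrate the storage step in the checklist, here is a minimal sketch of how per-column summaries might be written to sqlite. Only the table names `top_infos` and `numeric_infos`, the limit of 10, and the statistics (mean, std, min, max) come from the description above. The column layouts, the `store_profile` helper, and the use of the standard library in place of pandas profiling are assumptions made for illustration, not the branch's actual implementation.

```python
import sqlite3
import statistics
from collections import Counter

TOP_LIMIT = 10  # the description caps top values at 10 per column


def store_profile(conn, rows, columns):
    """Compute per-column summaries and write them to sqlite.

    Hypothetical helper: the table names follow the PR description,
    the column layouts are assumed.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS top_infos "
        "(column_name TEXT, value TEXT, count INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS numeric_infos "
        "(column_name TEXT, mean REAL, std REAL, min REAL, max REAL)"
    )
    for index, name in enumerate(columns):
        values = [row[index] for row in rows if row[index] is not None]
        # Most frequent values, capped at TOP_LIMIT.
        for value, count in Counter(values).most_common(TOP_LIMIT):
            conn.execute(
                "INSERT INTO top_infos VALUES (?, ?, ?)",
                (name, str(value), count),
            )
        # Numeric summary only when every value is a (non-boolean) number.
        is_numeric = values and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if is_numeric:
            # Sample standard deviation needs at least two values.
            std = statistics.stdev(values) if len(values) > 1 else 0.0
            conn.execute(
                "INSERT INTO numeric_infos VALUES (?, ?, ?, ?, ?)",
                (name, statistics.mean(values), std, min(values), max(values)),
            )
    conn.commit()


# Tiny in-memory example with made-up data.
conn = sqlite3.connect(":memory:")
rows = [("Paris", 10), ("Lyon", 20), ("Paris", 30)]
store_profile(conn, rows, ["city", "population"])
top_city = conn.execute(
    "SELECT value, count FROM top_infos "
    "WHERE column_name = 'city' ORDER BY count DESC"
).fetchall()
numeric = conn.execute(
    "SELECT column_name, mean, std, min, max FROM numeric_infos"
).fetchall()
```

Keeping each kind of summary in its own table, keyed by column name, would let the API return only the sections that exist for a given resource.
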
This branch is now published on PyPI: https://app.circleci.com/pipelines/github/etalab/csvapi/91/workflows/09dba6e2-b91f-4cf2-af03-71a9daee9bbb/jobs/605
⚠️ Remove this publication once the branch is merged into master