[WIP] Data importer
This PR represents a major change in approach to how we import / export data.
Previously we have relied on external libraries for importing / exporting - which all come with various downsides.
The main idea of this PR is to utilize the existing DRF API framework for both importing and exporting of data. This should provide the following major advantages:
- Significantly reduce code duplication
- Hook in to existing API framework which is well tested
- Improve speed of import / export
- Allow finer control over import / export processes
- Allow specification of custom import hooks
- Expose import / export processes to custom plugins
Framework
Uses existing DRF serializers, just need to register with the @register_importer() decorator.
Related Issues
- Closes https://github.com/inventree/InvenTree/pull/6792
- Closes https://github.com/inventree/InvenTree/issues/3101
- Closes https://github.com/inventree/InvenTree/issues/6376
- Closes https://github.com/inventree/InvenTree/issues/6389
- Closes https://github.com/inventree/InvenTree/issues/6955
- Closes https://github.com/inventree/InvenTree/issues/6956
- Closes https://github.com/inventree/InvenTree/issues/6321
- Closes https://github.com/inventree/InvenTree/issues/6664
- Ref: https://github.com/inventree/InvenTree/issues/5571
TODO
- [x] Remove the
APIDownloadMixinclass (and all references) - [x] Remove the
download_querysetmethod in a many existing views - [ ] Use database model
verbose_nameif serializer does not provide alabelattribute - [ ] Add / update unit testing for new dataset export approach
- [ ] Add unit testing for new dataset import apprpoach
- [x] Add worker task to cleanup old import / export sessions
- [ ] Add unit test for cleanup backup task
- [ ] Ensure user has correct permissions before creating a new import session
- [ ] Code coverage for additional code
Further Work
- [ ] Expose import / export functions to plugins
- [ ] Allow the entire import workflow to be controlled manually via the backend admin interface
- [ ] Expose the new exporter workflow to the django admin interface
- [ ] Tag all "model resource" classes as deprecated and will be removed in further release
- [ ] Profiling for new data export functionality, to determine where we can optimize
- [ ] Run data export in background worker
Deploy Preview for inventree-web-pui-preview ready!
| Name | Link |
|---|---|
| Latest commit | c462eb713bc49883d2bf7ae338b9a01ae6644f38 |
| Latest deploy log | https://app.netlify.com/sites/inventree-web-pui-preview/deploys/6688f676568d070008068aa1 |
| Deploy Preview | https://deploy-preview-6911--inventree-web-pui-preview.netlify.app |
| Preview on mobile | Toggle QR Code...Use your smartphone camera to open QR code link. |
Lighthouse |
1 paths audited Performance: 100 (no change from production) Accessibility: 86 (no change from production) Best Practices: 92 (no change from production) SEO: 70 (no change from production) PWA: - View the detailed breakdown and full score reports |
To edit notification comments on pull requests, go to your Netlify site configuration.
Quality Gate failed
Failed conditions
C Reliability Rating on New Code (required ≥ A)
See analysis details on SonarCloud
Catch issues before they fail your Quality Gate with our IDE extension
SonarLint
Very exciting! Is the removal of json/yaml a permanent change or something that was not in scope for the initial implementation?
@SchrodingersGat is this still slated for 1.0 or is 0.16 or sooner realistic?
I'm hoping for 0.16.0 release. Now that the refactoring is done I have some time to commit to this. Soon it should be ready for a first round review.
Very cool. There is a lot of untested frontend code that can be removed once the new importer works.
Yes once we can remove the old importer code, that will be fantastic.
Codecov Report
Attention: Patch coverage is 77.05510% with 254 lines in your changes missing coverage. Please review.
Project coverage is 83.90%. Comparing base (
58f12f5) to head (c462eb7). Report is 342 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #6911 +/- ##
==========================================
- Coverage 84.20% 83.90% -0.31%
==========================================
Files 1079 1092 +13
Lines 47071 47945 +874
Branches 1444 1461 +17
==========================================
+ Hits 39638 40226 +588
- Misses 7054 7340 +286
Partials 379 379
| Flag | Coverage Δ | |
|---|---|---|
| backend | 85.25% <79.61%> (-0.17%) |
:arrow_down: |
| pui | 65.46% <40.27%> (-0.63%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Can we merge this?
@smithcv this is not finalized and tested. Please allow the time needed to get it done properly with the resources the project has.
@smithcv thanks for your interest but still a lot of work here. If you or your company would like to contribute to the development of this feature, you can provide funding against this issue :)
@SchrodingersGat I have created a branch that removes all import-export dependant things: https://github.com/matmair/InvenTree/tree/remove-imports
There are a lot of places where we use resource definitions for export - making this more generic (maybe as a view mixin?) would be great to remove code duplication
Data Export Profiling
The new data export functionality uses the optimized API serializers, and (at the current implementation) is already a huge improvement over the existing export tool which uses django-import-export
| API Endpoint | Record Count | Old Exporter | New Exporter |
|---|---|---|---|
/api/part/ |
5,000 | did not return after multiple minutes | 17.3s , 11 DB hits |
/api/stock/ |
1,100 | 3.4s in 890 DB hits | 1.0s in 16 DB hits |
/api/stock/location/ |
20 | 0.4s in 38 DB hits | 0.5s in 4 DB hits |
/api/order/po/ |
10 | 0.4s in 48 DB hits | 0.5s in 23 DB hits |
However there is still some optimizing to be done here. Still need to profile where the time is being spent on the data export. Even though django-import-export uses a lot more DB hits their export is still faster in some instances.
@SchrodingersGat that looks very promising
@SchrodingersGat you need any assistance with this?
I need to remember where I was up to and what the next steps might be. There is still a lot of work to go..
So far I have got a framework in place for creating an import session, and extracting data from table. This data is split into rows which get "serialized" against the matching serializer for validation and ingest.
Major Tasks
- [ ] The user interface elements (in PUI) still need a lot of work
- [ ] Do we want to support this framework within the django admin interface?
- [ ] Still need to work out the final steps for validating data rows and then saving to the database
- [ ] No unit testing as yet
If you wanted to take a look at where I'm up to so far and provide input on the approach in general. I'm hoping to get back to this soon, now that the reporting has been refactored. It is a major blocker for a lot of outstanding issues.
This is now progressed far enough to merge into master. Exporting data from tables (in CUI and PUI) now uses the new data export tool and is much faster. Data are serialized into the file using the existing DRF layer, which is very query efficient and reduces code duplication.
There is now also a "Data Import" section in the PUI admin page - which is functional, but still requires some work:
There are a lot of outstanding tasks / improvements to do (see outstanding tasks above) which will be pushed out into new issues.
The goal is to remove the reliance on django-import-export entirely.
@SchrodingersGat thank you for pushing this ahead! Are any of these goals blocking for 0.16.0 release? We should probably log and tag those separately to not lose track over the next month.
I will make some separate issues soon, while it is still fresh on my mind.
