InvenTree icon indicating copy to clipboard operation
InvenTree copied to clipboard

[WIP] Data importer

Open SchrodingersGat opened this issue 1 year ago • 14 comments

This PR represents a major change in approach to how we import / export data.

Previously we have relied on external libraries for importing / exporting - which all come with various downsides.

The main idea of this PR is to utilize the existing DRF API framework for both importing and exporting of data. This should provide the following major advantages:

  • Significantly reduce code duplication
  • Hook in to existing API framework which is well tested
  • Improve speed of import / export
  • Allow finer control over import / export processes
  • Allow specification of custom import hooks
  • Expose import / export processes to custom plugins

Framework

Uses existing DRF serializers, just need to register with the @register_importer() decorator.

Related Issues

  • Closes https://github.com/inventree/InvenTree/pull/6792
  • Closes https://github.com/inventree/InvenTree/issues/3101
  • Closes https://github.com/inventree/InvenTree/issues/6376
  • Closes https://github.com/inventree/InvenTree/issues/6389
  • Closes https://github.com/inventree/InvenTree/issues/6955
  • Closes https://github.com/inventree/InvenTree/issues/6956
  • Closes https://github.com/inventree/InvenTree/issues/6321
  • Closes https://github.com/inventree/InvenTree/issues/6664
  • Ref: https://github.com/inventree/InvenTree/issues/5571

TODO

  • [x] Remove the APIDownloadMixin class (and all references)
  • [x] Remove the download_queryset method in a many existing views
  • [ ] Use database model verbose_name if serializer does not provide a label attribute
  • [ ] Add / update unit testing for new dataset export approach
  • [ ] Add unit testing for new dataset import apprpoach
  • [x] Add worker task to cleanup old import / export sessions
  • [ ] Add unit test for cleanup backup task
  • [ ] Ensure user has correct permissions before creating a new import session
  • [ ] Code coverage for additional code

Further Work

  • [ ] Expose import / export functions to plugins
  • [ ] Allow the entire import workflow to be controlled manually via the backend admin interface
  • [ ] Expose the new exporter workflow to the django admin interface
  • [ ] Tag all "model resource" classes as deprecated and will be removed in further release
  • [ ] Profiling for new data export functionality, to determine where we can optimize
  • [ ] Run data export in background worker

SchrodingersGat avatar Apr 02 '24 03:04 SchrodingersGat

Deploy Preview for inventree-web-pui-preview ready!

Name Link
Latest commit c462eb713bc49883d2bf7ae338b9a01ae6644f38
Latest deploy log https://app.netlify.com/sites/inventree-web-pui-preview/deploys/6688f676568d070008068aa1
Deploy Preview https://deploy-preview-6911--inventree-web-pui-preview.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

Lighthouse
1 paths audited
Performance: 100 (no change from production)
Accessibility: 86 (no change from production)
Best Practices: 92 (no change from production)
SEO: 70 (no change from production)
PWA: -
View the detailed breakdown and full score reports

To edit notification comments on pull requests, go to your Netlify site configuration.

netlify[bot] avatar Apr 02 '24 03:04 netlify[bot]

Quality Gate Failed Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarCloud

Catch issues before they fail your Quality Gate with our IDE extension SonarLint

sonarqubecloud[bot] avatar Apr 02 '24 04:04 sonarqubecloud[bot]

Very exciting! Is the removal of json/yaml a permanent change or something that was not in scope for the initial implementation?

matmair avatar Apr 02 '24 10:04 matmair

@SchrodingersGat is this still slated for 1.0 or is 0.16 or sooner realistic?

matmair avatar Apr 05 '24 21:04 matmair

I'm hoping for 0.16.0 release. Now that the refactoring is done I have some time to commit to this. Soon it should be ready for a first round review.

SchrodingersGat avatar Apr 05 '24 22:04 SchrodingersGat

Very cool. There is a lot of untested frontend code that can be removed once the new importer works.

matmair avatar Apr 05 '24 22:04 matmair

Yes once we can remove the old importer code, that will be fantastic.

SchrodingersGat avatar Apr 06 '24 03:04 SchrodingersGat

Codecov Report

Attention: Patch coverage is 77.05510% with 254 lines in your changes missing coverage. Please review.

Project coverage is 83.90%. Comparing base (58f12f5) to head (c462eb7). Report is 342 commits behind head on master.

Files with missing lines Patch % Lines
src/backend/InvenTree/importer/models.py 68.12% 80 Missing :warning:
src/backend/InvenTree/importer/serializers.py 65.57% 21 Missing :warning:
src/backend/InvenTree/importer/validators.py 33.33% 18 Missing :warning:
src/backend/InvenTree/importer/admin.py 58.97% 16 Missing :warning:
src/backend/InvenTree/importer/api.py 76.11% 16 Missing :warning:
src/backend/InvenTree/importer/operations.py 67.34% 16 Missing :warning:
.../frontend/src/components/render/StatusRenderer.tsx 25.00% 14 Missing and 1 partial :warning:
src/frontend/src/components/render/Instance.tsx 8.33% 11 Missing :warning:
src/frontend/src/tables/InvenTreeTable.tsx 20.00% 6 Missing and 2 partials :warning:
src/backend/InvenTree/importer/mixins.py 93.85% 7 Missing :warning:
... and 19 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6911      +/-   ##
==========================================
- Coverage   84.20%   83.90%   -0.31%     
==========================================
  Files        1079     1092      +13     
  Lines       47071    47945     +874     
  Branches     1444     1461      +17     
==========================================
+ Hits        39638    40226     +588     
- Misses       7054     7340     +286     
  Partials      379      379              
Flag Coverage Δ
backend 85.25% <79.61%> (-0.17%) :arrow_down:
pui 65.46% <40.27%> (-0.63%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Apr 08 '24 02:04 codecov[bot]

Can we merge this?

smithcv avatar Apr 12 '24 15:04 smithcv

@smithcv this is not finalized and tested. Please allow the time needed to get it done properly with the resources the project has.

matmair avatar Apr 12 '24 18:04 matmair

@smithcv thanks for your interest but still a lot of work here. If you or your company would like to contribute to the development of this feature, you can provide funding against this issue :)

SchrodingersGat avatar Apr 13 '24 12:04 SchrodingersGat

@SchrodingersGat I have created a branch that removes all import-export dependant things: https://github.com/matmair/InvenTree/tree/remove-imports

There are a lot of places where we use resource definitions for export - making this more generic (maybe as a view mixin?) would be great to remove code duplication

matmair avatar Apr 13 '24 12:04 matmair

Data Export Profiling

The new data export functionality uses the optimized API serializers, and (at the current implementation) is already a huge improvement over the existing export tool which uses django-import-export

API Endpoint Record Count Old Exporter New Exporter
/api/part/ 5,000 did not return after multiple minutes 17.3s , 11 DB hits
/api/stock/ 1,100 3.4s in 890 DB hits 1.0s in 16 DB hits
/api/stock/location/ 20 0.4s in 38 DB hits 0.5s in 4 DB hits
/api/order/po/ 10 0.4s in 48 DB hits 0.5s in 23 DB hits

However there is still some optimizing to be done here. Still need to profile where the time is being spent on the data export. Even though django-import-export uses a lot more DB hits their export is still faster in some instances.

SchrodingersGat avatar Apr 14 '24 03:04 SchrodingersGat

@SchrodingersGat that looks very promising

matmair avatar Apr 14 '24 07:04 matmair

@SchrodingersGat you need any assistance with this?

matmair avatar May 27 '24 19:05 matmair

I need to remember where I was up to and what the next steps might be. There is still a lot of work to go..

So far I have got a framework in place for creating an import session, and extracting data from table. This data is split into rows which get "serialized" against the matching serializer for validation and ingest.

Major Tasks

  • [ ] The user interface elements (in PUI) still need a lot of work
  • [ ] Do we want to support this framework within the django admin interface?
  • [ ] Still need to work out the final steps for validating data rows and then saving to the database
  • [ ] No unit testing as yet

If you wanted to take a look at where I'm up to so far and provide input on the approach in general. I'm hoping to get back to this soon, now that the reporting has been refactored. It is a major blocker for a lot of outstanding issues.

SchrodingersGat avatar May 28 '24 06:05 SchrodingersGat

This is now progressed far enough to merge into master. Exporting data from tables (in CUI and PUI) now uses the new data export tool and is much faster. Data are serialized into the file using the existing DRF layer, which is very query efficient and reduces code duplication.

There is now also a "Data Import" section in the PUI admin page - which is functional, but still requires some work:

image

There are a lot of outstanding tasks / improvements to do (see outstanding tasks above) which will be pushed out into new issues.

The goal is to remove the reliance on django-import-export entirely.

SchrodingersGat avatar Jul 06 '24 08:07 SchrodingersGat

@SchrodingersGat thank you for pushing this ahead! Are any of these goals blocking for 0.16.0 release? We should probably log and tag those separately to not lose track over the next month.

matmair avatar Jul 06 '24 11:07 matmair

I will make some separate issues soon, while it is still fresh on my mind.

SchrodingersGat avatar Jul 06 '24 12:07 SchrodingersGat