django-import-export

Feature proposal - import and export management commands

Open bmihelac opened this issue 1 year ago • 1 comments

I believe it would be useful to have Django management commands to import and export data from the command line. This could also be useful for automating repeated imports and exports.

Below is the proposed syntax with invocation examples.


import_data command

Syntax

manage.py import_data [options] <resource> <import_file_name>
  • <resource> - Resource class or Model class as a dotted path, e.g. mymodule.resources.MyResource or auth.User
  • <import_file_name> - file to import

Options:

  • --format=FORMAT - import/export format; guessed from the file's MIME type if omitted
  • --dry-run - Dry run
  • --raise-errors - Raise errors
  • --no-raise-errors - Do not raise errors
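
As a rough illustration of the "guess from mimetype" behaviour, here is a minimal sketch using only the standard library's mimetypes module. The MIME_TO_FORMAT table and the guess_format name are assumptions made for this example and are not part of the proposal or the library:

```python
import mimetypes

# Hypothetical mapping from MIME type to a format name (illustrative only).
MIME_TO_FORMAT = {
    "text/csv": "csv",
    "application/json": "json",
}


def guess_format(file_name, explicit_format=None):
    """Return the explicit format if given, otherwise guess it from the file name."""
    if explicit_format:
        return explicit_format.lower()
    # guess_type returns (mime_type, encoding); mime_type is None when unknown.
    mime_type, _encoding = mimetypes.guess_type(file_name)
    try:
        return MIME_TO_FORMAT[mime_type]
    except KeyError:
        raise ValueError(f"Cannot guess format of {file_name!r}; pass --format")
```

An explicit --format always wins in this sketch, and an unrecognised file name fails loudly rather than silently defaulting to a format.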

Examples

Import data from file into auth.User model using default model resource:

python manage.py import_data auth.User users.csv

Import data from file using custom model resource, raising errors:

python manage.py import_data --raise-errors helper.MyUserResource users.csv
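
The proposed syntax could be parsed roughly as follows. This is only a sketch with plain argparse (Python 3.9+ for BooleanOptionalAction); since Django passes an argparse-based parser to a command's add_arguments, the same add_argument calls should carry over. The build_parser name and the default of False for --raise-errors are assumptions, as the proposal does not state a default:

```python
import argparse


def build_parser():
    """Build a parser mirroring the proposed import_data syntax."""
    parser = argparse.ArgumentParser(prog="manage.py import_data")
    parser.add_argument("--format", default=None,
                        help="import format; guessed from the file if omitted")
    parser.add_argument("--dry-run", action="store_true",
                        help="run the import without saving anything")
    # BooleanOptionalAction generates both --raise-errors and --no-raise-errors.
    # Assumption: errors are not raised by default.
    parser.add_argument("--raise-errors", action=argparse.BooleanOptionalAction,
                        default=False, help="raise errors instead of collecting them")
    parser.add_argument("resource",
                        help="dotted path to a Resource or Model class")
    parser.add_argument("import_file_name", help="file to import")
    return parser


args = build_parser().parse_args(
    ["--raise-errors", "helper.MyUserResource", "users.csv"]
)
```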

export_data command

Syntax

manage.py export_data [options] <format> <resource>
  • <format> - export format
  • <resource> - Resource class or Model class as a dotted path, e.g. mymodule.resources.MyResource or auth.User

Example

Export data from auth.User model in CSV format to standard output.

python manage.py export_data CSV auth.User
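
Resolving the <resource> argument is the less obvious part of both commands. The sketch below covers only the "dotted path to a class" case with the standard library; the import_dotted_path name is made up for this example. For a model label such as auth.User the import would fail, and I would assume the command then falls back to Django's apps.get_model("auth.User") and builds a default resource for that model:

```python
from importlib import import_module


def import_dotted_path(dotted_path):
    """Import 'package.module.Name' and return the attribute Name."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError(f"{dotted_path!r} is not a dotted path")
    # Raises ModuleNotFoundError (a subclass of ImportError) if the module is missing.
    module = import_module(module_path)
    try:
        return getattr(module, attr_name)
    except AttributeError as exc:
        raise ImportError(
            f"Module {module_path!r} has no attribute {attr_name!r}"
        ) from exc
```

Every failure surfaces as ImportError, so a caller can catch that single exception type before trying the model lookup.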

Please take a moment to read through the details and share any feedback, suggestions, or concerns you might have.

bmihelac avatar Oct 17 '24 15:10 bmihelac

Looks like a useful addition.

I'm wondering if simply import or export would suffice? i.e. _data is redundant?

Is there a Python or Django pattern for accepting file input for importing? Should we use redirection? I note that psql uses this pattern:

python manage.py import_data auth.User < users.csv

Some more discussion here

Do we need a --no-raise-errors flag? should that be the default behaviour and --raise-errors enables?

matthewhegarty avatar Oct 17 '24 17:10 matthewhegarty

I'm wondering if simply import or export would suffice? i.e. _data is redundant?

I agree. The idea was to avoid generic names and potential name conflicts with other libraries, but I cannot find any library that uses import and export.

Is there a Python or Django pattern for accepting file input for importing? Should we use redirection? I note that psql uses this pattern:

I did not find any reference to best practices for Python/Django programs. To me, it seems clearest to use an explicit argument that accepts '-' for stdin.

python manage.py import_data auth.User users.csv
python manage.py import_data auth.User -
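
A minimal sketch of how the '-' convention could be handled, assuming a small helper (open_import_file is a name invented here) that yields standard input for '-' and an opened file otherwise:

```python
import contextlib
import sys


@contextlib.contextmanager
def open_import_file(file_name, encoding="utf-8"):
    """Yield a readable text stream; a file name of '-' means standard input."""
    if file_name == "-":
        # Do not close stdin: the command does not own it.
        yield sys.stdin
    else:
        with open(file_name, "r", encoding=encoding, newline="") as stream:
            yield stream
```

With this, both invocations above go through the same code path, and shell redirection (< users.csv) works through the '-' form as well.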

Do we need a --no-raise-errors flag? should that be the default behaviour and --raise-errors enables?

No, we do not.

bmihelac avatar Oct 21 '24 08:10 bmihelac

All sounds great.

To me, it seems clearest to use an explicit argument that accepts '-' for stdin.

Yes that sounds like the best approach. Similar to how grep handles it. I guess it would be ideal to be able to specify multiple files as args.

grep  searches the named input FILEs (or standard input if no files are
   named, or if a single hyphen-minus (-) is given as file name) for lines
   containing  a  match to the given PATTERN. 

matthewhegarty avatar Oct 21 '24 09:10 matthewhegarty

One potential solution to this problem is to offer a flexible framework that allows developers to create their own import/export commands, rather than providing a built-in command. This would be similar to how we handle the admin interface, where we don't provide the interface itself, but instead offer mixins, etc. that integrate with the framework.

A practical way to implement this would be to:

  1. Clearly document the current import/export capabilities so users understand what is possible out of the box outside of the admin context.
  2. Provide and document any additional or modified interfaces needed to make the export/import of data available in a command. This approach would allow us to gauge the demand for a built-in command such as the one you are proposing, while maintaining maximum flexibility and customizability for developers.

I hope this clarifies the suggestion.

andrewgy8 avatar Oct 24 '24 14:10 andrewgy8

To make this more concrete, from the user's POV I see it looking something like:


class BookResource(resources.ModelResource):
    class Meta:
        model = Book
    
    # users can still hook into all the infra provided by import-export

class Command(ImportExportCommandMixin):
    help = "Import/Export data for MyModel"
    resource_class = BookResource

Side note: if we start making the library more agnostic as a whole, there is no reason we can't let users import/export data from other areas of the framework: views, API, etc.

andrewgy8 avatar Oct 24 '24 15:10 andrewgy8

Just hacking around and this works already:

import tablib
from django.core.management.base import BaseCommand
from import_export import resources
from core.models import Book  # Replace with your actual model


class ImportExportCommandMixin:
    help = "Base class for import/export commands"

    def add_arguments(self, parser):
        parser.add_argument(
            '--action', type=str, choices=['import', 'export'], help="Choose action: import or export", required=True
        )
        parser.add_argument(
            '--file', type=str, help="Path to the file to import from or export to", required=True
        )

    def handle(self, *args, **options):
        action = options['action']
        file_path = options['file']
        resource_class = self.get_resource_class()

        if action == 'export':
            self.export_data(resource_class, file_path)
        elif action == 'import':
            self.import_data(resource_class, file_path)

    def get_resource_class(self):
        if not hasattr(self, 'resource_class'):
            raise NotImplementedError("You must provide a 'resource_class' attribute")
        return self.resource_class

    def import_data(self, resource_class, file_path):
        resource = resource_class()
        # Load the CSV file into a tablib Dataset before handing it to the resource.
        with open(file_path, 'r') as file:
            dataset = tablib.Dataset().load(file.read(), format='csv')
        result = resource.import_data(dataset, dry_run=False)
        # result.totals is keyed by import type ("new", "update", "delete", ...).
        self.stdout.write(self.style.SUCCESS(f'Successfully imported {result.totals["new"]} records'))

    def export_data(self, resource_class, file_path):
        resource = resource_class()
        dataset = resource.export()
        with open(file_path, 'w') as file:
            file.write(dataset.csv)
        self.stdout.write(self.style.SUCCESS(f'Successfully exported data to {file_path}'))

class BookResource(resources.ModelResource):

    class Meta:
        model = Book

    def for_delete(self, row, instance):
        return self.fields["name"].clean(row) == ""
    
class Command(ImportExportCommandMixin, BaseCommand):
    help = "Import/Export data for MyModel"
    resource_class = BookResource

Still need to configure file type and file paths properly. But it's pretty close without any core codebase changes.

andrewgy8 avatar Oct 24 '24 20:10 andrewgy8

Another point needs to be clarified here: why use an import/export management command rather than the loaddata and dumpdata commands? I.e., what value are we providing by adding this command?

FWIW, we use this when loading data for our test app.

andrewgy8 avatar Oct 26 '24 09:10 andrewgy8

The resource argument is intended to be either a resource class (such as the BookResource class in the example above) or a model class (like the Book model in your example). From the proposal description:

<resource> - Resource class or Model class as a dotted path, e.g., mymodule.resources.MyResource or auth.User

When the resource argument is a resource class, it can be customized in the same way as import and export resources are currently customized.

Regarding the rationale for creating a management command, the main benefit would be to provide a standard method for using django-import-export from the command line, bringing advantages such as:

  • Exporting or importing data from the command line when the data volume makes using the admin interface inconvenient.
  • Automating processes, such as a cron job to export open invoices daily.

loaddata and dumpdata do not offer as much room for customizing imports and exports as django-import-export does.

@andrewgy8, does that make sense to you?

bmihelac avatar Oct 28 '24 19:10 bmihelac

Yep, totally makes sense. I was mostly asking so we could be clear about the value add here vs. what is already included in Django 😀

As far as the API for the command goes, my concern comes from handling the importing of the Resource as well as any sort of error handling that would come from the command. I suppose this can be handled with a number of tests.

The other issue I foresee is: how would a user do something with the exported file given this proposed API? I.e., how does a user upload it to S3 or some other destination? Would they have to add this to the resource?

andrewgy8 avatar Oct 31 '24 07:10 andrewgy8

The other issue I foresee is: how would a user do something with the exported file given this proposed API? I.e., how does a user upload it to S3 or some other destination? Would they have to add this to the resource?

This is not covered by the proposal, and I don't think it would be a good idea to offer hooks for this. Users can easily pipe the export to an S3 upload command.
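
For environments where a shell pipeline is awkward, the same "export | upload" idea can be sketched in Python with subprocess. This is only an illustration: pipe_commands is a name made up here, and the upload command in the usage note (the AWS CLI reading from stdin, a hypothetical bucket) is an assumption, not something the proposal provides:

```python
import subprocess


def pipe_commands(producer, consumer):
    """Run `producer | consumer` without a shell; return the consumer's exit code."""
    with subprocess.Popen(producer, stdout=subprocess.PIPE) as source:
        sink = subprocess.Popen(consumer, stdin=source.stdout)
        # Close our copy of the pipe so the producer sees a broken pipe
        # if the consumer exits early.
        source.stdout.close()
        sink_code = sink.wait()
        source_code = source.wait()
    if source_code != 0:
        raise subprocess.CalledProcessError(source_code, producer)
    return sink_code
```

Usage would look something like pipe_commands(["python", "manage.py", "export_data", "CSV", "auth.User"], ["aws", "s3", "cp", "-", "s3://example-bucket/users.csv"]), where the bucket name is a placeholder.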

bmihelac avatar Nov 05 '24 13:11 bmihelac

All sounds great.

To me, it seems clearest to use an explicit argument that accepts '-' for stdin.

Yes that sounds like the best approach. Similar to how grep handles it. I guess it would be ideal to be able to specify multiple files as args.

grep  searches the named input FILEs (or standard input if no files are
   named, or if a single hyphen-minus (-) is given as file name) for lines
   containing  a  match to the given PATTERN. 

Regarding importing multiple files, this is not covered. I think the user can easily concatenate text files (cat) or run the command multiple times.

bmihelac avatar Nov 05 '24 13:11 bmihelac

Users can easily pipe the export to an S3 upload command.

I beg to differ @bmihelac. In my current org (and even the previous org), I would be unable to get a pipe to S3 (or Snowflake in a previous org) to work in the cron job setup we have in Kubernetes. I really do think we should support some sort of hook if this is to be used extensively.

On second thought: it's fine. We will see what happens after this is released.

andrewgy8 avatar Nov 05 '24 16:11 andrewgy8

Users can easily pipe the export to an S3 upload command.

I beg to differ @bmihelac. In my current org (and even the previous org), I would be unable to get a pipe to S3 (or Snowflake in a previous org) to work in the cron job setup we have in Kubernetes. I really do think we should support some sort of hook if this is to be used extensively.

I've thought about this, especially adding signals, but I still can't find a legitimate reason to add hooks in the export command. The export management command is short enough that it can easily be copy-pasted if special handling is needed:

https://github.com/bmihelac/django-import-export/blob/feat/management-commands/import_export/management/commands/export.py

@andrewgy8 If you can provide more details about the problems, maybe we can find a solution.

On second thought: it's fine. We will see what happens after this is released.

:+1:

bmihelac avatar Nov 06 '24 06:11 bmihelac