django-data-wizard Debugging

Hi there

This might be a question for DRF instead, but how exactly does one use pdb with this library? If one inserts a set_trace(), the output on the Django server keeps rolling past, so even though one is able to interact with pdb, the command prompt disappears under a torrent of HTTP requests. Is there any way to pause everything so I can debug?

Thanks

Sep 10 '20 11:09 dejmail

I wouldn't mind some debugging insights too, though not for the reasons you asked. Even using "BACKEND": "data_wizard.backends.immediate", I couldn't get my IDE (PyCharm) to catch any errors from data_wizard. Combined with the complexity of the test setup, that makes this library harder to work with than it needs to be. I'm trying to fix #31 because I'm using the very common django-storages library, and I'm making zero progress because I'm getting no useful output from tests or debugging.

Nov 10 '21 04:11 techdragon

I will add some documentation on debugging tips, but here are a few things to start:

General Tips

Given the wide variety of use cases and failure points, Data Wizard traps most errors by default, so that the user gets a short, hopefully informative message rather than a generic 500 error. The trapped errors are logged via Python's logging module.
The threading backend (enabled by default) adds another layer of indirection when trying to identify an exception.
Thus, if you are writing a custom Iter or Serializer class, make sure each component works in isolation before trying to debug within the Data Wizard stack. (See examples below)
Once you have confirmed that itertable and the serializer are working individually, try running data_wizard without any web UI traffic via the CLI (./manage.py runwizard).
Once that is working, try running through the web UI with ./manage.py runserver and the immediate backend:

DATA_WIZARD = {
    "BACKEND": "data_wizard.backends.immediate"
}
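
Since the trapped errors go to Python's logging module, it can help to make sure log output actually reaches the console. The following is a minimal sketch of a Django LOGGING setting, not an official Data Wizard recommendation. It configures the root logger because the logger names used inside data_wizard are not stated here; the formatter name "verbose" and the handler name "console" are arbitrary choices.

```python
import logging
import logging.config

# Sketch of a settings.py LOGGING dict that sends everything to the console.
# Assumption: catching records on the root logger is enough, since the exact
# logger names used by data_wizard are not given in this thread.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "verbose": {
            "format": "%(asctime)s %(levelname)s %(name)s: %(message)s",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "verbose",
        },
    },
    "root": {
        "handlers": ["console"],
        "level": "DEBUG",
    },
}

# Django applies settings.LOGGING by itself at startup. The explicit call
# below is only needed when trying this dict out in a plain Python shell.
logging.config.dictConfig(LOGGING)
```

With something like this in place, tracebacks logged with logging.exception() should show up in the runserver or runwizard output alongside the short message shown to the user.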

Debugging File Loading/Parsing (IterTable)

To debug issues loading and parsing files, try using itertable directly:

from itertable import load_file

for row in load_file('/path/to/file.xlsx'):
    print(row)

Note that existing releases of itertable automatically suppress the OSError raised when a file is inaccessible, so the error never makes it back to Data Wizard. For the next release, I changed this to raise itertable.exceptions.LoadFailed unless require_existing is explicitly set to False.

If you are writing a custom Iter class, test the class with a similar loop:

from myapp import CustomIter

for row in CustomIter(filename='/path/to/file.xlsx'):
    print(row)
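
If a file is large, or a row somewhere in the middle breaks iteration, a small helper can make the loop above easier to read. This is only a sketch: preview_rows is a hypothetical name, not part of itertable or Data Wizard, and it assumes nothing beyond the object being iterable.

```python
def preview_rows(rows, limit=5):
    """Return up to `limit` rows, naming the row index if iteration fails."""
    preview = []
    iterator = iter(rows)
    for index in range(limit):
        try:
            row = next(iterator)
        except StopIteration:
            # Fewer rows than the limit: return what was collected.
            break
        except Exception as exc:
            # Re-raise with the failing position, keeping the original cause.
            raise RuntimeError(f"Row {index} failed to load: {exc!r}") from exc
        preview.append(row)
    return preview


# Stand-in data for illustration; in practice, pass the Iter instance instead.
sample_rows = [
    {"name": "Site A", "value": "1.5"},
    {"name": "Site B", "value": "2.0"},
    {"name": "Site C", "value": "n/a"},
]

for row in preview_rows(sample_rows, limit=2):
    print(row)
```

Passing CustomIter(filename='/path/to/file.xlsx') in place of sample_rows should work the same way, as long as the class supports plain iteration.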

Debugging the Serializer (DRF)

To investigate validation issues, try instantiating the DRF serializer class directly.

from data_wizard import registry
Serializer = registry.get_serializer("My Model")
serializer = Serializer(data={"test": "data"})
serializer.is_valid(raise_exception=True)

Note that data_wizard traps any and all serializer errors for individual rows, saving only the error text to the Record table. The full stack trace is still sent to the Python logging module.
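
To illustrate what that means in practice, here is a rough sketch of the general pattern: trap the error per row, keep only the message, and log the full traceback. This is not Data Wizard's actual code. import_rows, save_row, and the "import_demo" logger name are all hypothetical.

```python
import io
import logging


def import_rows(rows, save_row, logger):
    """Hypothetical per-row import loop (not Data Wizard's implementation)."""
    results = []
    for index, row in enumerate(rows):
        try:
            save_row(row)
            results.append((index, True, ""))
        except Exception as exc:
            # Only the short error text is kept with the row result...
            results.append((index, False, str(exc)))
            # ...while the full traceback is sent to the logging module.
            logger.exception("Row %s failed", index)
    return results


def save_row(row):
    # Hypothetical save step that rejects rows without a "name" value.
    if not row.get("name"):
        raise ValueError("name is required")


# Capture log output in memory so the traceback can be inspected.
log_buffer = io.StringIO()
demo_logger = logging.getLogger("import_demo")
demo_logger.addHandler(logging.StreamHandler(log_buffer))
demo_logger.propagate = False

results = import_rows([{"name": "a"}, {"name": ""}], save_row, demo_logger)
log_text = log_buffer.getvalue()
```

Here results only carries the text "name is required" for the failed row, while log_text contains the traceback. The practical upshot is that when the stored error message is not enough, the log output is the place to look.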

Nov 18 '21 07:11 sheppard