Debugging
Hi there
This might be a question for DRF instead, but how exactly does one use pdb with this library? If one inserts a set_trace(), the output on the Django server keeps rolling past, so even though one is able to interact with pdb, the command prompt disappears under a torrent of HTTP requests. Is there any way to pause everything so I can debug?
Thanks
I wouldn't mind some debugging insights too, though not for the reasons you asked: even using "BACKEND": "data_wizard.backends.immediate", I couldn't get my IDE (PyCharm) to catch any errors from data_wizard. Combined with the test setup complexity, that makes this library harder to work with than it needs to be. I'm trying to fix #31 because I'm using the very common django-storages library, and I'm making zero progress because I'm getting no useful output from tests or debugging.
I will add some documentation on debugging tips, but here are a few things to start:
General Tips
- Given the wide variety of use cases and failure points, Data Wizard traps most errors by default, to ensure the user can get a short, hopefully informative message rather than a generic 500 error. The trapped errors are logged via Python's logging module.
- The threading backend (enabled by default) adds another layer of indirection when trying to identify an exception.
- Thus, if you are writing a custom Iter or Serializer class, make sure each component works in isolation before trying to debug within the Data Wizard stack. (See examples below)
- Once you have confirmed that itertable and the serializer are working individually, try running data_wizard without any web UI traffic via the CLI (./manage.py runwizard).
- Once that is working, try running through the web UI with ./manage.py runserver and the immediate backend:
DATA_WIZARD = {
    "BACKEND": "data_wizard.backends.immediate",
}
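Since the trapped errors go to Python's logging module, it can help to make sure those records actually reach the console. The following is a minimal sketch of a Django LOGGING setting, not something taken from the Data Wizard docs. It configures the root logger rather than a specific named logger, so it does not depend on guessing which logger names the library uses. The DEBUG level is an assumption chosen to avoid missing anything; raise it if the output is too noisy.

```python
# settings.py (sketch): route every log record, including tracebacks, to the console.
LOGGING = {
    "version": 1,
    # Keep loggers that were created before this config was applied.
    "disable_existing_loggers": False,
    "formatters": {
        "verbose": {"format": "%(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "verbose"},
    },
    # Configuring the root logger avoids having to know individual logger names.
    "root": {"handlers": ["console"], "level": "DEBUG"},
}
```

Django applies this dictionary through logging.config.dictConfig at startup, so the records should then show up in the runserver or runwizard terminal.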
Debugging File Loading/Parsing (IterTable)
To debug issues loading and parsing files, try using itertable directly:
from itertable import load_file

for row in load_file('/path/to/file.xlsx'):
    print(row)
Note that existing releases of itertable automatically suppress the OSError raised when a file is inaccessible, so it doesn't even make it back to Data Wizard. For the next release, I changed this to raise itertable.exceptions.LoadFailed unless require_existing is explicitly set to false.
If you are writing a custom Iter class, test the class with a similar loop:
from myapp import CustomIter

for row in CustomIter(filename='/path/to/file.xlsx'):
    print(row)
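For large files, printing every row can bury the part you care about. As a rough sketch, a small helper can preview only the first few rows of anything iterable. The name preview_rows and the sample data are hypothetical, not part of itertable or Data Wizard; the helper only assumes the object can be looped over, as in the examples above.

```python
from itertools import islice


def preview_rows(rows, limit=5):
    """Print and return the first `limit` rows of any iterable.

    Iteration stops after `limit` rows, so later rows are never requested.
    """
    seen = []
    for row in islice(rows, limit):
        print(row)
        seen.append(row)
    return seen


# Stand-in data; in practice pass load_file(...) or CustomIter(...) instead.
sample = [{"name": "a", "value": 1}, {"name": "b", "value": 2}]
preview_rows(sample, limit=1)
```

Any exception raised while iterating propagates with its normal traceback, which is the point of testing the loader outside the Data Wizard stack.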
Debugging the Serializer (DRF)
To investigate validation issues, try instantiating the DRF serializer class directly.
from data_wizard import registry
Serializer = registry.get_serializer("My Model")
serializer = Serializer(data={"test": "data"})
serializer.is_valid(raise_exception=True)
Note that data_wizard traps any and all serializer errors for individual rows, saving only the error text to the Record table. The full stack trace is still sent to the Python logging module.
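When the console output is hard to follow, or when working inside a test, one option is to collect the log records in memory and print their tracebacks afterwards. This is only a sketch using the standard library: ListHandler is a hypothetical helper, the "example" logger and the ValueError stand in for a real import run, and it assumes the trapped error is logged with exception info attached.

```python
import logging


class ListHandler(logging.Handler):
    """Collect log records in memory so they can be inspected later."""

    def __init__(self):
        super().__init__(level=logging.DEBUG)
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = ListHandler()
root = logging.getLogger()
root.addHandler(handler)

# Stand-in for a trapped error; a real import run would go here instead.
try:
    int("not a number")
except ValueError:
    logging.getLogger("example").exception("row failed")

root.removeHandler(handler)

# Print the traceback of every captured record that carries one.
formatter = logging.Formatter()
for record in handler.records:
    if record.exc_info:
        print(record.getMessage())
        print(formatter.formatException(record.exc_info))
```

Attaching the handler to the root logger means it sees records from any logger that propagates, so no logger name has to be known in advance.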