Numericalizer abc
This is a WIP of the new NumericalizerABC interface used for Vocab and other vectorizers.
The existing codebase required relatively minor changes to enable this.
Because this is still a WIP, the interface has yet to be implemented in the existing vectorizers, and the TODOs left in the code still need to be handled.
I'm creating this pull request so we can review and improve the general idea before any more serious work is done.
The interface itself is called NumericalizerABC and is contained in the numericalizer_abc.py file in the preproc package.
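For reviewers who want a quick picture before opening the diff, here is a minimal sketch of the general shape such an interface could take. It is not copied from numericalizer_abc.py: apart from the names NumericalizerABC and mark_finalized, everything here (numericalize, update, finalize, the finalized property, and the ToyVocab class) is a hypothetical stand-in chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Hashable, Iterable, List


class NumericalizerABC(ABC):
    """Illustrative sketch only: maps tokens to integer indices."""

    def __init__(self) -> None:
        self._finalized = False

    @abstractmethod
    def numericalize(self, tokens: Iterable[Hashable]) -> List[int]:
        """Convert tokens to indices (assumed method name)."""

    def update(self, tokens: Iterable[Hashable]) -> None:
        """Observe tokens while building; no-op by default (assumed hook)."""

    def finalize(self) -> None:
        """Finish building; subclasses may override (assumed hook)."""
        self.mark_finalized()

    def mark_finalized(self) -> None:
        # Records that the numericalizer is ready for use.
        self._finalized = True

    @property
    def finalized(self) -> bool:
        return self._finalized


class ToyVocab(NumericalizerABC):
    """Hypothetical vocabulary assigning indices in first-seen order."""

    def __init__(self) -> None:
        super().__init__()
        self._stoi: Dict[Hashable, int] = {}

    def update(self, tokens: Iterable[Hashable]) -> None:
        if self.finalized:
            raise RuntimeError("cannot update a finalized vocab")
        for token in tokens:
            # Only unseen tokens receive a new index.
            self._stoi.setdefault(token, len(self._stoi))

    def numericalize(self, tokens: Iterable[Hashable]) -> List[int]:
        if not self.finalized:
            raise RuntimeError("vocab must be finalized first")
        return [self._stoi[token] for token in tokens]


vocab = ToyVocab()
vocab.update(["a", "b", "a"])
vocab.finalize()
indices = vocab.numericalize(["b", "a"])
```

The idea behind a shared base class of this kind is that Field can treat Vocab and any other vectorizer uniformly, as long as each one reports when it is finalized.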
Be aware that this PR is based on the dataset-abc branch, so the diff shows changes from both branches. When inspecting the files changed, set "show changes from" to the black correction commit to filter out irrelevant changes.
Focus on the NumericalizerABC, Field and Vocab classes.
Ready for merge. The only changes since the last review round are the addition of mark_finalized and documentation.
As discussed at the regular meeting, just leaving a note here that this is currently on hold.