
Numericalizer abc

Open ivansmokovic opened this issue 5 years ago • 2 comments

This is a WIP of the new NumericalizerABC interface used for Vocab and other vectorizers. The existing codebase required relatively minor changes to enable this.

Since this is still a WIP, the interface has yet to be implemented in the existing vectorizers, and the TODOs left in the code have yet to be handled.

I'm creating this pull request so we can review and improve the general idea before any more serious work is done.

The interface itself is called NumericalizerABC and is contained in the numericalizer_abc.py file in the preproc package.
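To make the idea easier to discuss, here is a minimal sketch of what such an interface could look like. It is not the code from this PR: only the names `NumericalizerABC` and `mark_finalized` come from this thread, while `numericalize`, `update`, `finalize`, `is_finalized` and the `ToyVocab` class are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class NumericalizerABC(ABC):
    """Illustrative sketch only; everything except the class name and
    mark_finalized is an assumption, not the API from this PR."""

    def __init__(self) -> None:
        self._finalized = False

    @abstractmethod
    def numericalize(self, tokens: Iterable[str]) -> List[int]:
        """Map tokens to integer ids."""

    def update(self, tokens: Iterable[str]) -> None:
        """Observe tokens while fitting; a no-op by default (assumed hook)."""

    def finalize(self) -> None:
        """Finish fitting; subclasses may override and then mark themselves."""
        self.mark_finalized()

    def mark_finalized(self) -> None:
        """Record that the numericalizer can no longer be updated."""
        self._finalized = True

    @property
    def is_finalized(self) -> bool:
        return self._finalized


class ToyVocab(NumericalizerABC):
    """Hypothetical stand-in for Vocab: ids are assigned in first-seen order."""

    def __init__(self) -> None:
        super().__init__()
        self._stoi: Dict[str, int] = {}

    def update(self, tokens: Iterable[str]) -> None:
        if self.is_finalized:
            raise RuntimeError("cannot update a finalized vocabulary")
        for token in tokens:
            # setdefault keeps the existing id for tokens seen before
            self._stoi.setdefault(token, len(self._stoi))

    def numericalize(self, tokens: Iterable[str]) -> List[int]:
        return [self._stoi[token] for token in tokens]


vocab = ToyVocab()
vocab.update(["a", "b", "a"])
vocab.finalize()
ids = vocab.numericalize(["b", "a"])
```

In this sketch a Field-like caller would only depend on the base class, so any vectorizer implementing the same methods could be swapped in for the vocabulary.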

Be aware that this PR is based on the dataset-abc branch, so the diff shows changes from both branches. When inspecting the changed files, set "show changes from" to the black correction commit to hide the unrelated changes.

Focus on the NumericalizerABC, Field and Vocab classes.

ivansmokovic, Nov 05 '20 14:11

Ready for merge. The only changes from the last review round are the addition of mark_finalized and documentation.

ivansmokovic, Dec 10 '20 19:12

As discussed at the regular meeting, just leaving a note here that this is currently on hold.

FilipBolt, Dec 18 '20 00:12