Auto-CORPus

Convert various classes to functions

Open alexdewar opened this issue 9 months ago • 1 comment

I'm going to go out on a limb and say I think most of the big classes in AC would be better implemented as separate functions (there might be exceptions -- I haven't gone through every line!).

For example, take the Abbreviations class. It does all the processing in __init__, assigns the result to self.abbreviations, then users retrieve that value by calling to_dict. Most of the methods don't actually use self, so they don't need to be methods at all. Instead of the Abbreviations class, you could just have a process_abbreviations function which returns this value directly (all the methods would be converted to standalone functions). The upside of doing things this way is that a) it's clearer what the inputs/outputs are for every function and b) it makes it easier to write tests. The problem with classes in general is that when you call a method it's often not obvious which attributes of the object -- if any -- might be changed.
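As a rough sketch of the function-based shape (only process_abbreviations is a name from this proposal; find_candidates and expand are made-up helpers, and the regex is a toy stand-in rather than AC's real abbreviation logic):

```python
import re


def find_candidates(text: str) -> list[tuple[str, str]]:
    """Return (short form, text before it) for every '(ABBR)' in the text.

    Hypothetical helper: the pattern is a toy, not AC's actual detection rule.
    """
    return [
        (match.group(1), text[: match.start()].strip())
        for match in re.finditer(r"\(([A-Z]{2,})\)", text)
    ]


def expand(short: str, preceding: str) -> str:
    """Guess the long form: one preceding word per letter of the short form."""
    words = preceding.split()
    return " ".join(words[-len(short):])


def process_abbreviations(text: str) -> dict[str, str]:
    """Map each short form to its guessed long form.

    Inputs and outputs are explicit; nothing is stored on an object.
    """
    return {
        short: expand(short, preceding)
        for short, preceding in find_candidates(text)
    }
```

Each piece can then be tested on its own with plain inputs, e.g. process_abbreviations("We used natural language processing (NLP) here.") gives {"NLP": "natural language processing"}, with no object to construct first.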

I'm not a total functional programming fanatic (I think objects have their place!), but I think that at least some of these classes would be better as separate functions. Apart from anything else, we do want to make it easy to add unit tests for this stuff, and that's quite hard if you're working with an object with mutable state rather than plain old functions.

I think the main Autocorpus class is a bit of a special case. I think a lot of it could be replaced with separate functions (like Abbreviations, it does all the processing in __init__ and then the result is read out somewhere else), but I think the various functions for formatting the output data in different ways are actually quite clean. You could still turn most of the class into separate functions: you would have a process_html_file function or whatever returning a dataclass ("OutputData" or something), and the dataclass could then have the methods for formatting the output data.
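Something along these lines, as a minimal sketch (OutputData and process_html_file are the placeholder names from above; the fields, the process_html helper, and the regex-based parsing are all invented for illustration and are not how AC actually parses HTML):

```python
import json
import re
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class OutputData:
    """Hypothetical result container; the real fields would differ."""

    title: str
    paragraphs: list[str] = field(default_factory=list)

    # Formatting stays on the dataclass, since it only reads the stored data.
    def to_dict(self) -> dict:
        return {"title": self.title, "paragraphs": list(self.paragraphs)}

    def to_json(self) -> str:
        return json.dumps(self.to_dict(), indent=2)


def process_html(html: str) -> OutputData:
    """Toy extraction with regexes, standing in for the real processing."""
    title = re.search(r"<title>(.*?)</title>", html, re.S)
    paragraphs = re.findall(r"<p>(.*?)</p>", html, re.S)
    return OutputData(
        title=title.group(1).strip() if title else "",
        paragraphs=[p.strip() for p in paragraphs],
    )


def process_html_file(path: Path) -> OutputData:
    """Read a file and hand its contents to the pure function above."""
    return process_html(path.read_text(encoding="utf-8"))
```

Splitting file reading from processing means tests can call process_html on a string directly, and the dataclass is frozen so its attributes cannot be reassigned after it is created.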

Feel free to disagree with this! I just thought I'd make an issue so we can discuss it.

alexdewar, Mar 14 '25 15:03

On hold until #142 is done.

alexdewar, Mar 24 '25 15:03