
Abstract as much as possible

Benjamin-Lee opened this issue 5 years ago • 7 comments

Have you checked the list of proposed tips to see if the tip has already been proposed?

  • [x] Yes

Did you add yourself as a contributor by making a pull request if this is your first contribution?

  • [x] Yes, I added myself or am already a contributor

Feel free to elaborate, rant, and/or ramble.

I just discovered Ludwig, a DL toolkit that uses YAML configuration files, rather than code, to define, train, evaluate, and visualize models. While the usefulness of this particular tool remains to be seen, it did get me thinking about abstraction (in the computer science sense).
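For a concrete sense of what that looks like, here is a minimal sketch of driving Ludwig from Python. The input/output feature structure follows the Ludwig docs, but the exact keyword names have changed between releases, and the column names and CSV file are placeholders, so treat the details as assumptions:

import pandas as pd
from ludwig.api import LudwigModel

# The same definition normally lives in a YAML file; it is expressed here as a dict.
# "sequence" and "label" are placeholder column names for an illustrative train.csv.
config = {
    "input_features": [
        {"name": "sequence", "type": "text", "encoder": "parallel_cnn"},
    ],
    "output_features": [
        {"name": "label", "type": "category"},
    ],
}

model = LudwigModel(config)
# Keyword names differ across Ludwig versions (e.g. data_df vs. dataset),
# so this call is illustrative rather than exact.
train_stats = model.train(pd.read_csv("train.csv"))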

To put the idea into tip form: abstract as much as possible/don't implement anything you don't have to/start with the highest level tool you can/write as little code as possible/keep your error surface as small as possible.

With newer tools such as Ludwig and Keras (and many more I could mention), the implementation details are further abstracted away, allowing one to focus on the task at hand. I think we should recommend using the highest-level tools that can accomplish the task, especially for those new to deep learning. This would certainly have saved me a lot of time, and I'm interested in hearing what others think about this.
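As a rough illustration of how little code the highest-level route can require, here is a minimal end-to-end Keras sketch (MNIST is just a stand-in dataset, and the layer sizes are arbitrary):

import tensorflow as tf

# MNIST is only a placeholder dataset here; the point is how short the
# define/train/evaluate workflow is at this level of abstraction.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_split=0.1)
model.evaluate(x_test, y_test)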

Benjamin-Lee avatar Feb 13 '19 19:02 Benjamin-Lee

I like that idea, but I would caveat that you shouldn't proceed with reckless abandon and should still be principled about what you are using.

AlexanderTitus avatar Feb 13 '19 19:02 AlexanderTitus

Is it worth mentioning/discussing that the reader should do some sort of cost-benefit analysis of the setup required for learning some new system/toolkit vs. just hacking it together to get the result they need?

pstew avatar Feb 13 '19 19:02 pstew

I may be in the minority, but I must say that I'm not a big fan of wrappers and toolkits on top of the main libraries, because they tend to get abandoned at some point or have buglets here and there that can be annoying in practice (plus they also come with their own learning curve). E.g., I usually rely only on the main PyTorch/TensorFlow libraries and use them as they come, without a wrapper around them. This way there are fewer dependencies when sharing code, and I also don't have to worry about whether the wrappers will break upon new releases of said libraries. Anyway, just a personal preference.

Regarding Ludwig and the YAML wrappers, that's usually a good idea! However, it's basically like Caffe then. It's fine for standard stuff but can also be a bit limiting.

rasbt avatar Feb 13 '19 22:02 rasbt

This may be good to mention. I don't think abstraction necessarily means ignoring the theory and caveats of a set of algorithms. Given the effort that goes into designing and debugging new implementations, it would be ideal if they could be re-used. Personally, I find that using abstractions and re-using/modifying existing implementations reduces the number of hours I spend coding/debugging, and allows me to spend more time working on the biological or theoretical components of the project.

evancofer avatar Feb 15 '19 14:02 evancofer

I don't think abstraction necessarily means ignoring the theory and caveats of a set of algorithms

True, I agree. But I think that in the context of modern DL frameworks like TensorFlow and PyTorch,

E.g.,

import tensorflow as tf

# num_hidden_1, num_hidden_2, and num_classes are placeholder layer sizes
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(...)),  # input shape left unspecified here
    tf.keras.layers.Dense(num_hidden_1, activation=tf.nn.relu),
    tf.keras.layers.Dense(num_hidden_2, activation=tf.nn.relu),
    tf.keras.layers.Dense(num_classes, activation=tf.nn.softmax)
])

or

import torch
import torch.nn.functional as F


class MultilayerPerceptron(torch.nn.Module):

    def __init__(self, num_features, num_hidden_1, num_hidden_2, num_classes):
        super(MultilayerPerceptron, self).__init__()

        self.my_network = torch.nn.Sequential(
            torch.nn.Linear(num_features, num_hidden_1),
            torch.nn.ReLU(),
            torch.nn.Linear(num_hidden_1, num_hidden_2),
            torch.nn.ReLU(),
            torch.nn.Linear(num_hidden_2, num_classes)
        )

    def forward(self, x):
        logits = self.my_network(x)
        probas = F.softmax(logits, dim=1)  # class probabilities alongside raw logits
        return logits, probas

there is already so much abstraction that, imho, it's not worth using external libraries by default that just define wrappers around it; they introduce the risk that there are bugs and your network ends up doing something different from what you think it is doing, or that it will stop working at some point. Learning-curve-wise, it's also much harder to find examples for, let's say, the same model as above in Ludwig, and it would be much harder to find learning resources.

So, above, I was not suggesting forgoing any level of abstraction and coding everything up from scratch (that is something I do only for teaching, for example, to understand the underlying concepts the first time), but rather being choosy and selective when it comes to third-party libraries added on top of it.

In general, I agree with @Benjamin-Lee though regarding

To put the idea into tip form: abstract as much as possible/don't implement anything you don't have to/start with the highest level tool you can/write as little code as possible/keep your error surface as small as possible.

I would just suggest extending it and recommending that users start by taking existing code / examples (could be from the official tutorials, documentation, or other projects), get it running on a given project, and tweak it from there. (At least that's usually what I find most productive for myself :) )

rasbt avatar Feb 15 '19 15:02 rasbt

recommending that users start by taking existing code / examples (could be from the official tutorials, documentation, or other projects), get it running on a given project, and tweak it from there

I think that this is the general idea that I'm getting at. It's much easier to take code that works and adapt it than to write new code from scratch.

there is already so much abstraction that, imho, it's not worth using external libraries by default that just define wrappers around it; they introduce the risk that there are bugs and your network ends up doing something different from what you think it is doing, or that it will stop working at some point

I would argue that for tools such as Keras and Gluon (whose docs are written as standalone notebooks), which are incredibly widely used and aimed at beginners, the risk of bugs introduced by the toolkit is drastically smaller than the risk introduced by novices using overcomplicated tools that are difficult to grok. (I say this as a novice who has done this more times than I'd like to admit.)
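For comparison, here is roughly the same multilayer perceptron as the snippets above, sketched in Gluon (the layer sizes are placeholders, and the API details are as I remember them from the MXNet/Gluon docs):

from mxnet import gluon

# Roughly the same MLP as the Keras/PyTorch snippets above; the hidden sizes
# (128, 64) and the 10 output classes are arbitrary placeholders.
net = gluon.nn.Sequential()
net.add(
    gluon.nn.Dense(128, activation="relu"),
    gluon.nn.Dense(64, activation="relu"),
    gluon.nn.Dense(10),
)
net.initialize()  # parameters are shaped lazily on the first forward pass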

Benjamin-Lee avatar Feb 17 '19 08:02 Benjamin-Lee

I would argue that for tools such as Keras

Sure, I agree. I was already thinking of Keras/tf.keras as established DL libraries, not necessarily wrappers (sure, they wrap other lower-level code, but I was thinking more of tools like Ludwig when I referred to wrappers). tf.keras is also officially supported by TensorFlow, for example, so that's one I would recommend. Like I said, the problem with third-party libraries is usually that they have a short life span, so I generally wouldn't recommend them, because it may be more hassle in the long run.

rasbt avatar Feb 17 '19 20:02 rasbt