Does this implementation include the proximal regularization described in the paper? If not, can you provide an example of how to integrate this modified loss into the current code base?
Please see issue 11.