
Thoughts from implementing in Matlab

jeremylea opened this issue 8 months ago · 1 comment

This is not really an issue. I needed something like this, but in Matlab (without calling out to Python), so I chose to crib this package. I've just published my code at https://github.com/jeremylea/DLextras. These are some thoughts from implementing things again in a different language and with a different target problem. I don't want to file each as a separate issue and pollute things, so this will be a mixed bag. But before that, thank you for developing this package. It's been a real help in solving a long-standing problem in my research, and I'll find several other uses for the idea soon. I didn't need the data error model for my work, so I didn't do that, but I might add it for future projects.

The first thing I found was that using (-B, B) as the domain didn't do anything, and using (0, 1) everywhere simplified the code. This might

Using 1 for the fixed slopes at the ends of the splines did not work for me when using a uniform input; I needed the end slopes to be zero to obtain any reasonable result. If you're using a beta latent distribution this would be less of an issue, but that was causing problems for me. I added controls for setting the end slopes to one, zero, or a learnable value, along with a periodic option. I tried to add a zero-inflated feature but have yet to get that to work.
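To make the end-slope options concrete, here is a rough sketch of what I mean, written in Python rather than Matlab (the function and option names are illustrative, not the actual pzflow or DLextras API, and the periodic case is omitted):

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def knot_derivatives(raw_d, end_slope="one", raw_end=None):
    """Assemble the K+1 knot derivatives for a K-bin spline.

    raw_d    : (K-1,) unconstrained network outputs for the interior knots
    end_slope: "one", "zero", or "learnable"
    raw_end  : unconstrained scalar, only used when end_slope == "learnable"
    """
    d_interior = softplus(raw_d)           # interior derivatives must be > 0
    if end_slope == "one":
        d_end = 1.0                        # the usual linear-tail convention
    elif end_slope == "zero":
        d_end = 1e-6                       # effectively flat at the boundaries
    elif end_slope == "learnable":
        d_end = softplus(raw_end)          # trained alongside the other parameters
    else:
        raise ValueError(f"unknown end_slope: {end_slope}")
    return np.concatenate([[d_end], d_interior, [d_end]])

# e.g. an 8-bin spline with (near-)zero end slopes:
d = knot_derivatives(np.random.randn(7), end_slope="zero")
```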

I found it better to make the internal networks smaller and stack more "rolls" on the bijector with fewer knots. That might be my data. However, I did find that scaling the hidden layers of the internal networks so that the first layer has input_dim nodes and the number of nodes grows linearly up to hidden_dimension in the last layer seemed to make the networks more trainable. I also pre-seeded the bias in the output layer to generate a "diagonal" (identity) spline at initialization.
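Roughly, those two architecture tweaks look like this (a sketch with my own naming, assuming the output layer emits the usual K widths, K heights, and K-1 derivative parameters per spline):

```python
import numpy as np

def layer_widths(input_dim, hidden_dim, n_layers):
    """Hidden-layer sizes that grow linearly from input_dim up to hidden_dim."""
    return np.linspace(input_dim, hidden_dim, n_layers).round().astype(int)

def identity_spline_bias(K):
    """Output-layer bias that makes the initial spline the identity map.

    Zero logits -> the softmax gives K equal bin widths/heights, and
    softplus^-1(1) for the derivative entries -> knot slopes of exactly 1.
    """
    inv_softplus_one = np.log(np.expm1(1.0))
    return np.concatenate([
        np.zeros(K),                        # bin-width logits
        np.zeros(K),                        # bin-height logits
        np.full(K - 1, inv_softplus_one),   # interior knot derivatives
    ])

print(layer_widths(input_dim=4, hidden_dim=64, n_layers=4))   # [ 4 24 44 64]
print(identity_spline_bias(K=8))
```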

I added a latent layer that is a conditional beta distribution, using the same idea of an internal network to get the A and B parameters. This seemed to work well in testing, but for my problem I found it better in the end to use a uniform latent distribution and a bijector layer with transformed_dim=data_dim, followed by layers with transformed_dim=1. This also required passing the conditions into some of the latent distribution functions.
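The conditional beta latent is essentially this (a small sketch; the single linear map is just a stand-in for the internal network, and scipy's Beta density plays the role of the latent log-prob):

```python
import numpy as np
from scipy.stats import beta

def conditional_beta_log_prob(z, conditions, W, b):
    """z: (N, D) latent points in (0, 1); conditions: (N, C)."""
    raw = conditions @ W + b                 # (N, 2*D) unconstrained parameters
    params = np.log1p(np.exp(raw))           # softplus -> positive A and B
    A, B = np.split(params, 2, axis=1)       # each (N, D)
    return beta.logpdf(z, A, B).sum(axis=1)  # independent across dimensions

rng = np.random.default_rng(0)
N, D, C = 5, 3, 2
z = rng.uniform(0.05, 0.95, (N, D))
cond = rng.normal(size=(N, C))
W, b = 0.1 * rng.normal(size=(C, 2 * D)), np.zeros(2 * D)
print(conditional_beta_log_prob(z, cond, W, b))
```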

I was having some issues with the training and tried a bunch of things... The first was to remove the hard limits on the spacing of the spline knots (I think that's gone in your current code) and add a penalty function for closely spaced knots, along with some seatbelts to prevent division by zero. I also added a penalty for large differences in the derivatives at the knots and for residual correlation in the forward results. These helped, but they tended to push the solution into a local minimum. I made these penalties configurable in training; they seemed to help stabilize the problem in early training, but then I removed them for most of the training.
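The penalties were roughly of this form (a sketch; the exact weights, the minimum-width threshold, and the averaging are my own choices here, and they were configurable so they could be switched off later in training):

```python
import numpy as np

def training_penalties(widths, derivs, z, w_gap=1.0, w_deriv=1.0, w_corr=1.0,
                       min_width=1e-3):
    """widths: (..., K) bin widths, derivs: (..., K+1) knot slopes, z: (N, D) latents."""
    # penalise bins narrower than min_width (epsilon in the denominator as a seatbelt)
    gap = np.mean(np.clip(min_width - widths, 0.0, None) / (widths + 1e-12))
    # penalise large differences between neighbouring knot derivatives
    deriv = np.mean(np.diff(derivs, axis=-1) ** 2)
    # penalise residual correlation between latent dimensions after the forward pass
    corr = np.corrcoef(z, rowvar=False)
    off_diag = corr - np.diag(np.diag(corr))
    decorr = np.sum(off_diag ** 2) / max(z.shape[1] * (z.shape[1] - 1), 1)
    return w_gap * gap + w_deriv * deriv + w_corr * decorr
```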

I also found that learning one fewer bin width and height, and fixing that entry to zero before the softmax, helped to keep the hidden networks stable. I also scaled the knot derivatives D by the bin slopes s_k, so that the learned slopes are proportional to the bin slopes.
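In sketch form (the names are mine, and averaging the two neighbouring bin slopes at each interior knot is just one plausible way to define s_k):

```python
import numpy as np

def softmax_with_fixed_zero(raw):
    """raw: (..., K-1) free logits; a fixed zero is appended before the softmax."""
    logits = np.concatenate([raw, np.zeros(raw.shape[:-1] + (1,))], axis=-1)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_derivatives(raw_d, widths, heights):
    """Interior knot derivatives proportional to the local bin slopes s_k."""
    s = heights / widths                           # (..., K) bin slopes
    s_knot = 0.5 * (s[..., :-1] + s[..., 1:])      # slope "seen" at each interior knot
    return np.log1p(np.exp(raw_d)) * s_knot        # softplus(raw) * s_k

w = softmax_with_fixed_zero(np.random.randn(7))    # 8 bin widths summing to 1
h = softmax_with_fixed_zero(np.random.randn(7))    # 8 bin heights summing to 1
d = scaled_derivatives(np.random.randn(7), w, h)   # 7 interior knot derivatives
```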

I didn't implement the idea of patience (I think that was added after I started converting), but I'll probably do that. I did make the training loop save the best fit and restore that on exit.
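The save-best/restore behaviour (and where a patience counter would slot in) is just this kind of loop; train_step and val_loss here are placeholders for the real training code:

```python
import copy

def train(params, train_step, val_loss, n_epochs=100, patience=None):
    best_params, best_loss, since_best = copy.deepcopy(params), float("inf"), 0
    for epoch in range(n_epochs):
        params = train_step(params)
        loss = val_loss(params)
        if loss < best_loss:
            best_params, best_loss, since_best = copy.deepcopy(params), loss, 0
        else:
            since_best += 1
            if patience is not None and since_best >= patience:
                break                      # optional early stopping
    return best_params                     # always hand back the best fit seen
```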

I needed a weighting column, so I added that. I also added other debugging features, like capturing the random input used for sampling, so you can draw the same sample repeatedly throughout training and see only the changes due to learning.
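Both of those are small changes; something like this (log_prob stands in for the flow's own log-probability function, and the captured latent draw is re-used every time you sample during training):

```python
import numpy as np

def weighted_nll(log_prob, x, conditions, weights):
    """Weighted negative log-likelihood; `weights` is the extra data column."""
    lp = log_prob(x, conditions)                      # (N,) per-sample log-probs
    return -np.sum(weights * lp) / np.sum(weights)

rng = np.random.default_rng(42)
fixed_latents = rng.uniform(size=(1000, 3))           # captured once ...
# samples = flow_inverse(params, fixed_latents, cond)  # ... and reused every epoch
```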

There are probably other things, but I can't see them right now. Thanks again for a great package.

jeremylea · Jun 07 '24 19:06