
Reduce size by removing scipy

Open ojnas opened this issue 2 years ago • 4 comments

Adding gnpy (via pip install) when e.g. building an application as Docker containers increases the size of the application by a few hundred MB, which can sometimes be a bit problematic. The main reason is the requirements, which include some very large packages, the largest being scipy. Searching through the code, I can only find two uses of scipy: 1) constants (which could easily be replaced by internal definitions) and 2) interp1d from scipy.interpolate (for which there is a practically equivalent function in numpy). Could removing the dependency on scipy be an option?
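To sketch what the two replacements mentioned above could look like (a minimal illustration, not the actual gnpy code; the constant names here are hypothetical internal definitions):

```python
import numpy as np

# 1) Physical constants defined internally instead of via scipy.constants.
#    These CODATA values are exact by SI definition.
C = 299792458.0        # speed of light in vacuum [m/s]
H = 6.62607015e-34     # Planck constant [J s]

# 2) For plain 1-D linear interpolation, numpy.interp covers the common
#    scipy.interpolate.interp1d(x, y)(x_new) pattern, provided x is
#    monotonically increasing.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = x ** 2
x_new = np.array([0.5, 1.5, 2.5])
y_new = np.interp(x_new, x, y)  # piecewise-linear interpolation -> [0.5, 2.5, 6.5]
```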

Another large dependency is pandas, which only seems to be used in a couple of tests.

ojnas avatar Nov 07 '22 16:11 ojnas

Unfortunately, numpy.interp is not equivalent to scipy.interpolate.interp1d; the latter provides more flexible usage. In GNPy, scipy.interpolate.interp1d cannot be replaced seamlessly by numpy.interp because of its usage in gnpy.core.science_utils, where scipy.interpolate.interp1d interpolates along one selected axis (lines 171, 172, 412, 435). This obstacle can be overcome by introducing an inline for loop, but that solution won't scale properly with the integration resolution (I have tested it).

AndreaDAmico avatar Nov 29 '22 09:11 AndreaDAmico

Hi @AndreaDAmico,

with "inline for loop", do you mean something like:

power_profile = asarray([interp(z_final, z, p) for p in power_profile])

instead of the current:

power_profile = interp1d(z, power_profile, axis=1)(z_final)

I tested this and it seems to work. In what way does it not scale properly?

ojnas avatar Nov 29 '22 12:11 ojnas

I proposed a change here:

https://review.gerrithub.io/c/Telecominfraproject/oopt-gnpy/+/546806

ojnas avatar Nov 29 '22 21:11 ojnas

Hi @ojnas,

Exactly, that is the solution I was talking about. I ran some simulations varying the sim_params.raman_params.result_spatial_resolution (which determines the size of z_final) and the number of channels. In the figure attached to this comment, you can see that the numpy solution is always at least twice as slow as the scipy solution. That is why I say this change cannot be made seamlessly.
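A minimal way to reproduce this kind of comparison (the array sizes here are illustrative placeholders, not the resolutions used in the attached figure):

```python
import timeit
import numpy as np
from scipy.interpolate import interp1d

# Illustrative sizes; in GNPy, result_spatial_resolution controls len(z_final).
n_channels, n_coarse, n_fine = 96, 100, 5000
z = np.linspace(0.0, 80e3, n_coarse)
z_final = np.linspace(0.0, 80e3, n_fine)
power_profile = np.random.rand(n_channels, n_coarse)

# Time the single vectorized interp1d call vs. the per-channel numpy loop.
t_scipy = timeit.timeit(lambda: interp1d(z, power_profile, axis=1)(z_final), number=20)
t_numpy = timeit.timeit(
    lambda: np.asarray([np.interp(z_final, z, p) for p in power_profile]), number=20)
print(f"scipy: {t_scipy:.3f}s  numpy loop: {t_numpy:.3f}s")
```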

[figure: time_scaling — runtime of the numpy solution vs. the scipy solution]

AndreaDAmico avatar Nov 30 '22 17:11 AndreaDAmico