AppImages are shipping with __pycache__ bloat that significantly increases the size of the package

Open onekk opened this issue 3 years ago • 18 comments

Hello

Today I extracted an AppImage and noticed many __pycache__ directories in the source tree.

These files are usually leftovers: they contain byte-compiled code that Python regenerates if it is not found on the next run.

Since there are usually a bunch of .pyc files in them, deleting these directories (or their contents) before packing the AppImage might make it smaller.

This would mean shorter download times and perhaps less bandwidth used by the FC servers that supply users with AppImages.

Regards

Carlo D.

onekk avatar Feb 17 '22 17:02 onekk

Oh! Maybe that's where all the bloat comes from? Can you check the amount of space it takes up?

| package | size |
| --- | --- |
| FreeCAD_weekly-builds-27518-Linux-Conda_glibc2.12-x86_64.AppImage | 940MB |
| FreeCAD snap edge | 573MB |

luzpaz avatar Feb 17 '22 17:02 luzpaz

OK, I found a viable way, and the result is relevant. My first attempt was not telling the truth, because find was not digging deep into the tree. So using:

du -h ./squashfs-root/ >> usage.txt

and then grep "__pycache__" ./usage.txt >> cached.txt

the result is more interesting.

cached.txt

For now you have to add up the space used by hand, but several entries of around 10MB each are enough to make it worth trying to eliminate them from an AppImage.
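
The hand count can be automated. A minimal sketch, assuming a POSIX shell with find, du and awk available; the function name pycache_total_kb is made up for illustration and is not part of any FreeCAD script:

```shell
# Hypothetical helper: print the combined size, in KiB, of every
# __pycache__ directory below the directory given as $1.
pycache_total_kb() {
    # -prune keeps find from descending into a directory it already matched;
    # du -sk prints one "size<TAB>path" line per directory, which awk sums.
    find "$1" -type d -name '__pycache__' -prune -exec du -sk {} + |
        awk '{ total += $1 } END { print total + 0 }'
}
```

Calling `pycache_total_kb ./squashfs-root` prints a single number; dividing it by 1024 gives the figure in MB.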

Hope it helps

EDIT: I have deleted my irrelevant posts so as not to pollute this thread.

Regards

Carlo D.

onekk avatar Feb 17 '22 17:02 onekk

Oh wow. Pardon me, @TheAssassin, is there a way to remove the __pycache__ cruft from an appimage?

luzpaz avatar Feb 17 '22 17:02 luzpaz

@luzpaz
Could you please correct the title? "many" is the correct word; "May" is a bit early, as we are in February :-D

onekk avatar Feb 17 '22 18:02 onekk

@luzpaz I wonder why there is any at all, maybe leftovers from the installation using pip? You can just remove them with something like find AppDir -type d -name '__pycache__' -exec rm -rf '{}' \; before building the final AppImage. I think linuxdeploy-plugin-conda cleans up such things itself, though.
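
A variant of that one-liner as a small function, offered as a sketch only: adding -prune stops find from trying to descend into directories that are about to be removed, which otherwise tends to produce harmless "No such file or directory" messages. The function name strip_pycache is hypothetical, and AppDir stands for whatever directory gets packed into the AppImage.

```shell
# Hypothetical helper: delete every __pycache__ directory below $1.
strip_pycache() {
    # Match directories named __pycache__, do not descend into them (-prune),
    # and hand them to rm in batches (-exec ... +).
    find "$1" -type d -name '__pycache__' -prune -exec rm -rf {} +
}
```

It would be called as `strip_pycache AppDir` just before the final AppImage is built; regular .py files are left untouched.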

TheAssassin avatar Feb 17 '22 19:02 TheAssassin

Thanks @TheAssassin
CC @looooo

luzpaz avatar Feb 17 '22 19:02 luzpaz

As far as I remember we have deleted them once, but there were some issues with doing so. But I can't remember exactly. We can try again ;)

looooo avatar Feb 18 '22 09:02 looooo

@looooo are we using the linuxdeploy-plugin-conda as mentioned by 'TheAssassin'?

luzpaz avatar Feb 18 '22 11:02 luzpaz

Big ones are for FC itself:

9,8M	./squashfs-root/usr/Mod/Fem/femexamples/meshes/__pycache__
11M	./squashfs-root/usr/Mod/Arch/__pycache__
12M	./squashfs-root/usr/Mod/Draft/__pycache__

These three alone add up to over 30MB.

Quoted from a forum:

If you need a permanent solution for keeping Python cache files out of your project directories:

Starting with Python 3.8 you can use the environment variable PYTHONPYCACHEPREFIX to define a cache directory for Python.

From the Python docs:

If this is set, Python will write .pyc files in a mirror directory tree at this path, instead of in __pycache__ directories within the source tree. This is equivalent to specifying the -X pycache_prefix=PATH option.

Example

If you add the following line to your ~/.profile in Linux:

export PYTHONPYCACHEPREFIX="$HOME/.cache/cpython/"

Python won't create the annoying __pycache__ directories in your project directory, instead it will put all of them under ~/.cache/cpython/
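
For a one-off build or test run, the same variable can be set for a single command instead of in the profile. A minimal sketch, assuming Python 3.8 or newer on the machine; the helper name with_pycache_prefix and the paths in the usage note are illustrative only:

```shell
# Hypothetical helper: run a command with PYTHONPYCACHEPREFIX pointing at $1.
# Usage: with_pycache_prefix CACHE_DIR COMMAND [ARGS...]
with_pycache_prefix() {
    cache_dir=$1
    shift
    # Make sure the cache directory exists before the command starts.
    mkdir -p "$cache_dir" || return 1
    # env scopes the variable to this one command; the calling shell is unchanged.
    env PYTHONPYCACHEPREFIX="$cache_dir" "$@"
}
```

For example, `with_pycache_prefix "$HOME/.cache/cpython" python3 some_script.py` (where some_script.py is a placeholder) should leave the source tree free of new __pycache__ directories.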

It may be helpful during test builds of AppImages, to keep things cleaner.

Hope it Helps

Carlo D.

onekk avatar Feb 18 '22 14:02 onekk

@looooo are we using the linuxdeploy-plugin-conda as mentioned by 'TheAssassin'?

no.

we commented out these lines some time ago: https://github.com/FreeCAD/FreeCAD-Bundle/blob/master/conda/linux/create_bundle.sh#L48

there were some issues with asm3 as far as I can remember. So will keep the pyc files for now.

looooo avatar Feb 20 '22 17:02 looooo

From some research I found that usually doing:

# Remove the contents of __pycache__ folders (the empty folders and any .pyc files elsewhere are left in place)
find . -path "*/__pycache__/*" -delete

is safe, whereas blindly removing all .pyc files is not so safe.

__pycache__ directories are created by Python "on demand".

.pyc files could be byte-compiled by the user to supply a sort of "C library" for Python (.pyc files are byte-compiled files executed by the Python interpreter, a sort of DLL in Windows terms).

So __pycache__ directories are usually created by Python itself when a program runs, and can be safely removed.

.pyc files could be supplied by the developer to speed up the run, and sometimes to "hide some code".

Sadly, there is no official documentation on these things (or at least none that I found immediately), but there are many discussions on Stack Exchange.

Probably deleting only the files in __pycache__ and doing some testing is the "correct thing to do".

Or asking the asm3 developer for advice.
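
To see which .pyc files would be risky to delete, one option is to list those that sit outside __pycache__ and have no matching .py file next to them. This is a sketch only, with a hypothetical function name, and it assumes paths without embedded newlines:

```shell
# Hypothetical helper: print every .pyc file below $1 that lives outside
# a __pycache__ directory and has no sibling .py source file.
list_sourceless_pyc() {
    find "$1" -type f -name '*.pyc' ! -path '*/__pycache__/*' |
        while IFS= read -r pyc; do
            # foo.pyc -> foo.py
            src="${pyc%.pyc}.py"
            [ -f "$src" ] || printf '%s\n' "$pyc"
        done
}
```

Anything printed by `list_sourceless_pyc ./squashfs-root` is a candidate for a module shipped as byte code only, so it is worth checking before any blanket .pyc cleanup.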

Regards

Carlo D.

onekk avatar Feb 20 '22 17:02 onekk

Ok, let's try again: https://github.com/FreeCAD/FreeCAD-Bundle/commit/8e28d9ad26b500b5d2ebd1645ac6b00279ef358e

looooo avatar Feb 20 '22 19:02 looooo

In good ol' Python, one could also just ship .pyc or .pyo without the original .py files. Perhaps that's your issue? Some lib installs just compiled byte code?

TheAssassin avatar Feb 20 '22 22:02 TheAssassin

That is not the problem. The problem is that the AppImage is full of __pycache__ directories that eat more than 30MB of space, so the AppImage would download faster without them.

I think that .pyc files outside __pycache__ directories could be of the type you are describing.

It would be a good thing to find out which files cause problems when deleted, as not every user has an unlimited data plan or a fast internet connection.

Sadly, we "first world people" assume that all the rest of the world lives like us.

I think that even in some areas of the first world, where the only internet connection available may be a 4G connection or a DSL line, such a strategy will be appreciated.

As a side note, being GPL, FC has to supply sources, and a .pyc file is not human readable.

Regards

Carlo D.

onekk avatar Feb 21 '22 08:02 onekk

@onekk did the __pycache__ removal trigger any bugs for you? I don't use the AppImage as much anymore, so I'm a bit out of touch. There are issues popping up, like the black start page etc...

luzpaz avatar Apr 02 '22 12:04 luzpaz

Sorry, I'm using a miniconda install now, which is probably similar to an AppImage.

But setting this variable in the FC start script will solve the "Start Page" problem:

export QTWEBENGINE_DISABLE_SANDBOX=1

from:

https://forum.freecadweb.org/viewtopic.php?p=585485#p585485

But it seems to be related to some "security hardening" in glibc as reported in:

https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1939993

I know that is OT here: but...

Regards

Carlo D.

onekk avatar Apr 02 '22 15:04 onekk

Thanks, can you provide a PR?

looooo avatar Apr 02 '22 18:04 looooo

Sorry but I don't know how to make a PR without downloading the whole source. (I'm short on disk space now).

Probably adding:

export QTWEBENGINE_DISABLE_SANDBOX=1

in

https://github.com/FreeCAD/FreeCAD-Bundle/blob/master/conda/linux/AppDir/AppRun

will do the trick.
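
As a rough sketch of that change (the real AppRun is not reproduced in this thread, so the wrapper name and the launched command below are placeholders), the variable only needs to be exported before the FreeCAD binary is started:

```shell
# Hypothetical wrapper: export the variable, then start the given command.
# Usage: run_without_qtwebengine_sandbox COMMAND [ARGS...]
run_without_qtwebengine_sandbox() {
    # Same setting as the export line above; child processes inherit it.
    QTWEBENGINE_DISABLE_SANDBOX=1
    export QTWEBENGINE_DISABLE_SANDBOX
    "$@"
}
```

In a launcher script this would look something like `run_without_qtwebengine_sandbox "$APPDIR/usr/bin/freecad" "$@"`, where the binary path is an assumption about the bundle layout.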

But the conda solution described in my answer on the forum:

https://forum.freecadweb.org/viewtopic.php?p=585715#p585715

will probably do the trick in a more general way, for both AppImages and other installs.

Regards

Carlo D.

onekk avatar Apr 03 '22 10:04 onekk