Conda Python Package Distribution
Conda is an alternative python package distribution mechanism to pip/pypi.
It is not as widely used, as pip and pypi are the de facto defaults, but it is widely used in the scientific community.
The main advantage of conda over pip is that it can manage non python dependencies, such as the cuda toolkit.
Several large/popular CUDA + python packages, including rapids and torch, prefer conda as their package distribution mechanism due to some of the features it offers compared to pypi.
Conda allows packages to depend on nvidia-provided cudatoolkit packages, with versions specified, unlike pypi. This allows users to request the latest version of a package built with CUDA X.Y, which is not possible using pypi + local version `+cudaXY` labels.
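For illustration, a minimal sketch of how such a dependency might be expressed in a `meta.yaml` (the pinning syntax is standard conda-build; the exact package choice and version range are assumptions):

```yaml
# Hypothetical fragment of a pyflamegpu meta.yaml.
# Any 11.x cudatoolkit from the nvidia/anaconda channels would satisfy
# this run requirement; users can further constrain it at install time,
# e.g. `conda install pyflamegpu cudatoolkit=11.2`.
requirements:
  run:
    - python
    - cudatoolkit >=11.0,<12
```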
The conda EULA should however also be considered, as this was changed (in 2020?) to be less favourable than before, but in practice this shouldn't be an issue.
I'm not yet familiar enough with the conda packaging process to weigh up how much effort this would be with our cmake/swig based setup, as most examples cover mostly pure python packaging, but things to consider would be:
- Conda binary distribution appears to just be compressed tarballs. How does this deal with glibc? We may still have to build on centos7 or equivalent docker images? The `conda convert` command may be relevant.
- Are conda labels suitable for providing alternative seatbelts on/off builds? The same would apply for visualisation.
- This will still rely on not redistributing `libcuda.so`, so (at least some of) #647 will still be required.
- Upload to conda would be possible via actions on successful release creation.
See #605 for some past notes/discussion
Conda packages do not have an equivalent to the `extras`/`extras_require` part of pypi/wheel packages, which are being considered for optional dependencies. There are open issues on the conda repo to add this feature, but the general consensus so far is to use conda metapackages.
A conda metapackage is a package which has no files, only metadata (i.e. it depends on packages x and y). I.e. we could have the core `pyflamegpu` package, and a `pyflamegpu-visualisation` metapackage which depends on the core `pyflamegpu` and the visualisation component? I don't fully understand this yet, but it's worth making a note of (see the sketch below).
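As a rough sketch of the metapackage idea, assuming hypothetical package names and versions (this is not a worked recipe):

```yaml
# Hypothetical meta.yaml for a pyflamegpu-visualisation metapackage.
# It ships no files of its own; installing it just pulls in the core
# package plus an assumed separately-packaged visualisation component.
package:
  name: pyflamegpu-visualisation
  version: "2.0.0"

build:
  number: 0
  noarch: generic  # metadata only, so platform independent

requirements:
  run:
    - pyflamegpu 2.0.0.*
    - pyflamegpu-vis 2.0.0.*  # hypothetical visualisation component package
```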
Package Names and filenames
- Conda package names can contain `[a-z0-9_\.\-]`.
  - I.e. `pyflamegpu`, and `pyflamegpu-vis` or `pyflamegpu_vis`, would be OK if we have more than one.
- Package versions are much more free-form than wheel packages, so our current plans are OK.
- Build strings identify specific builds of a version of a package, i.e. which platform / conda version it was built with. They are dynamically generated during the build process.
- conda package filenames are `<package_name>-<version>-<build>.(tar.bz2|conda)`.
  - the newer `.conda` format is much faster to extract, and is typically smaller than the older `.tar.bz2` package format.
- Example conda package filenames
  - `linux-64/pytorch-1.9.0-py3.8_cuda11.1_cudnn8.0.5_0.tar.bz2`
    - Linux build of pytorch 1.9.0, with the build string identifying the python version, cuda version and cudnn dependencies.
    - This would be installable via `conda install pytorch cudatoolkit=11.1 -c pytorch -c nvidia`.
      - The nvidia channel is required for 11.1 but not 10.2 in the install instructions, not sure why exactly.
  - `win-64/pytorch-1.9.0-py3.8_cuda10.2_cudnn7_0.tar.bz2`
    - Windows build of pytorch 1.9.0, build string showing the python, cuda and cudnn versions.
    - This would be installable via `conda install pytorch cudatoolkit=10.2 -c pytorch` from windows.
  - `win-64/pytorch-1.9.0-py3.6_cpu_0.tar.bz2`
    - Windows build of the CPU variant of pytorch, shown by the build string.
    - This is installed via `conda install pytorch cpuonly -c pytorch` from a windows platform.
      - `cpuonly` here is a metapackage that influences which version of pytorch is installed. The `track_features` component of `meta.yaml` looks relevant (see the sketch after this list).
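If we wanted a pytorch-style selector metapackage (e.g. something like `consoleonly` to de-prioritise a non-visualisation build), a sketch of how `track_features` might be used, modelled on the `cpuonly` pattern; the names here are invented:

```yaml
# Hypothetical selector metapackage, mirroring pytorch's cpuonly approach.
# The solver down-weights anything that tracks a feature, so builds
# associated with `consoleonly` would only be chosen when a user
# explicitly installs this metapackage alongside pyflamegpu.
package:
  name: consoleonly
  version: "1.0"

build:
  number: 0
  noarch: generic
  track_features:
    - consoleonly
```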
Conda Channels
- Conda channels are URLs which are used as sources of lists of the packages available.
- Being more specific about which channels to look in generally improves the performance of conda install (as there are fewer packages to consider).
- They can be specified explicitly, or if it's an anaconda.org channel (or other default?) just by using the name. I.e. `conda install pytorch cudatoolkit=10.2 -c pytorch` will install the `pytorch` package from the `pytorch` channel, which depends on version `10.2` of the cudatoolkit package (which is provided by the `nvidia` channel).
  - Specifying the channel can be omitted most of the time, but it's faster to specify it, and if your package and channel have the same names then you may have to specify the channel too (see the `.condarc` sketch after this list).
- I.e. we would have the `flamegpu` or `pyflamegpu` channel. Probably `flamegpu`, in case we provide bindings for R which could be distributed through conda.
- Within a channel, packages are grouped in directories by platform. These appear to commonly just be `linux-64`, `win-64` and `osx-64`. `noarch` is used for platform-independent (non-binary) packages.
  - I.e. we can't really use gh releases as a way of providing our own index, but I don't think we'd need to.
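Channels can also be configured persistently in a user's `.condarc`, which is plain YAML; a sketch assuming a hypothetical `flamegpu` anaconda.org channel:

```yaml
# Hypothetical ~/.condarc. With this in place,
# `conda install pyflamegpu` would search our channel (and nvidia for
# cudatoolkit) without needing repeated -c options on the command line.
channels:
  - flamegpu   # assumed channel name, per the note above
  - nvidia
  - defaults
```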
Labels
- The default label is `main`.
- Labels can be used to separate builds at different stages of development.
- When packages are uploaded to an index, they can be assigned one or more labels.
- When users install packages from an index, they will only access the `main` (or only?) label by default.
- If there were a `test` label, users would use `-c channel/label/test` when searching / installing, rather than `-c channel`.
- This might be how to distribute `alpha`/`beta`/`rc` packages with conda, possibly a `prerelease` label for all of them, although I'm unsure how conda's version comparison rules would decide which is latest if an exact version was not provided.
  - Conda-forge use:
    - the `rc` label for `beta` and `rc` releases, used when there are no new features to be added to that release, just bug-fixes.
    - the `dev` label for `pre-alpha` and `alpha` builds, i.e. for versions that will have significant changes prior to a stable (or beta/rc) release.
  - conda-forge is a community channel which provides a central location to find many common packages in a single conda channel, rather than having to specify lots of `-c <channel>` options. I don't expect that we will be submitting to conda-forge, at least not initially.
- Alternatively, pytorch use a separate channel for nightly builds, but this doesn't fit our current needs.
Package Metadata
These are just some key points from scanning the referenced source. Not comprehensive.
- the `source` section needs to refer to the `git_rev` and/or include hashes of a tarball download. This might mean that we have to generate this in a post-release step (i.e. we can't include this information in a file in the repository). It's a bit chicken-and-egg-ish. Though performing the builds as part of the on-tag-push CI would have this information available (assuming it works, so we don't have to force push over it). Probably fine, just needs some thought.
- `track_features` is a way of de-prioritising packages when there are multiple competing versions, and no two packages in a subdirectory should have the same track_feature. I.e. this is how the CPU version of pytorch is de-prioritised compared to the gpu variants (if the cuda-toolkit dependency is met?).
- `build: no_link:` is used to enforce copying, not linking, of files.
- `build: noarch` is used for architecture-independent packages. This would only be relevant for metapackages, if we use them.
- `requirements` of packages are split into `build`, `host` and `run` requirements, with complex rules about how a `build` requirement may implicitly add `host` and `run` requirements.
  - `git` and `cmake` would be some of the entries in `build:` in our case.
  - shared libraries should be listed in the `host` section rather than `build`, for portable packages?
- the `test:` section provides details of how to test the package, including dependencies, commands and python imports required. Alternatively a script can be referenced to handle this.
- This can be templated with Jinja templates, which will resolve some concerns such as the git tag etc. I.e. `version: {{ GIT_DESCRIBE_TAG }}` (a skeleton pulling these points together follows below).
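Pulling these points together, a rough skeleton of what `conda/recipes/pyflamegpu/meta.yaml` might look like. Everything below is a sketch: the requirement entries, pins and metadata values are assumptions, while `GIT_DESCRIBE_TAG` and `compiler('cxx')` are standard conda-build Jinja:

```yaml
# Hypothetical skeleton of conda/recipes/pyflamegpu/meta.yaml.
package:
  name: pyflamegpu
  version: "{{ GIT_DESCRIBE_TAG }}"  # resolved from the git checkout at build time

source:
  git_url: https://github.com/FLAMEGPU/FLAMEGPU2.git
  git_rev: "{{ GIT_DESCRIBE_TAG }}"  # avoids hard-coding a hash in the repository

build:
  number: 0

requirements:
  build:
    - git
    - cmake
    - swig >=4.0.2           # depend on the conda swig package rather than cmake fetching it
    - {{ compiler('cxx') }}
  host:
    - python
    - cudatoolkit 11.*       # assumed pinning
  run:
    - python
    - cudatoolkit 11.*

test:
  imports:
    - pyflamegpu             # smoke test: the module can at least be imported

about:
  home: https://github.com/FLAMEGPU/FLAMEGPU2
  license: MIT
```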
Other conda packages
- There's a swig conda package, with 4.0.2 already, so we can depend on that at conda time rather than our cmake fetching it.
- rapids/cudf might not be a bad reference for a cuda library + python wrapper. It provides a `cudf` package which is the python interface, which depends on `libcudf` (plus alternate builds with optional extras such as kafka).
Possible files to create
```
conda/
└── recipes
    ├── libflamegpu
    │   ├── bld.bat
    │   ├── build.sh
    │   └── meta.yaml
    └── pyflamegpu
        ├── bld.bat
        ├── build.sh
        └── meta.yaml
```
And maybe visualisation variants of the above, and/or console versions too, depending on how splitting that goes.
The pyflamegpu package metadata would depend on the libflamegpu metadata (if we package both on conda). Alternatively we could just provide pyflamegpu, if we do not distribute libflamegpu separately. A sketch of this dependency follows below.
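If both were packaged, the dependency might be expressed along these lines (a sketch; the `libflamegpu` name and pinning are assumptions):

```yaml
# Hypothetical fragment of the pyflamegpu recipe, consuming a
# separately packaged libflamegpu shared library.
{% set version = "2.0.0" %}

requirements:
  host:
    - libflamegpu {{ version }}  # link against the packaged shared library
  run:
    - libflamegpu {{ version }}  # and require the matching version at run time
```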
Conda is looking more viable / likely to be our chosen method of distribution, as there are GL-related conda packages we might be able to leverage to avoid redistributing them ourselves, if that's even required for conda binary distribution.
Pip/pypi is looking less and less viable (for visualisation distribution).
The conda-forge maintainer documentation has some very useful info about making packages. I doubt we'll push to conda-forge immediately, instead using our own channel, but most of it still applies and doesn't look too bad.
https://conda-forge.org/docs/maintainer/knowledge_base.html
CUDA 12.x needs handling differently to 11.x it seems, in a good way (the parts of cuda needed can be specified individually, so installs are much lighter).
https://github.com/conda-forge/cuda-version-feedstock/blob/main/recipe/README.md
Rapids/pytorch are probably a good source of how to deal with that.
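Based on the cuda-version feedstock README above, a CUDA 12.x-style requirements sketch might look like the following, pulling in only the components we'd actually need; which components those are is an assumption here:

```yaml
# Hypothetical CUDA 12.x-style requirements, using the cuda-version
# metapackage plus individual component packages instead of the
# monolithic cudatoolkit package used for 11.x.
requirements:
  host:
    - cuda-version 12.0        # keeps all cuda-* components at a consistent release
    - cuda-cudart-dev
    - cuda-nvrtc-dev
  run:
    - cuda-version >=12.0,<13
    - cuda-nvrtc               # assumed: nvrtc needed at run time for RTC agent functions
```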
If we are to make a conda package, we will need to tweak the dll loading logic for windows + python 3.8+, when it is patched to use `add_dll_directory` when finding nvrtc etc, as conda doesn't do the same as cpython.
https://docs.conda.io/projects/conda/en/latest/user-guide/troubleshooting.html#the-system-cannot-find-the-path-specified-on-windows
The best evidence I can find is that we wouldn't require that data.