TransformerEngine
Fix runtime lib loading logic
Description
This is a small refactor of the runtime library loading logic to make it more consistent and avoid duplication. The main point is to prioritize system installations and to check Python packages only as a last-ditch attempt to find the library.
Fixes a bug where the wrong shared object (with a mismatched version) is loaded because of PyPI packages installed by PyTorch, JAX, etc.
Type of change
- [ ] Documentation change (change only to the documentation, either a fix or a new content)
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [x] Infra/Build change
- [x] Code refactoring
Changes
- Remove duplication of loading logic for various libs such as `curand`, `cudnn`, etc.
- Prioritize loading packages via system, e.g. `LD_LIBRARY_PATH`, before checking python packages.
- Remove search via `ldconfig` as redundant and brute force.
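The search order described in the list above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the names `load_shared_library` and `find_in_python_packages`, the injectable `loader` parameter, and the `lib<name>.so*` file pattern are all assumptions made for the example. It relies on the fact that passing a bare library name to `ctypes.CDLL` lets the dynamic linker resolve it through `LD_LIBRARY_PATH`, the linker cache, and the default system directories.

```python
import ctypes
import glob
import os
import sysconfig
from typing import Callable, Iterable, List, Optional


def find_in_python_packages(lib_name: str, roots: Optional[Iterable[str]] = None) -> List[str]:
    """Return candidate paths for lib<lib_name>.so* below the given roots.

    Hypothetical helper: by default it scans the interpreter's site-packages
    directories, which is where PyPI wheels place bundled shared objects.
    """
    if roots is None:
        paths = sysconfig.get_paths()
        roots = {paths["purelib"], paths["platlib"]}
    matches: List[str] = []
    for root in roots:
        pattern = os.path.join(root, "**", f"lib{lib_name}.so*")
        matches.extend(glob.glob(pattern, recursive=True))
    return sorted(matches)


def load_shared_library(
    lib_name: str,
    loader: Callable[[str], object] = ctypes.CDLL,
    roots: Optional[Iterable[str]] = None,
):
    """Load lib<lib_name>.so, preferring the system installation.

    Hypothetical sketch of the priority order, not TransformerEngine's API.
    """
    # 1. System first: a bare name is resolved by the dynamic linker via
    #    LD_LIBRARY_PATH, the linker cache, and default system directories.
    try:
        return loader(f"lib{lib_name}.so")
    except OSError:
        pass

    # 2. Last resort: shared objects shipped inside Python packages.
    for path in find_in_python_packages(lib_name, roots):
        try:
            return loader(path)
        except OSError:
            continue

    raise OSError(f"Could not find lib{lib_name}.so on the system or in Python packages")
```

With this ordering, a copy of a library bundled in a wheel is only considered when the system lookup fails, which is the behavior the bug fix is after. A single helper parameterized by library name also covers the de-duplication point, since one function can serve every library instead of a separate copy of the logic for each.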
Checklist:
- [x] I have read and followed the contributing guidelines
- [x] The functionality is complete
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
/te-ci