Path API
Our Path API is showing its age:
- Instances are not that lightweight after all - it's ~~probably~~ behind the heavy use of memory by the BAIN tab
- case insensitivity blocks linux support ( #243 )
- the GPath cache/factory is (along with deprint, see #352) the main coupling of newest (library) code with bolt. We would like to see this eliminated - as we might as well at some point export the games.py or bsa_files.py modules as python libraries.
- reinventing the wheel - pathlib is now part of the std - source. See also https://www.python.org/dev/peps/pep-0519/ for API changes to accept paths.
So:
- [ ] rework for performance (memory and speed)
- [ ] revisit cXXX properties - replace them in the related code with LowerDict APIs - cleaner, less error prone, and we erode Path so we can...
- [ ] ...Investigate how easily we could switch to pathlib2 2.2.1 - after we finally move to python 3. The hard part will be the comparisons between strings and Paths we perform implicitly (explicit is better than implicit...)
Remove read only
https://stackoverflow.com/a/43914402/281545
Backburner for now
@lojack5 this is the issue for Path refactoring. A couple notes not mentioned in the OP:
- [ ] revisit settings - we pickle Paths and that's not a good idea, we should pickle only std classes - portable, lighter etc. Especially the Installers.dat (if many loose files are present like in TESIV) takes ages to load
- [ ] clearly differentiate between windows and abstract paths - in path lib those are called "Pure paths". Sometimes - but rarely - the "Pure windows path" is needed (like when writing the local save path to the game ini IIUC). We should make sure that we work with abstract paths internally and save abstract paths in settings/pickles - this way we can run bash with windows settings files on linux. We should then only instantiate Path when we need filesystem operations
- [ ] re filesystem operations - we maybe do not want to wrap those after all - rather directly using os.path (that accepts path objects). Explicit is better than implicit, plus most of the wrapping Path does should be solved in other ways in py3 (for instance Path.open encoding is not needed any more in Py3)
Yep I'm fairly familiar with pathlibs stuff, and its weird edge cases (like subclassing....wtf). At least for now my plans are:
- [ ] Rename
bolt.Pathmethods that have direct replacements in pathlib with those, performance be damned. - [ ] Any non-pathlib functionality in ours that aren't easily replaced, make a (small) module that takes
PathLikeobjects, rather than attaching those functionalities directly ontobolt.Path. For example,bolt.Path.walkcan be replaced byos.walkin Python 3 (as you've mentioned), but for an intermediate before the big changeover, will need to provide abolt.walk(or at leastsomething.walk, can converge on naming later), that acts like Python 3'sos.walk, ie: on Python <3.6 callstr(obj), and on 3.6+ callos.fspath(obj)on the argument before processing.
End goal is basically, the bolt.Path API is a subset of pathlib.Path, so it can be seamlessly swapped out when the time comes. Any additional utility functions will work with objects implementing the
PathLike interface, so it won't matter if they're the old class, the std class, or just plain strings.
Once those goals are complete, then, and only then we can start looking at performance issues (which there will be a lot, especially in BAIN). Most of those will probably be solved with caching. Keep in mind we're giving up memory performance in exchange for time performance on this one (ie: BAIN refresh time vs BAIN memory usage). Really BAIN is place where bolt.Path performance is the bottleneck. Bashed Patch building doesn't use enough instances for it to matter as much as BAIN.
Well turns out the pathlib.Path and Path APIs differ a lot (edit 2023.06.06 as we are on py3 and the temp stuff was removed from Path - yey!):
dir(Path)
['__abstractmethods__', '__add__', '__annotations__', '__class__',
'__class_getitem__', '__copy__', '__deepcopy__', '__delattr__',
'__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__fspath__',
'__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__',
'__init__', '__init_subclass__', '__le__', '__len__', '__lt__',
'__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__setstate__', '__sizeof__', '__slots__',
'__str__', '__subclasshook__', '__weakref__', '_abc_impl', '_cext',
'_cs', '_ext', '_getmtime', '_hash', '_onerror', '_s', '_sbody',
'_setmtime', '_shead', '_sroot', '_stail', 'atime', 'backup', 'body',
'cext', 'clearRO', 'copyTo', 'crc', 'cs', 'ctime', 'drive', 'editable',
'exists', 'ext', 'getNorm', 'getcwd', 'has_invalid_chars', 'head',
'headTail', 'ilist', 'invalid_chars_re', 'is_absolute', 'is_dir',
'is_file', 'join', 'makedirs', 'moveTo', 'mtime', 'open', 'psize',
'relpath', 'remove', 'removedirs', 'replace_with_temp', 'rmtree',
'root', 's', 'sbody', 'setcwd', 'shead', 'size_mtime',
'size_mtime_ctime', 'sroot', 'stail', 'start', 'stat',
'sys_fs_enc', 'tail', 'walk']
sorted((dp := set(dir(Path))) - (dplb := set(dir(_Path))))
['__abstractmethods__', '__add__', '__annotations__',
'__class_getitem__', '__copy__', '__deepcopy__', '__dict__', '__len__',
'__setstate__', '__weakref__', '_abc_impl', '_cext', '_cs', '_ext',
'_getmtime', '_onerror', '_s', '_sbody', '_setmtime', '_shead',
'_sroot', '_stail', 'atime',
'backup',
'body', 'cext', 'clearRO',
'copyTo', 'crc', 'cs',
'ext', 'getNorm', 'getcwd',
'has_invalid_chars', 'head', 'headTail', 'ilist', 'invalid_chars_re',
'join', 'makedirs', 'moveTo', 'mtime', 'psize', 'relpath', 'remove',
'removedirs', 'replace_with_temp', 'rmtree', 's', 'sbody',
'shead', 'size_mtime', 'size_mtime_ctime', 'sroot', 'stail', 'start',
'sys_fs_enc', 'tail']
sorted(dplb - dp)
['__bytes__', '__enter__', '__exit__', '__rtruediv__', '__truediv__',
'_cached_cparts', '_cparts', '_drv', '_format_parsed_parts',
'_from_parsed_parts', '_from_parts', '_make_child',
'_make_child_relpath', '_parse_args', '_parts', '_pparts', '_root',
'_scandir', '_str', 'absolute', 'anchor', 'as_posix', 'as_uri',
'chmod', 'cwd', 'expanduser', 'glob', 'group', 'hardlink_to', 'home',
'is_block_device', 'is_char_device', 'is_fifo', 'is_mount',
'is_relative_to', 'is_reserved', 'is_socket', 'is_symlink', 'iterdir',
'joinpath', 'lchmod', 'link_to', 'lstat', 'match', 'mkdir', 'name',
'owner', 'parent', 'parents', 'parts', 'read_bytes', 'read_text',
'readlink', 'relative_to', 'rename', 'replace', 'resolve', 'rglob',
'rmdir', 'samefile', 'stem', 'suffix', 'suffixes', 'symlink_to',
'touch', 'unlink', 'with_name', 'with_stem', 'with_suffix',
'write_bytes', 'write_text']
For future reference, cause I just ran into this with my own project, this is how you subclass a pathlib path to extend it:
from pathlib import Path
class MyPath(type(Path())): # ugh
# extensions go here
The Path APIs all seem to work correctly like this, giving instances of your subclass back:
>>> from paths import MyPath
>>> list(MyPath('test').iterdir())
[MyPath('test/test1.py'), MyPath('test/test2.py'), MyPath('test/test3.py'), MyPath('test/__init__.py')]
Wowz - yes was getting there eventually but let's merge the rest first - with FName the landscape will clear up a lot