wrye-bash icon indicating copy to clipboard operation
wrye-bash copied to clipboard

Path API

Open Utumno opened this issue 8 years ago • 6 comments

Our Path API is showing its age:

  • Instances are not that lightweight after all - it's ~~probably~~ behind the heavy use of memory by the BAIN tab
  • case insensitivity blocks linux support ( #243 )
  • the GPath cache/factory is (along with deprint, see #352) the main coupling of newest (library) code with bolt. We would like to see this eliminated - as we might as well at some point export the games.py or bsa_files.py modules as python libraries.
  • reinventing the wheel - pathlib is now part of the std - source. See also https://www.python.org/dev/peps/pep-0519/ for API changes to accept paths.

So:

  • [ ] rework for performance (memory and speed)
  • [ ] revisit cXXX properties - replace them in the related code with LowerDict APIs - cleaner, less error prone, and we erode Path so we can...
  • [ ] ...Investigate how easily we could switch to pathlib2 2.2.1 - after we finally move to python 3. The hard part will be the comparisons between strings and Paths we perform implicitly (explicit is better than implicit...)

Remove read only

https://stackoverflow.com/a/43914402/281545

Utumno avatar Mar 22 '17 20:03 Utumno

Backburner for now

Utumno avatar Mar 22 '17 20:03 Utumno

@lojack5 this is the issue for Path refactoring. A couple notes not mentioned in the OP:

  • [ ] revisit settings - we pickle Paths and that's not a good idea, we should pickle only std classes - portable, lighter etc. Especially the Installers.dat (if many loose files are present like in TESIV) takes ages to load
  • [ ] clearly differentiate between windows and abstract paths - in path lib those are called "Pure paths". Sometimes - but rarely - the "Pure windows path" is needed (like when writing the local save path to the game ini IIUC). We should make sure that we work with abstract paths internally and save abstract paths in settings/pickles - this way we can run bash with windows settings files on linux. We should then only instantiate Path when we need filesystem operations
  • [ ] re filesystem operations - we maybe do not want to wrap those after all - rather directly using os.path (that accepts path objects). Explicit is better than implicit, plus most of the wrapping Path does should be solved in other ways in py3 (for instance Path.open encoding is not needed any more in Py3)

Utumno avatar Dec 08 '19 16:12 Utumno

Yep I'm fairly familiar with pathlibs stuff, and its weird edge cases (like subclassing....wtf). At least for now my plans are:

  • [ ] Rename bolt.Path methods that have direct replacements in pathlib with those, performance be damned.
  • [ ] Any non-pathlib functionality in ours that aren't easily replaced, make a (small) module that takes PathLike objects, rather than attaching those functionalities directly onto bolt.Path. For example, bolt.Path.walk can be replaced by os.walk in Python 3 (as you've mentioned), but for an intermediate before the big changeover, will need to provide a bolt.walk (or at least something.walk, can converge on naming later), that acts like Python 3's os.walk, ie: on Python <3.6 call str(obj), and on 3.6+ call os.fspath(obj) on the argument before processing.

End goal is basically, the bolt.Path API is a subset of pathlib.Path, so it can be seamlessly swapped out when the time comes. Any additional utility functions will work with objects implementing the PathLike interface, so it won't matter if they're the old class, the std class, or just plain strings.

Once those goals are complete, then, and only then we can start looking at performance issues (which there will be a lot, especially in BAIN). Most of those will probably be solved with caching. Keep in mind we're giving up memory performance in exchange for time performance on this one (ie: BAIN refresh time vs BAIN memory usage). Really BAIN is place where bolt.Path performance is the bottleneck. Bashed Patch building doesn't use enough instances for it to matter as much as BAIN.

lojack5 avatar Dec 09 '19 01:12 lojack5

Well turns out the pathlib.Path and Path APIs differ a lot (edit 2023.06.06 as we are on py3 and the temp stuff was removed from Path - yey!):

dir(Path)
['__abstractmethods__', '__add__', '__annotations__', '__class__', 
'__class_getitem__', '__copy__', '__deepcopy__', '__delattr__', 
'__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__fspath__', 
'__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', 
'__init__', '__init_subclass__', '__le__', '__len__', '__lt__', 
'__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', 
'__repr__', '__setattr__', '__setstate__', '__sizeof__', '__slots__', 
'__str__', '__subclasshook__', '__weakref__', '_abc_impl', '_cext', 
'_cs', '_ext', '_getmtime', '_hash', '_onerror', '_s', '_sbody', 
'_setmtime', '_shead', '_sroot', '_stail', 'atime', 'backup', 'body', 
'cext', 'clearRO', 'copyTo', 'crc', 'cs', 'ctime', 'drive', 'editable', 
'exists', 'ext', 'getNorm', 'getcwd', 'has_invalid_chars', 'head', 
'headTail', 'ilist', 'invalid_chars_re', 'is_absolute', 'is_dir', 
'is_file', 'join', 'makedirs', 'moveTo', 'mtime', 'open', 'psize', 
'relpath', 'remove', 'removedirs', 'replace_with_temp', 'rmtree', 
'root', 's', 'sbody', 'setcwd', 'shead', 'size_mtime', 
'size_mtime_ctime', 'sroot', 'stail', 'start', 'stat', 
'sys_fs_enc', 'tail', 'walk']
sorted((dp := set(dir(Path))) - (dplb := set(dir(_Path))))
['__abstractmethods__', '__add__', '__annotations__', 
'__class_getitem__', '__copy__', '__deepcopy__', '__dict__', '__len__', 
'__setstate__', '__weakref__', '_abc_impl', '_cext', '_cs', '_ext', 
'_getmtime', '_onerror', '_s', '_sbody', '_setmtime', '_shead', 
'_sroot', '_stail', 'atime',
'backup',
'body', 'cext', 'clearRO', 
'copyTo', 'crc', 'cs',
'ext', 'getNorm', 'getcwd', 
'has_invalid_chars', 'head', 'headTail', 'ilist', 'invalid_chars_re', 
'join', 'makedirs', 'moveTo', 'mtime', 'psize', 'relpath', 'remove', 
'removedirs', 'replace_with_temp', 'rmtree', 's', 'sbody',
'shead', 'size_mtime', 'size_mtime_ctime', 'sroot', 'stail', 'start', 
'sys_fs_enc', 'tail']
sorted(dplb - dp)
['__bytes__', '__enter__', '__exit__', '__rtruediv__', '__truediv__', 
'_cached_cparts', '_cparts', '_drv', '_format_parsed_parts', 
'_from_parsed_parts', '_from_parts', '_make_child', 
'_make_child_relpath', '_parse_args', '_parts', '_pparts', '_root', 
'_scandir', '_str', 'absolute', 'anchor', 'as_posix', 'as_uri', 
'chmod', 'cwd', 'expanduser', 'glob', 'group', 'hardlink_to', 'home', 
'is_block_device', 'is_char_device', 'is_fifo', 'is_mount', 
'is_relative_to', 'is_reserved', 'is_socket', 'is_symlink', 'iterdir', 
'joinpath', 'lchmod', 'link_to', 'lstat', 'match', 'mkdir', 'name', 
'owner', 'parent', 'parents', 'parts', 'read_bytes', 'read_text', 
'readlink', 'relative_to', 'rename', 'replace', 'resolve', 'rglob', 
'rmdir', 'samefile', 'stem', 'suffix', 'suffixes', 'symlink_to', 
'touch', 'unlink', 'with_name', 'with_stem', 'with_suffix', 
'write_bytes', 'write_text']

Utumno avatar Nov 21 '21 17:11 Utumno

For future reference, cause I just ran into this with my own project, this is how you subclass a pathlib path to extend it:

from pathlib import Path

class MyPath(type(Path())): # ugh
    # extensions go here

The Path APIs all seem to work correctly like this, giving instances of your subclass back:

>>> from paths import MyPath
>>> list(MyPath('test').iterdir())
[MyPath('test/test1.py'), MyPath('test/test2.py'), MyPath('test/test3.py'), MyPath('test/__init__.py')]

Infernio avatar Dec 10 '21 18:12 Infernio

Wowz - yes was getting there eventually but let's merge the rest first - with FName the landscape will clear up a lot

Utumno avatar Dec 10 '21 19:12 Utumno