Unit tests failing
Commit 7b66fa1 (ping @chlunde) broke the unit test suite:
$ python setup.py test
running test
running egg_info
writing hardlinkpy.egg-info/PKG-INFO
writing top-level names to hardlinkpy.egg-info/top_level.txt
writing dependency_links to hardlinkpy.egg-info/dependency_links.txt
writing entry points to hardlinkpy.egg-info/entry_points.txt
reading manifest file 'hardlinkpy.egg-info/SOURCES.txt'
writing manifest file 'hardlinkpy.egg-info/SOURCES.txt'
running build_ext
test_hardlink_tree (tests.TestHappy) ... ok
test_hardlink_tree_dryrun (tests.TestHappy) ... ok
test_hardlink_tree_exclude (tests.TestHappy) ... ok
test_hardlink_tree_filenames_equal (tests.TestHappy) ... FAIL
test_hardlink_tree_timestamp_ignore (tests.TestHappy) ... ok
======================================================================
FAIL: test_hardlink_tree_filenames_equal (tests.TestHappy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/root/home/kaiant/repos/python/hardlinkpy/tests.py", line 105, in test_hardlink_tree_filenames_equal
    self.assertEqual(get_inode("dir1/name1.ext"), get_inode("dir2/name1.ext"))
AssertionError: 771490 != 771491
----------------------------------------------------------------------
Ran 5 tests in 0.008s
FAILED (failures=1)
Which OS, filesystem, and Python version is this? I can't reproduce it.
- Fedora 26
- btrfs
- Python 2.7.14
Yes, when the "filenames-equal" option is enabled the algorithm is order-dependent, so the tests can fail depending on the order in which the OS returns filenames when iterating over a directory, if some of those files are already hardlinked.
For example, the dir1/name1.ext and dir1/link files are hardlinked before the test begins. If the files are returned in the order "dir2/name1.ext", "dir1/link", "dir1/name1.ext", then the test fails because "dir1/name1.ext" won't be linked to "dir2/name1.ext", as it is already linked to "dir1/link". However, if the order is "dir1/name1.ext", "dir1/link", "dir2/name1.ext", then "dir2/name1.ext" will be linked to "dir1/name1.ext", because "dir2/name1.ext" is not already linked to a previously seen file (such as "dir1/link").
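To make the two orderings concrete, here is a toy model of the behaviour described above. It is only a sketch and not the actual hardlink.py code: the `simulate` function, the in-memory inode table, and the exact rule ("stop if already linked to a seen file, otherwise link to a seen file with the same basename") are assumptions for illustration, and all three files are assumed to have identical content.

```python
import os.path


def simulate(order, inodes):
    """Toy model of one pass with filenames-equal enabled (hypothetical).

    `order` is the sequence in which paths are visited; `inodes` maps
    each path to its starting inode number. Returns the final mapping.
    """
    inodes = dict(inodes)  # work on a copy
    seen = []
    for path in order:
        # Early exit: the file already shares an inode with a seen file.
        if any(inodes[p] == inodes[path] for p in seen):
            seen.append(path)
            continue
        # Otherwise link to the first seen file with the same basename.
        for p in seen:
            if os.path.basename(p) == os.path.basename(path):
                inodes[path] = inodes[p]
                break
        seen.append(path)
    return inodes


# dir1/name1.ext and dir1/link start out hardlinked (same inode).
start = {"dir1/name1.ext": 1, "dir1/link": 1, "dir2/name1.ext": 2}

bad = simulate(["dir2/name1.ext", "dir1/link", "dir1/name1.ext"], start)
good = simulate(["dir1/name1.ext", "dir1/link", "dir2/name1.ext"], start)

print(bad["dir1/name1.ext"] == bad["dir2/name1.ext"])
print(good["dir1/name1.ext"] == good["dir2/name1.ext"])
```

In this model the first ordering leaves the two name1.ext files on different inodes, which matches the failing assertion, while the second ordering links them.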
It should be possible to ensure that when the "filenames-equal" flag is set, the program doesn't abort the search early when it finds a file that it is already linked to (unless perhaps it has the same basename). I'll work on a solution.
Would it make sense to normalize directory content iteration to always happen in alphabetically sorted order?
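Something along these lines is what I have in mind. It is a rough sketch rather than a patch: `walk_sorted` is a made-up helper name, I haven't checked how hardlink.py currently walks the tree, and the demo part assumes Python 3 (`tempfile.TemporaryDirectory`, `exist_ok`).

```python
import os
import tempfile


def walk_sorted(top):
    """Yield file paths under `top` in a deterministic, sorted order."""
    for dirpath, dirnames, filenames in os.walk(top):
        # Sorting dirnames in place controls the order os.walk descends.
        dirnames.sort()
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


# Demo on a throwaway tree with the same layout as the failing test.
with tempfile.TemporaryDirectory() as root:
    for rel in ("dir2/name1.ext", "dir1/name1.ext", "dir1/link"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("same content")
    found = [
        os.path.relpath(p, root).replace(os.sep, "/")
        for p in walk_sorted(root)
    ]

print(found)  # ['dir1/link', 'dir1/name1.ext', 'dir2/name1.ext']
```

This only makes the visiting order reproducible across filesystems; it does not change which files end up linked for a given order.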
@akaihola Well, sorting directory iteration might at least make the tests work consistently, but there are other problems with the existing algorithm (which I discussed in my other reply) that mean that not all identical files remain hardlinked together. I think if we solve that issue (which is alluded to in the hardlink.py TODO), it will also fix the tests.