sssom-py

Delete all validation files since they may be old

Open hrshdhgd opened this issue 3 years ago • 3 comments

Fixes #317

hrshdhgd avatar Sep 25 '22 21:09 hrshdhgd

All of them? Oh wow, we must not have tested roundtripping for ages!

See https://github.com/mapping-commons/sssom-py/blob/validate-data-files/tests/test_conversion.py#L68

Can you reintroduce some of these checks? I felt quite protected having them there!

matentzn avatar Sep 26 '22 08:09 matentzn

Ok, so when I put the created files in the validate_data folder and run the validation check, it still fails (literally the same file copy-pasted). I dug deeper and the root cause is in the filecmp.cmp() function:

    s1 = _sig(os.stat(f1))
    s2 = _sig(os.stat(f2))
    if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
        return False
    if shallow and s1 == s2:
        return True
    if s1[1] != s2[1]:
        return False

Apparently s1 != s2:

s1 = (32768, 98892, 1664213293.0535548)
s2 = (32768, 98892, 1664212433.827459)

os.stat(f1) = os.stat_result(st_mode=33152, st_ino=16032794, st_dev=16777225, st_nlink=1, st_uid=502, st_gid=20, st_size=98892, st_atime=1664213294, st_mtime=1664213293, st_ctime=1664213293)

os.stat(f2) = os.stat_result(st_mode=33152, st_ino=49568904, st_dev=16777225, st_nlink=1, st_uid=502, st_gid=20, st_size=98892, st_atime=1664213022, st_mtime=1664212433, st_ctime=1664213021)

So st_mtime does not match (it is the third element of each signature above), and hence the signatures are not identical. st_atime differs as well, but it is not part of the signature.

stat.ST_MTIME
Time of last modification.

The documentation says: "If shallow is true and the os.stat() signatures (file type, size, and modification time) of both files are identical, the files are taken to be equal."

Modification time will never match in our case. Is this the correct way of checking these files, or should we write a different test for them?
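To check what a differing modification time actually does to the result, here is a minimal sketch. The file names, contents, and the `write_copy` helper are invented for illustration and are not part of sssom-py. As far as the standard library behaviour goes, a signature mismatch only disables the shallow shortcut; `filecmp.cmp` then compares sizes and, if they match, the contents.

```python
import filecmp
import os
import tempfile


def write_copy(directory, name, content, mtime):
    """Hypothetical helper: write `content` to a file and force its mtime."""
    path = os.path.join(directory, name)
    with open(path, "w", newline="") as handle:
        handle.write(content)
    os.utime(path, (mtime, mtime))  # set (atime, mtime) explicitly
    return path


with tempfile.TemporaryDirectory() as tmp:
    content = "subject_id\tobject_id\na:1\tb:1\n"
    f1 = write_copy(tmp, "old.tsv", content, 1_664_212_433)
    f2 = write_copy(tmp, "new.tsv", content, 1_664_213_293)

    # Same bytes, different modification times.
    mtimes_differ = os.stat(f1).st_mtime != os.stat(f2).st_mtime

    filecmp.clear_cache()
    same_shallow = filecmp.cmp(f1, f2, shallow=True)   # falls back to contents
    same_deep = filecmp.cmp(f1, f2, shallow=False)     # always compares contents
```

If both calls report the files as equal here but the real validation files still compare as different, the difference is more likely in the file contents than in the timestamps.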

Update:

  • I implemented checks on just file type and size (dropping modification time) for now. Ideally shallow = False should do the same, but it seems that every time a file is generated, some of the lines get shuffled compared to the previous time it was generated. We can discuss further in our meeting.
  • Windows tests fail
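
If the shuffled lines are the only difference, one option is to compare the files as multisets of lines instead of byte by byte. This is only a sketch under that assumption: `same_lines_any_order` is a hypothetical helper, not an existing sssom-py function, and the demo contents are made up. It also treats a header line like any other line, which may or may not be acceptable for these files.

```python
import tempfile
from collections import Counter
from pathlib import Path


def same_lines_any_order(path_a, path_b):
    """Return True if both files contain the same lines, ignoring line order.

    Counter keeps duplicates, so a line that appears twice in one file must
    also appear twice in the other.
    """
    lines_a = Path(path_a).read_text(encoding="utf-8").splitlines()
    lines_b = Path(path_b).read_text(encoding="utf-8").splitlines()
    return Counter(lines_a) == Counter(lines_b)


# Tiny demo with throwaway files (contents invented for illustration).
with tempfile.TemporaryDirectory() as tmp:
    expected = Path(tmp, "expected.tsv")
    shuffled = Path(tmp, "shuffled.tsv")
    changed = Path(tmp, "changed.tsv")
    expected.write_text("a:1\tb:1\na:2\tb:2\n", encoding="utf-8")
    shuffled.write_text("a:2\tb:2\na:1\tb:1\n", encoding="utf-8")
    changed.write_text("a:1\tb:1\na:2\tb:3\n", encoding="utf-8")

    shuffled_ok = same_lines_any_order(expected, shuffled)
    changed_ok = same_lines_any_order(expected, changed)
```

Because the files are read in text mode and split with `splitlines()`, line-ending differences are ignored too. Whether that has anything to do with the Windows failures is a guess, not something confirmed in this thread.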

hrshdhgd avatar Sep 26 '22 17:09 hrshdhgd

@hrshdhgd can you update this PR?

matentzn avatar Feb 04 '24 10:02 matentzn