deltacode
Failing case if extra directory is added
I ran some simple tests; here are my findings.
I used "balloontip-1.1.1.jar" as a sample file.
- Created 2 directories, d1/ and d2/, put the test file in each, and compared the two directories. The output is unchanged, which is correct.
- Same setup as (1), but created a new subdirectory named test/ under d1/ and put balloontip-1.1.1.jar in it. Both d1/balloontip-1.1.1.jar and d1/test/balloontip-1.1.1.jar are returned as added, and d2/balloontip-1.1.1.jar is returned as removed. This is not correct: d1/balloontip-1.1.1.jar and d2/balloontip-1.1.1.jar should be returned as unchanged, and only d1/test/balloontip-1.1.1.jar should be considered added.
- Same setup as (1), but created a new root/ directory, put d1/ in it, and ran deltacode from root/ to d2/. The output is unchanged, which is correct.
The functionality you are looking for is related to another ticket, https://github.com/nexB/deltacode/issues/4
Since you’ve nested the jar file one level lower the second time, deltacode infers this as a removal followed by an addition, when in reality that file has just been moved to another location.
I’ll keep this ticket open for reference and close it when #4 gets taken care of.
@majurg not exactly the same issue.
For instance,
In (1),
I have
/d1/balloontip-1.1.1.jar
/d2/balloontip-1.1.1.jar
Then deltacode reports that d1 and d2 are the same, which is correct.
in case (2)
I have
/d1/balloontip-1.1.1.jar
/d1/test/balloontip-1.1.1.jar
/d2/balloontip-1.1.1.jar
Here I expect deltacode to tell me that /d1/balloontip-1.1.1.jar and /d2/balloontip-1.1.1.jar are the same and that /d1/test/balloontip-1.1.1.jar is new.
However, the tool tells me BOTH
/d1/balloontip-1.1.1.jar
/d1/test/balloontip-1.1.1.jar
are added
and
/d2/balloontip-1.1.1.jar
is removed.
So my question is: why does adding a new directory make files that were originally "the same" become "added/removed"?
@majurg Here's what we have after scanning and running DeltaCode on these 3 pairs of test codebases. The results don't seem to be entirely consistent. Putting aside the fact that we currently treat files as moved only when there's a single identical added and removed file, I think the inconsistent treatment arises at least in part from the way we remove path segments during the fix_trees()/align_trees() process.
- Rename the directory from d1 to d2.
DeltaCode treats this as unmodified.
{
"deltacode_version": "0.0.1.beta",
"deltacode_stats": {
"added": 0,
"modified": 0,
"moved": 0,
"removed": 0,
"unmodified": 1
},
"deltas": [
{
"category": "unmodified",
"path": "balloontip-1.1.1.jar",
"name": "balloontip-1.1.1.jar",
"type": "file",
"size": 53842
}
]
}
- Add a test subdirectory to d1, also containing balloontip-1.1.1.jar.
DeltaCode treats this as 2 removed, 1 added.
{
"deltacode_version": "0.0.1.beta",
"deltacode_stats": {
"added": 1,
"modified": 0,
"moved": 0,
"removed": 2,
"unmodified": 0
},
"deltas": [
{
"category": "added",
"path": "balloontip_new_test_subdirectory_new/d2/balloontip-1.1.1.jar",
"name": "balloontip-1.1.1.jar",
"type": "file",
"size": 53842
},
{
"category": "removed",
"path": "balloontip_new_test_subdirectory_old/d1/balloontip-1.1.1.jar",
"name": "balloontip-1.1.1.jar",
"type": "file",
"size": 53842
},
{
"category": "removed",
"path": "balloontip_new_test_subdirectory_old/d1/test/balloontip-1.1.1.jar",
"name": "balloontip-1.1.1.jar",
"type": "file",
"size": 53842
}
]
}
- Add a root directory above d1.
DeltaCode treats this as unmodified.
{
"deltacode_version": "0.0.1.beta",
"deltacode_stats": {
"added": 0,
"modified": 0,
"moved": 0,
"removed": 0,
"unmodified": 1
},
"deltas": [
{
"category": "unmodified",
"path": "balloontip-1.1.1.jar",
"name": "balloontip-1.1.1.jar",
"type": "file",
"size": 53842
}
]
}
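In the outputs above, the paths in cases 1 and 3 have had their leading segments stripped, while case 2 keeps the full paths. One way such a difference could arise is sketched below. This is only an illustration under a stated assumption, not DeltaCode's actual fix_trees()/align_trees() code: it assumes the alignment is anchored on a file name that occurs exactly once in each tree. The function name segments_to_strip is hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def segments_to_strip(old_paths, new_paths):
    """Return (old_n, new_n) leading segments to drop, or None if no anchor exists.

    Hypothetical helper: the anchor is a file name that occurs exactly once
    in each tree. This is an assumption made for illustration only.
    """
    old_names = Counter(PurePosixPath(p).name for p in old_paths)
    new_names = Counter(PurePosixPath(p).name for p in new_paths)
    for old in old_paths:
        name = PurePosixPath(old).name
        if old_names[name] != 1 or new_names[name] != 1:
            continue  # ambiguous name, cannot be used as an anchor
        new = next(p for p in new_paths if PurePosixPath(p).name == name)
        old_parts = PurePosixPath(old).parts
        new_parts = PurePosixPath(new).parts
        # Count the trailing segments the two paths share.
        common = 0
        for a, b in zip(reversed(old_parts), reversed(new_parts)):
            if a != b:
                break
            common += 1
        return len(old_parts) - common, len(new_parts) - common
    return None  # no unique anchor: paths would be compared unaligned


jar = "balloontip-1.1.1.jar"
case_1 = segments_to_strip([f"d1/{jar}"], [f"d2/{jar}"])
case_2 = segments_to_strip([f"d1/{jar}", f"d1/test/{jar}"], [f"d2/{jar}"])
case_3 = segments_to_strip([f"root/d1/{jar}"], [f"d2/{jar}"])
```

Under this assumption, the duplicated file name in case 2 leaves no usable anchor, so the full paths would be compared as-is and none of them would match.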
@chinyeungli @johnmhoran @majurg how about we skip aligning the scans for now, once VirtualCodebase is integrated? Maybe we could follow up on it in a separate branch, apart from the main branch.
Let's consider a directory structure such as:
New directory
Old Directory
Now, if a1.py has the same sha1 in both, we treat it as an unchanged file and completely ignore that the main directories are different, mainly owing to the alignment step (align scans / fix trees).
But what if, instead, we do not allow the main root directory to change?
What I propose is that we also compare files by their main root directory, not just their subdirectory.
If we do so, we no longer need the extra burden of aligning the scans.
The scans will already be aligned, since they are loaded from the VirtualCodebase as resource objects.
We can safely skip all aligning.
For a file to have the status unmodified, it must have the same full path along with the same sha1.
For a file to have the status moved, it must exist in some other subdirectory in new_scan and have the same sha1.
And so on for the other statuses.
The codebase would also be a lot cleaner than it is now.
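A minimal sketch of the rule proposed above, assuming each scan is reduced to a plain dict mapping full path to sha1. The classify function and the sample paths other than a1.py are hypothetical and only illustrate the idea; they are not DeltaCode or VirtualCodebase APIs. How the remaining statuses are assigned is also an assumption.

```python
def classify(old_scan, new_scan):
    """Map each path to a status; both arguments are dicts of full path -> sha1."""
    deltas = {}

    # Old paths that no longer exist, grouped by sha1: possible sources of a move.
    vanished = {}
    for path, sha1 in old_scan.items():
        if path not in new_scan:
            vanished.setdefault(sha1, []).append(path)

    for path, sha1 in new_scan.items():
        if path in old_scan:
            # Same full path: unmodified only if the sha1 also matches.
            deltas[path] = "unmodified" if old_scan[path] == sha1 else "modified"
        elif vanished.get(sha1):
            # Same sha1 at a different path, and the old path is gone: a move.
            vanished[sha1].pop()
            deltas[path] = "moved"
        else:
            deltas[path] = "added"

    # Whatever vanished without a matching new path was removed.
    for paths in vanished.values():
        for path in paths:
            deltas[path] = "removed"
    return deltas


old_scan = {"src/a1.py": "aaa", "src/b.py": "bbb", "src/d.py": "ddd"}
new_scan = {"src/a1.py": "aaa", "src/sub/b.py": "bbb", "src/c.py": "ccc"}
result = classify(old_scan, new_scan)
```

Because matching is on the full path, two trees whose roots are named differently (d1/ versus d2/, for example) would share no unmodified files under this rule, so the compared scans would need the same root name.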
@chinyeungli @johnmhoran @majurg I'd like your views on this.
@Pratikrocks I agree with removing/ignoring alignment for the first implementation of adding virtualcodebase.
Okay