RoboSharp
RoboSharp.Extensions Namespace to provide for custom IRoboCommand Implementations
Various issues fixed for WPF .net5 compatibility
Hi Robert, happy to merge this in for you if it is ready to go?
Not really close at all. I'm still setting my application up. Once I get it working, I'll double check all this with the test app. May be a while.
Oh man, this is looking to be another sweeping round of changes, mostly exposing things that were internal.
Essentially, in my application I am pulling from about 5 directories, but also pulling in what are effectively random files (due to the user selection process) from various folders as well.
Building RoboCommands for each single file is obnoxious, so I'm creating a new IRoboCommand using CopyFileEx from the ground up. Robocopy also sucks to use for simple file moving on the same drive: File.Move is near instant, whereas robocopy does a copy then a delete. So I'm optimizing that into my custom command.
Essentially writing robocopy from the ground up is something I never thought I'd do, but here I am.
So these sweeping changes will basically expose and simplify implementation of custom IRoboCommands by allowing interaction with some of the currently private constructors. (Without these changes, implementing the interface is basically impossible, since you can't do progress estimation or event generation.)
Hi Robert, I had noticed you had been busy making changes again! Meant to comment the other day but just got caught up with work and didn't get round to it. Will keep following this PR. If you need me to test anything or merge anything just give me a shout :)
Alright, HUGE PROGRESS.
This latest set of commits introduces the 'RoboSharp.ConsumerHelp' namespace, which contains extension methods for CopyOptions, SelectionOptions, and IRoboCommand.
It also has an interface to specify a source/destination file or folder, and accompanying helper objects that implement the tiny interfaces. This was more so if a consumer builds their own object with that information (like I already did), that interface can be thrown onto it and it now works with this set of extension methods nicely.
There is a new class designed to be a 'cache' that a consumer can instantiate at the start of their run in a custom implementation. When it is checking files, it will do all the evaluation from the selected options to decide to copy/skip the file, and also generate a ProcessedFileInfo object that is compatible with ProgressEstimator, since it uses the same strings.
As far as testing this goes, I created a test class that is a custom implementation of IRoboCommand that can be used for the unit tests. If you could write some that evaluate the various settings and results, that'd be great. Essentially, the file filters, copy options, etc should be customized within the unit tests to copy/skip certain files within our 'Source' folder.
The evaluator should then catch these conditions and skip or copy the files accordingly. It's the evaluator that needs testing, really. This can also be done by setting up a condition on the selection options, such as a filename, and evaluating against that using the 'Evaluator.ShouldCopy()' method.
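To make the kind of test being asked for concrete, here is a minimal sketch of a filename-based copy/skip decision. It is written in Python purely as an illustration and is not the evaluator from this PR: `SelectionRules` and `should_copy` are hypothetical names, and the wildcard matching is an assumption about how a name filter could behave.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from pathlib import PurePath


@dataclass
class SelectionRules:
    """Hypothetical stand-in for a set of selection options (not RoboSharp's class)."""
    include_patterns: list = field(default_factory=lambda: ["*"])
    exclude_patterns: list = field(default_factory=list)


def should_copy(path: str, rules: SelectionRules) -> bool:
    """Return True when the file name passes the include and exclude filters."""
    name = PurePath(path).name.lower()
    # An exclusion match always wins over an inclusion match.
    if any(fnmatch(name, pattern.lower()) for pattern in rules.exclude_patterns):
        return False
    return any(fnmatch(name, pattern.lower()) for pattern in rules.include_patterns)
```

A unit test in this style sets one filter, feeds in a handful of known file names, and asserts which ones come back as copy versus skip.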
Hi, just downloaded PR to take a look, can you give me an example of how it would be used so I can get my head around it, then I can see what I can come up with re adding some tests. I am going away at the weekend but will have laptop with me so if I have time I will take a look then.
In the unit test project I have already created a custom IRoboCommand implementation that should iterate over the test files and copy them to the destination.
This custom implementation uses the new helper objects, interfaces, and extension methods. I have another commit in progress to clean up the code before testing. But it will iterate over the directory, then choose whether it should copy or not. I don't think results are set up in that though.
Basically, tests are desired for all the various selection options. Most are straightforward, except for ExcludeNewer/ExcludeOlder.
If you don't get to it no worries, as I'm going to be using these methods in my application which will be live testing
Okay, I'll see what I can do when I have a spare moment - from what I have seen so far though, you have done some fantastic work here :)
I'm creating a new repository with my custom implementation, where I essentially rewrote robocopy from scratch for improved performance over a vpn (to avoid scanning the network twice). These changes facilitate that new implementation. I expect to be publishing and testing it later this week.
In it I'll have unit tests that basically validate this PR
Quick reply as on my phone. Can't believe how much work you have been doing, so many changes!! can't wait to test it all out :)
Yea, lots of changes due to actually having a need to write a custom IRoboCommand, and realizing there was no way to accurately count or raise events without rewriting everything from scratch. So I created the extensions to be able to check against the stuff, and made the internal things public to allow implementing it.
My implementation (essentially a rewrite of robocopy from the ground up) should be much faster.
Moves on the same drive will now simply move the file instead of copy then delete.
Performing a list only to get a count will cache the evaluations, meaning the entire tree doesn't have to be evaluated twice for a list then run. So for something like a network drive accessed over a slow vpn, it should be much quicker since it will effectively only do the comparison once if you do a list-only then a copy. (Whereas doing that with robocopy requires it start from scratch because it's two separate processes that get started)
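The list-then-run caching described above can be sketched roughly as follows. This is an illustration in Python, not the code in this PR or in CachedRoboCopy: `CachedCopyPlan` is a hypothetical name, and the "copy when missing or newer" rule is an assumed stand-in for the real option evaluation.

```python
import os
import shutil


class CachedCopyPlan:
    """Hypothetical helper: evaluates a tree once, then reuses the decisions."""

    def __init__(self, source, destination):
        self.source = source
        self.destination = destination
        self.scans = 0          # how many times the tree was actually walked
        self._decisions = None  # cached list of (src, dst, needs_copy)

    def _evaluate(self):
        self.scans += 1
        decisions = []
        for root, _dirs, files in os.walk(self.source):
            rel = os.path.relpath(root, self.source)
            for name in files:
                src = os.path.join(root, name)
                dst = os.path.normpath(os.path.join(self.destination, rel, name))
                # Assumed rule: copy when the destination is missing or older.
                needed = (not os.path.exists(dst)
                          or os.path.getmtime(src) > os.path.getmtime(dst))
                decisions.append((src, dst, needed))
        return decisions

    def list_only(self):
        """Report what would be copied, caching the evaluation."""
        if self._decisions is None:
            self._decisions = self._evaluate()
        return [src for src, _dst, needed in self._decisions if needed]

    def run(self):
        """Copy using the cached decisions when a list-only pass already ran."""
        if self._decisions is None:
            self._decisions = self._evaluate()
        copied = 0
        for src, dst, needed in self._decisions:
            if needed:
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)
                copied += 1
        self._decisions = None  # the cache is stale once files have been copied
        return copied
```

Calling `list_only()` and then `run()` on the same object walks the source tree once; two separate robocopy processes cannot share that work.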
My unit tests will be running a RoboCommand and comparing its results to my CachedRoboCommand.
Performing a list only to get a count will cache the evaluations, meaning the entire tree doesn't have to be evaluated twice for a list then run. So for something like a network drive accessed over a slow vpn, it should be much quicker since it will effectively only do the comparison once if you do a list-only then a copy. (Whereas doing that with robocopy requires it start from scratch because it's two separate processes that get started)
It all sounds brilliant, but the above especially interests me as having spent a while lately doing a lot of server data migrations it is indeed a pain having to effectively run it twice to firstly get the list-only and then do a copy, so even without a VPN or slow connection, but just when migrating data from one remote site to another this should be really useful.
@PCAssistSoftware Here is my project, I'm happy to say it's mostly working! https://github.com/RFBCodeWorks/CachedRoboCopy This project relies 100% on all the changes in this PR to work. The unit test I created runs RoboCopy (via RoboSharp), then applies the exact same options to my custom command and runs it again, then compares the two results.
My brief testing so far shows margin-of-error improvements in list-only operations, but consistent improvements when doing a list-only followed immediately by a run. It also includes an optimization for moving files on the same drive.
More unit tests are required here, but so far it's passing most of the time.
Also, I don't intend to implement things like 'scheduling' or 'unbufferedIO' options.
My opinion is that scheduling should be handled by wrapping the class into a process; since it would be a background process at that point, it doesn't offer much more utility than standard robocopy.
Unbuffered IO would require writing a new copy implementation. I'm currently using CopyFileEx to do the copy work and progress reporting. I don't know how that handles copying (buffered/unbuffered), but I don't intend to write one. It would be welcome as a PR if someone desired it.
Ok, feel free to pull and build the CachedRoboCopy solution I linked.
It has now passed several test cases and produces logs (nearly) identical to RoboCopy. My tests are looking very promising currently.
On average it's consistently quicker than robocopy, with some passes being MUCH quicker. (2x 180ms for robocopy back to back, while mine was 12ms + 6ms for back-to-back runs, the second run being quicker thanks to the caching of the evaluations.)
Obviously it's not ready for production environment, but it should be soon! I'll set up the backup app to use this instead of RoboCommand in my repo to do comparisons.
Will be back home tomorrow, so will give it a try then - certainly sounds really good!
OK, tomorrow I will try to figure out why appveyor is failing here, likely due to the file locking test that we had to work around/comment out last time this occurred.
But the other repo now has its own backup app (that is 99% a copy-paste of this one) but with an added group box to choose between RoboCommand and CachedRoboCommand for quick testing and comparing.
Deleted previous message - problem was my end due to some old cached files I think with GitHub Desktop - deleted everything from previous downloads of RoboSharp and 161 and now got rid of previous errors. Still can't build however, getting errors below:-
@PCAssistSoftware Sorry! I was running a bunch of unit tests today, cleaning up the code, etc.
Some stuff still isn't implemented (removing all file attributes from a copied file for example), but that can come later. Should be good now!
No problem at all - now builds fine - will start to take a look
Okay, what am I missing (read: doing wrong!) here.
I am opening the new backup app, adding in my source and destination - making sure copy subdirectories /E is selected - checking the new CachedRoboCommand is the chosen method, clicking add to queue, and then start job queue and I then get error as below:-
System.AggregateException: 'One or more errors occurred. (One or more errors occurred. (Unable to cast object of type 'RFBCodeWorks.CachedRoboCopy.CachedRoboCommand' to type 'RoboSharp.RoboCommand'.))'
followed by
System.AggregateException: '4.1.2.0 0 0lures: 0dio.DesignTools.WpfTap.Utility.CountToValueConverter`1[[System.Windows.Visibility, PresentationCore, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31)'
Is it finding the RoboSharp solution, and are both up to date? (The RoboSharp folder should be looking at this PR.)
RoboSharp and RFBCodeWorks are on the same level.
Yes, I have downloaded both your solution and this PR and they are in the same folder on my machine, c:\tmp\pcatest, with the same folder structure as above, apart from CachedRoboCopy being CachedRoboCopy-master\CachedRoboCopy-master as extracted from the zip file downloaded from GitHub.
It is definitely seeing the RoboSharp solution as can see it in Solutions Explorer and it builds fine
Okay, so the issue I am seeing is to do with Multiple Jobs (RoboQueue), as if I just click Start Job it works fine, it only crashes when I click Add to Queue > Start Job Queue.
If it helps, when you click Add to Queue with Type of command set to RoboCommand then the job "looks" right in the job list
But when you click Add to Queue with Type of command set to CachedRoboCommand it shows as below
Maybe I should just be testing with "Start Job" and not "Start Job Queue" - was just habit doing it that way.
So, while the unit tests looked very promising, actual copy results are much less so for my custom implementation (at least on an SSD).
Using the backup app to interchange between the two settings, RoboCopy blows my implementation out of the water (about 10-15s to copy my workload vs 45-60s). But that's desktop folder to desktop folder. Again, my workload is approximately 1500 files, totalling less than 150MB. The average speed reported by my tool was around 2-5MB/min, but that calculation is obviously wrong (likely due to the very small file sizes), since it took less than a minute to do the copy.
Moving to a larger file size showed much better numbers, showing 500+ MB/min. Again, unsure how accurate, but for large sizes it seemed fine. My goal here was to avoid running the process twice, so I think for my use case it will be fine. A 'smart' copy can always be implemented by the consumer too, for example by checking whether it's a local drive (RoboCommand) or a network drive (CachedRoboCommand) to optimize their own program.
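As a quick sanity check on those numbers, the arithmetic can be sketched like this (Python, illustration only). The assumption that 1 MB means 1024 * 1024 bytes is checked against the pair of Speed values that robocopy reports in the log further down this thread; the function name is hypothetical.

```python
BYTES_PER_MB = 1024 * 1024  # assumed unit; consistent with robocopy's own Speed lines


def megabytes_per_minute(total_bytes, seconds):
    """Average throughput in MB/min for a copy of total_bytes taking seconds."""
    return total_bytes / BYTES_PER_MB * 60 / seconds


# Upper bounds for the workload above: at most 150 MB copied in 45 to 60 seconds.
slow_run = megabytes_per_minute(150 * BYTES_PER_MB, 60)
fast_run = megabytes_per_minute(150 * BYTES_PER_MB, 45)

# Unit check against robocopy's log: 21342664 Bytes/sec is reported as 1221.237 MegaBytes/min.
robocopy_check = round(megabytes_per_minute(21342664, 1), 3)
```

Even allowing for the total being somewhat under 150 MB, the real average sits in the region of 150 to 200 MB/min at most, far above the 2-5 MB/min the tool displayed, which supports the suspicion that the displayed figure is a calculation bug.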
Testing using Single Job copying 602MB across WAN network link between two different sites
CachedRoboCommand
6 minutes 56 seconds
extract from log lines
            Total      Copied     Skipped  Mismatch  FAILED  Extras
Dirs  :       348         347           1         0       0       0
Files :      3514        3514           0         0       0       0
Bytes : 631579803   631579803           0         0       0       0
Ended : 02 September 2022 22:03:33
Total Time: 0 hours, 6 minutes, 56.568 seconds
RoboCommand
1 minute exactly
extract from log lines
Started : 02 September 2022 22:11:00
            Total      Copied     Skipped  Mismatch  FAILED  Extras
Dirs  :       348         348           1         0       0       0
Files :      3514        3513           1         0       0       0
Bytes : 631579803   631572123        7680         0       0       0
Times :   0:01:58     0:00:29     0:00:00   0:00:30
Speed : 21342664 Bytes/sec.
Speed : 1221.237 MegaBytes/min.
Ended : 02 September 2022 22:12:00
Well it was worth a shot to implement anyway. I'm curious what tricks it's using to copy that fast, and where my slowdown is occurring. I'm thinking mine could be improved with someone much more familiar with threading applications, or maybe even running it as its own process like robocopy does (that way it's running on its own without a ton of interaction with the Caller application).
Is moving (not copying) files faster if they reside in the same drive? That was one of my driving ideas to create it.
But the Changes in this PR were necessary to implement custom commands anyway, so I stand by them.
I created the FileCopierCommand in that other repo as well, which is designed to take a list of provided files and copy them. For example, 15 files from 15 separate folders can get added to its list, then it will copy them all to their individually specified destinations, while reporting similarly to how robocopy does.
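The idea of copying an explicit list of source/destination pairs can be sketched as follows. This is a Python illustration, not the FileCopierCommand itself: `FilePair` and `copy_file_list` are hypothetical names, and the copied/failed split is only an assumed stand-in for robocopy-style reporting.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass
class FilePair:
    """Hypothetical pairing of one source file with its own destination path."""
    source: str
    destination: str


def copy_file_list(pairs):
    """Copy each pair independently and collect a simple copied/failed summary."""
    copied, failed = [], []
    for pair in pairs:
        try:
            target_dir = os.path.dirname(os.path.abspath(pair.destination))
            os.makedirs(target_dir, exist_ok=True)
            shutil.copy2(pair.source, pair.destination)
            copied.append(pair)
        except OSError as error:
            # One bad file should not stop the rest of the list.
            failed.append((pair, error))
    return copied, failed
```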
Yes it was definitely worth a shot and has been interesting to follow your development along the way. It always amazes me how quick RoboCopy is compared to any other method of file copying. Years ago when I first looked into best method to use for my original app I tried all the different methods .NET offered and spent ages trawling Google and GitHub trying other alternative solutions and they all had good and bad points, but none of them could get anywhere close to the speed that RoboCopy does when copying files, especially across networks etc. It is the same outside of .NET when just dragging/dropping files or copying using command prompt or any of the powershell methods none of them come remotely close to what RoboCopy manages in my experience.
I suspect you are right and some clever use of threading would definitely improve things, so it is copying multiple files at a time, but that does add so much more complexity to handling errors and statistics etc.
Haven't tried moving (not copying) but I suspect on the same drive it would be quicker, as even just doing that in explorer it is quicker when you move rather than copy as it doesn't really "move" them as such it just re-writes the location in the MFT / journal? (whatever the file access table is called in NTFS).
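A rough sketch of that same-drive shortcut, in Python for illustration only (the PR itself is C# and this is not its code): `same_volume` and `move_file` are hypothetical names, and comparing `st_dev` is an assumption about how to detect that two paths share a volume.

```python
import os
import shutil


def same_volume(path_a, path_b):
    """Assumed check: two existing paths are on one volume when st_dev matches."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def move_file(src, dst):
    """Rename in place when possible, otherwise fall back to copy then delete."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    os.makedirs(dst_dir, exist_ok=True)
    if same_volume(src, dst_dir):
        # Only the directory entry changes; the file data is not rewritten.
        os.replace(src, dst)
        return "renamed"
    # Different volume: the data has to be copied before the source is removed.
    shutil.copy2(src, dst)
    os.remove(src)
    return "copied"
```

Note that `os.replace` overwrites an existing destination, so a real implementation would need to decide how that interacts with the selection options.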
Agreed - what you have done is definitely worthwhile for lots of other reasons.
Will take a look at the FileCopierCommand you mention, that sounds useful too.
Let me know any other testing you want me to do, have access to various domain connected environments here, so copying files from one location to another to test speeds is something I can do, and is one of my main use cases lately.
I'm curious what the speeds are in ListOnly mode vs robocopy, as in unit testing I was seeing this be slightly faster (or about the same) on average.
In which case it could use a RoboCommand under the covers to perform the copy, while list-only could be its own thing. BUT that's really only worthwhile if the /Move option is also quicker than a robocopy move (which for my use case is relatively slow, since it's moving files on a USB 2.0 drive to another folder, but with logging).
Using a RoboCommand under the covers would actually be a relatively simple change based on my existing code too.
Running my same job again this time with just List Only
CachedRoboCommand = 3.313 seconds
RoboCommand = 2 seconds
I created a new branch in my custom implementation, this time with fully passing unit tests and threading.
https://github.com/RFBCodeWorks/CachedRoboCopy/tree/ThreadedCopying
One thing to note is that the default (MultithreadedCopying = 0) behaves identically to MultithreadedCopying = 1, simply because it otherwise wouldn't copy at all. Once I scaled up to 4+ threads on that option, it nearly kept up with robocopy in my testing.
MS obviously built in features that I have no access to, such as the windows-offload and wait-for-sharenames functionality. So all copying would use your computer as the middle-man between servers. Under that condition, RoboCopy will blow mine out of the water. But for my particular use case it would work well, since mine is passing it from a network to a USB anyway.
That all said, the real reason I wanted to implement it was for the 'FileCopierCommand', which will be used by my app to copy a bunch of files from effectively random locations on the network to various other locations, but do it within a RoboQueue sequence of operations. So, now that the other thing is passing all of its unit tests reliably, I think this is very very nearly ready.
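The "0 means 1" rule can be sketched like this. It is a Python illustration rather than the ThreadedCopying branch's code: `effective_thread_count` and `copy_all` are hypothetical names, and a thread pool is just one assumed way to run several copies at once.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor


def effective_thread_count(multithreaded_copying):
    """Treat 0 (the default) as 1 so that at least one worker always exists."""
    return max(1, multithreaded_copying)


def copy_all(pairs, multithreaded_copying=0):
    """Copy (source, destination) pairs using the requested number of workers."""
    workers = effective_thread_count(multithreaded_copying)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every copy and re-raises the first error, if any.
        return list(pool.map(lambda pair: shutil.copy2(pair[0], pair[1]), pairs))
```

With more than one worker, per-file statistics and error handling have to be made thread-safe, which is the added complexity mentioned earlier in the thread.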