ftp-deploy
How to upload only newer files?
I have to upload 1 GB on every deploy, but only 2 files have changed. How can I upload only the newer files?
Sorry, but that's not supported at present. I'd find it useful too, but have no timeline to develop it.
A number of people have had a go at this now:
- @TomFreudenberg (#117), based on storing a file of MD5s each time ftp-deploy is invoked
- @limikael (#151) uses a more reliable date comparison
- @paganaye (#141) who proposes storing a file on the destination (essentially of name and date)
- me (#100) based on file size and date comparison
I'm not excited about creating extra files (state) and storing them on people's computers, but adding the MD5 is perhaps the gold standard in comparing. However, MD5ing a 1 GB file (the kind of upload that motivated some of the PR contributors) is expensive in other ways.
I think my preference is for solution 2 (#151, date comparison), which does work well in the PHP use case. It may not be perfect, but it is perhaps good enough, not least as so many developers now enjoy git-style deploys anyway.
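For illustration, a minimal sketch of what a date comparison could look like. This is not the code from #151: `isNewerLocally`, the shape of the `local`/`remote` objects and the `toleranceMs` parameter are all assumptions made for the example.

```javascript
// Hypothetical helper, not part of ftp-deploy's API.
// local and remote are plain objects of the form { mtimeMs: number }.
// remote is undefined when the file does not exist on the server yet.
function isNewerLocally(local, remote, toleranceMs = 1000) {
  if (!remote) return true; // nothing on the server, so upload
  // Remote timestamps can be coarse, so require a margin before
  // treating the local copy as newer (the margin is an assumption).
  return local.mtimeMs - remote.mtimeMs > toleranceMs;
}
```

The tolerance is only there so that a tiny timestamp difference does not trigger a re-upload; whether it is needed depends on how the server reports modification times.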
Would welcome thoughts
@higimo if you have time to test this branch https://github.com/simonh1000/ftp-deploy/tree/upload-newer, that would be much appreciated
I tested, and in my case, I generate a build folder every time before deploying. Therefore, my files will always have a newer date.
Maybe it's a good idea to pass as parameters the options to choose between only update if file is newer, only if size is different, or only if both (and in my case, I would choose only if size is different).
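A rough sketch of how such a parameter might behave, assuming a hypothetical `mode` value of `"newer"`, `"size"` or `"both"`. None of these names exist in ftp-deploy; they only illustrate the three choices described above.

```javascript
// Hypothetical helper, not part of ftp-deploy's API.
// local and remote are plain objects: { size: number, mtimeMs: number }.
// remote is undefined when the file is not on the server yet.
function shouldUpload(local, remote, mode) {
  if (!remote) return true; // always upload files the server lacks
  const newer = local.mtimeMs > remote.mtimeMs;
  const sizeDiffers = local.size !== remote.size;
  switch (mode) {
    case "newer":
      return newer;
    case "size":
      return sizeDiffers;
    case "both":
      return newer && sizeDiffers;
    default:
      throw new Error(`Unknown compare mode: ${mode}`);
  }
}
```

With a freshly generated build folder, every file is newer, so `"newer"` would re-upload everything, while `"size"` would skip files whose size is unchanged. Note that `"size"` misses edits that happen to keep the byte count the same.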
By the way, nice job on this project! It saved me a lot of time :)
I have a proposal on this: what if we create a hash table of the files and upload it alongside them as something like a .state.json? Every time we deploy, we compare the local hashes against that file and upload only the files that differ. That makes sure we upload only when there is actually a difference, and it could work in any environment.
What do you think?
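To make the proposal concrete, here is a minimal sketch of the hash comparison, assuming the manifest is a plain `{ path: md5 }` object. `buildManifest` and `changedPaths` are hypothetical names, and the files are passed in as in-memory content purely to keep the example self-contained.

```javascript
const { createHash } = require("crypto");

// Hypothetical helper: hash each file's content with MD5.
// files maps a relative path to its content (string or Buffer).
function buildManifest(files) {
  const manifest = {};
  for (const [path, content] of Object.entries(files)) {
    manifest[path] = createHash("md5").update(content).digest("hex");
  }
  return manifest;
}

// Paths whose hash is missing from, or differs in, the remote manifest.
function changedPaths(localManifest, remoteManifest) {
  return Object.keys(localManifest).filter(
    (path) => localManifest[path] !== remoteManifest[path]
  );
}
```

If the remote manifest was built from `index.html` and `movie.mp4`, and only `index.html` has changed locally, `changedPaths` returns just `index.html`, regardless of file dates.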
I'm reluctant to do that @mhadaily because it creates an extra piece of state that could get out of sync. I know that I have sometimes switched to an FTP client to upload a bugfix to a specific file (perhaps when I was away from my main machine), so the state.json you propose could end up being inaccurate. The solution in the PR works for e.g. PHP sites - I think (that's why it needs testing) - but not, as @gabrielnvg notes, for projects that compile the distributed assets.
There do seem to be ways that work, but I have not tried to look at the source code of e.g. the grunt ftp client I used back in the day which seemed to work quite nicely
Hi, maybe you could check out this approach. It is working well on our side: https://github.com/simonh1000/ftp-deploy/pull/117
@gabrielnvg is there any solution to your case - your build process probably also creates cache busting assets that differ by name even if the content has not changed?
If I remember correctly, on that project (3 years ago haha) the build files didn't have name generation. So, having the same names, they were all replaced.
Seems like we will never be able to agree on the one solution to rule them all. Perhaps add several of the methods that now exist as PRs and make it possible to select the comparison method with a flag?
FTP file upload isn't exactly cutting edge and the sexiest kind of project to work on, so I can understand that this is a bit stuck. It is still quite useful from time to time for many people, however. I'm unemployed and would have time to volunteer to take it on, but would like to ask for a donation in that case, I'm currently sadly in hustler lifestyle mode: https://www.buymeacoffee.com/limikael
What I would implement then would be a flag to select either #117 or #151. Should there be more methods?
(why am I suitable for the job? one of the implementations is mine)
Hi all,
I will just respond that PR #117 has been working on our side without any issues for years. We also sometimes use git deployments. Since git is also based on some kind of separate info database (hashes) and checks the differences against remote repositories, I still believe in a file or dictionary with some kind of data about the files. Date/time and size are not suitable for detecting differences in many cases. There must be some kind of hash or key to make sure which data / files we are talking about. Given that FTP won't let us call something on the server, I think that is the only suitable way.
Just my 2 cents
Tom
I agree, hash is more reliable... But it requires that the file with hashes is already there on the server... If you are in a situation where you have set it up that way from the beginning, then it will be there, and everything will be fine... One might even argue that this is the majority of the cases. However, in my case, there was no file there because no one had used this command before. I was tasked with updating an already existing project with a lot of files. Also, I was the only one in the team using this command, other people used regular FTP clients, and I didn't have the authority to tell them to do otherwise. In this case, it was better for me to use file date. So this is why I suggest to have a flag where one can switch, and use #117 where it makes sense, and #151 where it makes sense. @TomFreudenberg do you think having such a switch will cause problems or confusion?
This project pre-dates me and I suspect its origins are PHP and WordPress. That's how I've generally used it. There, I would argue that date is enough.
It's not necessarily the optimal engineering solution. Mikael makes some excellent, practical comments too.
Hey guys, I do not see any problem in having both options.
Just to make it clear:
> But it requires that the file with hashes is already there on the server.
No, it does not have to be there. If the file is not there, then it is the same as if all files were missing or different. What might also be done is that all files on the remote get deleted in that case.
1. Check if the remote hash file exists
2. If yes - proceed
3. If no - copy all files to the remote
4. Last, there might be a cleanup process - looking for all files on the remote that are not in the hash file and deleting them. That would create a consistent duplicate on the remote side.
Step 4 might be optional or partial, based on a remote path regex.
P.S.: Steps 1 to 4 are our current workflow
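The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from #117: `planDeploy`, its parameters and the `{ path: hash }` manifest shape are hypothetical, and the function only computes a plan without touching any server.

```javascript
// Hypothetical planner, not ftp-deploy's or #117's actual code.
// localManifest:  { path: hash } for the files about to be deployed
// remoteManifest: { path: hash }, or null when no hash file exists remotely
// remotePaths:    array of paths currently present on the remote
// cleanup:        whether to run the optional step 4
function planDeploy(localManifest, remoteManifest, remotePaths, cleanup = false) {
  const localPaths = Object.keys(localManifest);
  // Steps 1-3: without a remote hash file, treat every file as changed.
  const upload =
    remoteManifest === null
      ? localPaths
      : localPaths.filter((p) => localManifest[p] !== remoteManifest[p]);
  // Step 4 (optional): remove remote files the local manifest does not list.
  const localSet = new Set(localPaths);
  const remove = cleanup ? remotePaths.filter((p) => !localSet.has(p)) : [];
  return { upload, remove };
}
```

A partial cleanup based on a path regex, as mentioned for step 4, could be added by filtering `remotePaths` before calling the function.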
Great!
> No, it does not have to be there. If the file is not there, then it is the same as if all files were missing or different.
So let me also clarify that I understand this. I also approve of your process and workflow. However, in the project where I used this software I didn't have the authority to tell people to use such a workflow, even if I would have wanted. Think about the scenario where I would have used it, and I would have updated the hash file, but other people in the project would have used other FTP clients. There is a potential scenario then where someone else updates a file, but they don't update the hash file. The project also included a lot of huge files, e.g. movies and such. My job was to update just a few .html files. In this particular case, file date seemed like the best option. So yeah! Being able to select algorithm seems like a good solution!
I also see that #117 relies on a config key fileFolderHashSums, whereas #151 relies on newFilesOnly. So it seems that even with the current implementations there shouldn't be a conflict. @simonh1000 is anything preventing you from just going ahead and merging both of these solutions?
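As a sketch of how the two keys could coexist, assuming each is simply truthy when its method is wanted. The value types, the precedence between the keys and the `pickCompareMethod` name are assumptions for illustration and are not taken from either PR.

```javascript
// Hypothetical dispatcher; the option value types are assumed, not taken from the PRs.
function pickCompareMethod(config) {
  if (config.fileFolderHashSums) return "hash"; // approach of #117
  if (config.newFilesOnly) return "date"; // approach of #151
  return "all"; // neither key set: upload everything
}
```

Someone in a team that also uses regular FTP clients would set only `newFilesOnly`, while a team that deploys exclusively through this tool could set `fileFolderHashSums`.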
> Think about the scenario where I would have used it, and I would have updated the hash file, but other people in the project would have used other FTP clients. There is a potential scenario then where someone else updates a file, but they don't update the hash file.
OK, got it