Python3

Open carlilek opened this issue 7 years ago • 29 comments

msrsync did not work with Python 3.5.2 without modifications, so I copied it to msrsync3 and made some light changes to the code. I have not fully tested it, but with a relatively simple msrsync3 -p32 --progress --stat -r '--inplace -W' /path/to/src/ /path/to/dest/ it seems to work.

--Ken

carlilek avatar Dec 31 '17 18:12 carlilek

Thank you for this pull request. Give me a few days and I'll get back to you.

jbd avatar Dec 31 '17 18:12 jbd

No rush. Just wanted to get my hands on that sweet sweet scandir optimization, and thought I'd share. 

--Ken

carlilek avatar Dec 31 '17 18:12 carlilek

Just wanted to get my hands on that sweet sweet scandir optimization, and thought I'd share.

Note that you can have the scandir module installed for python2. msrsync will use it if available.
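For readers unfamiliar with the pattern, an optional scandir dependency is commonly handled with a guarded import. This is a minimal sketch of that general idiom, not necessarily how msrsync does it; the `HAVE_SCANDIR` name is an assumption made for illustration.

```python
# Prefer the standard-library scandir (Python 3.5+), then the third-party
# backport, and finally signal that neither is available.
try:
    from os import scandir
except ImportError:
    try:
        from scandir import scandir  # PyPI backport for older Pythons
    except ImportError:
        scandir = None  # caller would fall back to os.listdir / os.walk

# Hypothetical flag a crawler could check before choosing its code path.
HAVE_SCANDIR = scandir is not None
print(HAVE_SCANDIR)
```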

jbd avatar Dec 31 '17 19:12 jbd

Ran into unicode problems. Around line 659, where I changed from wb to w. Clearly something I'm missing here.

carlilek avatar Jan 01 '18 22:01 carlilek

Are you able to reproduce the problem?

jbd avatar Jan 01 '18 23:01 jbd

I am, but only on a particular file structure that happens to be about 300TB... and only after a random amount of time. I'm messing with encoding the string on line 659 and using wb as you had already been doing (and probably for a very good reason!). 

This is the section of the error output that seems to be important: 

[2318986/10459996 entries] [113.7 T/344.1 T transferred] [159 entries/s] [8.0 G/s bw] [monq 0] [jq 190457]
Uncaught exception: Traceback (most recent call last):
  File "/usr/local/bin/msrsync3", line 1120, in msrsync
    write_bucket((fileno, filename), bucket, options.compress)
  File "/usr/local/bin/msrsync3", line 659, in write_bucket
    bfile.write(entry + '\0')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcb5' in position 101: surrogates not allowed

My latest attempt (running now) is: 

 659                     bfile.write((entry + '\0').encode('utf-8','surrogateescape'))
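A small sketch of what this change does, for anyone following along. The `encode_entry` helper is hypothetical and not part of msrsync; it only isolates the encode step. The result is bytes, so the bucket file would have to be opened in binary mode ("wb").

```python
# Hypothetical helper, not part of msrsync: turn a path that Python 3
# decoded from the filesystem back into the raw bytes it came from.
def encode_entry(entry):
    """Return entry plus a NUL terminator as bytes, keeping undecodable bytes."""
    return (entry + '\0').encode('utf-8', 'surrogateescape')


# A shortened name containing the lone surrogate from the traceback.
name = 'Int_45\udcb5W.zip'

try:
    name.encode('utf-8')  # strict error handling, the default for str.encode
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

raw = encode_entry(name)
print(strict_ok)  # False
print(raw)        # b'Int_45\xb5W.zip\x00'
```

Whether rsync then finds the file under that byte sequence is a separate question that this sketch does not answer.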

--Ken

carlilek avatar Jan 02 '18 01:01 carlilek

My latest attempt (running now) is: 659 bfile.write((entry + '\0').encode('utf-8','surrogateescape'))

I'm not sure that's a good idea. I'm not sure how rsync --files-from will handle that. Be careful. The best would be to identify the pathname that causes the problem.

Could you give the initial exception when you were using the "wb" flag?

jbd avatar Jan 02 '18 07:01 jbd

This would seem to be an incredibly rare edge case, which probably got skipped over in the python2 version because of the different exception handling in python3. I'm going to work around it in that try/except block. I did finally find the files, and they don't even appear in Windows Explorer... I'm quite comfortable with msrsync skipping them as long as it logs it somewhere retrievable.

carlilek avatar Jan 02 '18 18:01 carlilek

Thank you for the feedback.

I'm hugely interested in this edge case; I'd like to understand exactly what is going on. If you can provide a path that reproduces the problem, that would be ideal.

jbd avatar Jan 02 '18 18:01 jbd

This is how the filename appears in Linux (ls):  mEOS_0s8_Int_45?W_0s5-4s_Act_3mW_Exc_20x500.zip

Here is the error that I got msrsync to throw: 

Cannot write entry Compress_and_move_to_nearline_2016_03_18/data0/palmdata/PALM Data/Image Files/Proline/mEOS_0s8_Int_45\udcb5W_0s5-4s_Act_3mW_Exc_20x500.dat to bucket file /tmp/msrsync-yk3kowib/283./044921875/tmpmampqcd1

\udcb5 is not a valid unicode character as far as I can tell. 
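Some background that may help here: U+DCB5 is a lone low surrogate rather than a real character. Python 3 produces such code points through the surrogateescape error handler (PEP 383) when a filename contains a byte it cannot decode, and the original byte is the code point minus 0xDC00. The following sketch only illustrates that mapping; reading the byte as Windows-1252 is an assumption about how the name was created.

```python
import unicodedata

# Recover the undecodable byte hidden inside the surrogate.
surrogate = '\udcb5'
byte_value = ord(surrogate) - 0xDC00
print(hex(byte_value))  # 0xb5

# Decoding the raw byte as UTF-8 with surrogateescape reproduces the surrogate.
decoded = b'\xb5'.decode('utf-8', 'surrogateescape')
print(decoded == surrogate)  # True

# Assumption: the name was written in Windows-1252, where 0xB5 is a real character.
micro = b'\xb5'.decode('cp1252')
print(unicodedata.name(micro))  # MICRO SIGN
```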

carlilek avatar Jan 02 '18 18:01 carlilek

Can you run this command inside the directory containing your file and give me the output?

find . -maxdepth 1 -type f -name "mEOS_0s8_Int_45*zip" -exec sh -c 'printf "%-10s %s\n" "$1" "$(printf "$1" | xxd -pu )"' None {} \;

jbd avatar Jan 02 '18 18:01 jbd

find . -maxdepth 1 -type f -name "mEOS_0s8_Int_45*zip" -exec sh -c 'printf "%-10s %s\n" "$1" "$(printf "$1" | xxd -pu )"' None {} \;

./mEOS_0s8_Int_45�W_0s5-4s_Act_3mW_Exc_20x500.zip 2e2f6d454f535f3073385f496e745f3435b5575f3073352d34735f416374 5f336d575f4578635f3230783530302e7a6970

or, screenshotted from the Linux terminal, since the paste inserted different character(s): [screenshot not preserved]
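The hex dump can be inspected directly, which sidesteps the copy-and-paste problem. A short sketch, using only the hex string from the xxd output in this comment (the two chunks joined):

```python
# Hex dump copied from the xxd output (the two chunks joined).
hex_dump = (
    "2e2f6d454f535f3073385f496e745f3435b5575f3073352d34735f416374"
    "5f336d575f4578635f3230783530302e7a6970"
)
raw = bytes.fromhex(hex_dump)

# Report every byte outside the ASCII range, with its offset in the name.
non_ascii = [(i, hex(b)) for i, b in enumerate(raw) if b >= 0x80]
print(non_ascii)  # [(17, '0xb5')]

# The name is not valid UTF-8 because of that single byte.
try:
    raw.decode('utf-8')
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False
```

So the only problematic byte in this name is 0xb5, sitting between "45" and "W".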

carlilek avatar Jan 02 '18 18:01 carlilek

Perfect, thank you. I'll try to get back to you in the next few days.

jbd avatar Jan 02 '18 18:01 jbd

Cool. I'll continue trying to work around it, but boy oh boy is this thing pernicious. 

carlilek avatar Jan 02 '18 18:01 carlilek

The reason it doesn't appear in the python 2 version is that python 2's default encoding is ASCII, while python 3's is utf-8. I discovered this while rewriting the except: code just now... So I think even with the Python 2 version, these 4 files would not be transferred.

This is how it's running now: 

#msrsync3 -p32 --progress --stat -r '-W --inplace' /nearline/Compress_and_move_to_nearline_2016_03_18 /hq-vault/nearline-dr/
Unable to write 'Compress_and_move_to_nearline_2016_03_18/data0/palmdata/PALM Data/Image Files/Proline/mEOS_0s8_Int_45\udcb5W_0s5-4s_Act_3mW_Exc_20x500.dat' because <class 'UnicodeEncodeError'>
Unable to write 'Compress_and_move_to_nearline_2016_03_18/data0/palmdata/PALM Data/Image Files/Proline/mEOS_0s8_Int_45\udcb5W_0s5-4s_Act_3mW_Exc_20x500.mat' because <class 'UnicodeEncodeError'>
Unable to write 'Compress_and_move_to_nearline_2016_03_18/data0/palmdata/PALM Data/Image Files/Proline/mEOS_0s8_Int_45\udcb5W_0s5-4s_Act_3mW_Exc_20x500.zip' because <class 'UnicodeEncodeError'>
Unable to write 'Compress_and_move_to_nearline_2016_03_18/data0/palmdata/PALM Data/Image Files/Proline/mEOS_0s8_Int_45\udcb5W_0s5-4s_Act_3mW_Exc_20x500.sif' because <class 'UnicodeEncodeError'>
[2/67765 entries] [1.8 G/3.4 T transferred] [0 entries/s] [17.0 M/s bw] [monq 0] [jq 1823]
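For illustration, a log-and-skip loop in the spirit of the output above could look roughly like this. It is a sketch under assumptions, not the actual msrsync3 code: `write_entries` and its parameters are hypothetical names, and the message format only approximates the one shown.

```python
import io
import sys


def write_entries(bfile, entries, log=sys.stderr):
    """Write NUL-terminated entries to a binary file; return the ones skipped."""
    skipped = []
    for entry in entries:
        try:
            data = (entry + '\0').encode('utf-8')  # strict: raises on lone surrogates
        except UnicodeEncodeError as err:
            # %a escapes non-ASCII so the log line itself stays encodable.
            log.write('Unable to write %a because %s\n' % (entry, type(err).__name__))
            skipped.append(entry)
            continue
        bfile.write(data)
    return skipped


# Usage with in-memory stand-ins for the bucket file and the log.
bucket = io.BytesIO()
skipped = write_entries(bucket, ['ok.dat', 'Int_45\udcb5W.dat'], log=io.StringIO())
print(bucket.getvalue())  # b'ok.dat\x00'
print(len(skipped))       # 1
```

Returning the skipped entries lets a caller report them at the end of the run, so they can be copied some other way.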

carlilek avatar Jan 02 '18 18:01 carlilek

That's a bug. I'll correct that as soon as possible. Thank you for testing!

jbd avatar Jan 02 '18 18:01 jbd

FWIW I just had to copy about 1TB of data from a Debian 9 server to an external USB2 HDD as quickly as possible, and this PR's branch in its current form did a great job at maxing out the USB2 bandwidth, with no bugs or drawbacks.

Speed: 28.6 M/s
Rsync workers: 4
Total rsync's processes (161) cumulative runtime: 113237.4s
Crawl time: 0.2s (0.0% of total runtime)
Total time: 28636.7s

So that literally got me to done in about 1/4 the time it otherwise would have taken. Thank you both, @jbd and @carlilek!

rayrrr avatar Aug 26 '18 13:08 rayrrr

Awesome! I would recommend running a regular rsync immediately afterward just to be certain. I have some issues with encoding of Windows files. Been trying to hammer through what's going on with that, but it's tough going. That said, my combo of msrsync3 and a follow up rsync has been serving me quite well.

--Ken

carlilek avatar Aug 26 '18 13:08 carlilek

Hello,

thank you for the kind words. It reminds me of @carlilek's work and this Windows encoding problem, which I clearly should address. I really hope to have some time to figure it out.

jbd avatar Aug 26 '18 16:08 jbd

I’m closer, but darned if I can get rsync --files-from to read the output files when they’re written in bytes with the proper encoding. It just says the files don’t exist if it interprets the Windows-1252. But at least the version I’ve been beating my head against doesn’t crash when it runs into them. And it doesn’t silently ignore them like the current version I put on github does.

--Ken

carlilek avatar Aug 26 '18 16:08 carlilek

I need to complete some more testing, but I think this is the working version, where it will handle utf-8 and windows-1252 encoding, albeit possibly with a slowdown due to inefficient bucketing.

--Ken

carlilek avatar Aug 29 '18 20:08 carlilek

Hi,

This should become a blocker soon, as Python 2 is dying fast. I would propose using path.py and letting the pros do the pros' work: path.py handles encoding pretty well and has already solved a few corner cases, so either dig into their code or just import it. Version 11 still supports Python 2, and there are no new features/bugfixes in the latest version.

https://pypi.org/project/path.py

They do however also mention that with Python 3.6 everything should be solved, but that might be too big of a jump for now? https://www.python.org/dev/peps/pep-0519/

kklem0 avatar Aug 20 '19 22:08 kklem0

From a quick look into path.py, I think they're using sys.getdefaultencoding(), although the older version used sys.getfilesystemencoding().
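The two functions report different things, which matters for filenames. A quick way to compare them on a given system (the variable names are only for illustration):

```python
import sys

# Default codec for str.encode() / bytes.decode(); always 'utf-8' on Python 3
# (Python 2 reports 'ascii' here by default).
default_encoding = sys.getdefaultencoding()

# Codec used to convert filenames between str and bytes; depends on the
# platform and, on POSIX, usually on the locale.
fs_encoding = sys.getfilesystemencoding()

print(default_encoding, fs_encoding)
```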

kklem0 avatar Aug 20 '19 22:08 kklem0

pathlib was also introduced in Python 3.4; maybe it can handle these issues better?
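As far as the encoding question goes, pathlib paths are built on ordinary str, so an undecodable byte shows up as the same escaped surrogate. A small sketch with a shortened, made-up path, using an explicit codec so the result does not depend on the locale:

```python
from pathlib import PurePosixPath

# Simulate a name that arrived from the filesystem with one undecodable byte.
name = b'Int_45\xb5W.zip'.decode('utf-8', 'surrogateescape')
path = PurePosixPath('Proline') / name

# pathlib keeps the escaped surrogate in its str form...
print(ascii(str(path)))  # 'Proline/Int_45\udcb5W.zip'

# ...so a strict encode still fails, exactly as with a plain str.
try:
    str(path).encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)  # False

# The original bytes are recoverable with the same error handler.
raw = str(path).encode('utf-8', 'surrogateescape')
print(raw)  # b'Proline/Int_45\xb5W.zip'
```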

Gunni avatar Oct 17 '19 17:10 Gunni

Hello! Any progress here?

serge2016 avatar Apr 29 '20 18:04 serge2016

No progress, sorry for that.

jbd avatar Apr 29 '20 19:04 jbd

I'd love to be able to use this tool on my system which is only python 3.

Any 2021 new year resolutions affecting this by any chance?

HaleTom avatar Jan 01 '21 15:01 HaleTom

Hi Tom,

I can't speak for jbd, but I can say that I have been using my patch successfully for quite some time. I admit that I usually run a single threaded plain rsync afterwards to make sure that everything is kosher, but it does work.

--Ken

carlilek avatar Jan 01 '21 16:01 carlilek

Any 2021 new year resolutions affecting this by any chance?

Hello,

I understand the frustration around python3, but there has been no progress. I'd like to do a proper "review" of carlilek's patches soon, before merging. It looks like you could already use carlilek's work.

jbd avatar Jan 12 '21 11:01 jbd