Ignoring errors

Open pawelgnatowski opened this issue 2 years ago • 31 comments

Describe the bug
When an error is encountered, the export process terminates.

To Reproduce
Steps to reproduce the behavior:

  1. Run slackdump like this: slackdump -f -export MyDump @channelList.txt

Expected behavior
Continue, or prompt whether to ignore the error.

Output
2022/08/16 13:53:39 application error: export error: failed to dump "xyz" (xxx): callback error: failed to dump channel xxx: strconv.Atoi: parsing "null": invalid syntax

Desktop (please complete the following information):

  • OS: Windows

Additional context
The same problem occurs during temporary Slack downtime, e.g. a 503 error. Larger exports may take a long time to process. It would be good to expose flags to ignore errors and/or retry X times after Y time, then ignore or exit. Right now I need to forgo the entire channel as it is failing. I also had to write a script that compares content to see if other channels fail (not fatally) and can be restarted later.

Otherwise gj bro!

pawelgnatowski avatar Aug 16 '22 14:08 pawelgnatowski

Hey @pawelgnatowski , thank you for your report and glad that you found this tool useful.

I'll see what I can do with this.

rusq avatar Aug 17 '22 02:08 rusq

  1. It seems that this error is propagated from the slack-go/slack library (there's no place in the slackdump code that calls Atoi) and is related to the JSONTime type. That's bad news: if it were our code, it would be possible to omit the particular field value. With the slack library, we'll need to sacrifice the whole batch of messages (ConversationsPerReq is the maximum, but the API may return fewer, depending on its mood), unless we add complex iterative logic that decreases the batch size and retries the API call until it eventually succeeds, thereby identifying the failing message in the batch. Of course, losing, say, 100 messages in a batch is not a tragedy compared to losing all channel data.
  2. Can you confirm that the command line in the issue is correct? The command line refers to "export" mode, while this error can only be returned when running in "conversation dump" mode. I will implement the "ignore errors" flag globally either way; I'm asking out of interest.
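
The "complex iterative logic" from point 1 could look roughly like the sketch below. This is not slackdump code: fetchFunc, fetchTolerant and fakeAPI are hypothetical names, and the sketch assumes the API can be asked for an arbitrary range by offset and size, which Slack's cursor-based pagination does not offer directly.

```go
package main

import (
	"errors"
	"fmt"
)

// fetchFunc is a hypothetical stand-in for one API call that returns
// the messages in the range [start, start+size).
type fetchFunc func(start, size int) ([]string, error)

// fetchTolerant fetches [start, start+size). When a call fails, it splits
// the range in two and retries each half, so that only the single message
// that cannot be fetched is lost. Indexes of lost messages go to skipped.
func fetchTolerant(fetch fetchFunc, start, size int, skipped *[]int) []string {
	if size <= 0 {
		return nil
	}
	msgs, err := fetch(start, size)
	if err == nil {
		return msgs
	}
	if size == 1 {
		// The range cannot shrink further: this is the offending message.
		*skipped = append(*skipped, start)
		return nil
	}
	half := size / 2
	left := fetchTolerant(fetch, start, half, skipped)
	right := fetchTolerant(fetch, start+half, size-half, skipped)
	return append(left, right...)
}

var errBadMessage = errors.New(`strconv.Atoi: parsing "null": invalid syntax`)

// fakeAPI simulates a channel with total messages in which the message
// at index bad makes any batch containing it fail.
func fakeAPI(total, bad int) fetchFunc {
	return func(start, size int) ([]string, error) {
		var out []string
		for i := start; i < start+size && i < total; i++ {
			if i == bad {
				return nil, errBadMessage
			}
			out = append(out, fmt.Sprintf("msg-%d", i))
		}
		return out, nil
	}
}

func main() {
	var skipped []int
	msgs := fetchTolerant(fakeAPI(100, 42), 0, 100, &skipped)
	fmt.Println(len(msgs), skipped) // prints: 99 [42]
}
```

The cost is extra API calls (roughly two per halving step), which matters under rate limits.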

rusq avatar Aug 17 '22 05:08 rusq

You may be right - i actually have tried full export first but due to another error and sheer size of 5k channels i wanted to limit the amount of channels, the only way i found is by providing channel list. BTW. Is there any way to get saved-items (kind of starred elements?) and mentions and reactions - this is how i build my MVP list which i do not see exported anywhere.

Command: slackdump.exe -f -export xxx @channels.txt

pawelgnatowski avatar Aug 17 '22 06:08 pawelgnatowski

Ah, it makes sense now. Re starred items - no, for now Slackdump is quite simple - only gets channels, users, and conversations.

  1. Starred items (i just checked) is a separate api call. It's actually a very good feature suggestion.
  2. There's no dedicated API to get mentions, as they're just a markup within the message,
  3. but there's a reactions.list API endpoint which, similar to starred items, can be used to get all items that the user has reacted on. Will place it in the TODO list as well :)

rusq avatar Aug 17 '22 06:08 rusq

sounds good, any ETA on the 1,3 - need to know how dirty i need to make my hands, as time window is closing fast. btw - i tried slack export viewer - i guess it is either full export or it does not work :( but that i guess could tackle at a later time... damn i love Slack.

pawelgnatowski avatar Aug 17 '22 06:08 pawelgnatowski

Sorry, no ETA on this - i do it in my free time, features are plentiful, and I got only two hands 😂 But I'll see what I can do

rusq avatar Aug 17 '22 06:08 rusq

When I released it open source I hoped that there'd be people contributing, as it seems to be helpful, but I guess the time hasn't come yet.

rusq avatar Aug 17 '22 06:08 rusq

wish i could - haven't picked up Go yet.

pawelgnatowski avatar Aug 17 '22 07:08 pawelgnatowski

That's no problem, Pawel :) Feature suggestion or bug report are also great contributions, feedback loop is very important.

rusq avatar Aug 17 '22 07:08 rusq

i definitely must go for 1 & 3 which means i'll probably use node or python for it. Can share some lessons learned for the APIs you have mentioned. Thanks for doing this project. Kudos!

pawelgnatowski avatar Aug 17 '22 07:08 pawelgnatowski

Thank you :)

rusq avatar Aug 17 '22 07:08 rusq

@pawelgnatowski I have made a quick and not so dirty patch to my fork of the slack library that will treat "null" as zero unix time (Jan 1, 1970), and built a Windows binary out of it (attached slackdump-2.1.2-null-fix.zip). Could you please try it on the problematic channel and see if that works?

rusq avatar Aug 17 '22 12:08 rusq

Also, I don't think that ignoring this kind of error will work: each API call returns a next token, which might be nil if the error occurs, so it would not be possible to get the next "page". Let's see if the quickfix from my previous comment works.

Regarding 503 and the server-class errors in general, I think it would be possible to handle it along with 429 rate limit errors, but in this case we'd have to wait at increasing time intervals, i.e. 1st attempt fails, wait 30 seconds, next attempt fails, wait 60 seconds, etc.

rusq avatar Aug 17 '22 12:08 rusq

can't download it, blocked download - virus detected ;]

pawelgnatowski avatar Aug 17 '22 13:08 pawelgnatowski

  1. ok - what if you change the pagination size to maybe omit the offending message and still get a token - or am I missing something?
  2. sounds reasonable.

pawelgnatowski avatar Aug 17 '22 13:08 pawelgnatowski

https://www.virustotal.com/gui/file/089be6d45ee681e8936d8d7b98c2e471d5e3bea887b509e0e7b332d592585388?nocache=1

Seems like a false positive? Anyway, I can understand the lack of trust.

Here are the changes in slackdump: https://github.com/rusq/slackdump/compare/master...i109-qf

and here are the changes in the slack lib fork: https://github.com/rusq/slack/compare/master...null-time

Would you be able to checkout and build branch i109-qf on your machine and check if that works?

rusq avatar Aug 17 '22 20:08 rusq

Hey, not about trust, just literally it was blocked by browsers. I will be back in a couple of days and will try then.

pawelgnatowski avatar Aug 25 '22 12:08 pawelgnatowski

gave the zip file another try and it works ^_^, guess M$ updated defender or smth. tried the faulty channel again:

2022/08/27 10:51:58 error saving "FGHGC2XFG-" to "xxx\attachments": callback error: download to "xxx\attachments\FGHGC2XFG-" failed, [src=]: received empty download URL

process continues though... <3

pawelgnatowski avatar Aug 27 '22 08:08 pawelgnatowski

Thanks! Looks like there's some malformed file within that channel - there's an ID of this file ("FGHGC2XFG"), but no name, no URL etc. Very strange. But glad to hear that it works, I basically modified the slack library to ignore empty JSONTime. I'll submit the PR to upstream slack library. If that doesn't get through, i'll just maintain the change in the fork.
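
For illustration, ignoring an empty timestamp could look like the sketch below. It is not the actual patch in the fork: unixTime, file and decodeFile are hypothetical names, and the sample JSON is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// unixTime is a hypothetical stand-in for a timestamp type that holds
// seconds since the Unix epoch.
type unixTime int64

// UnmarshalJSON accepts a number or a quoted number, and treats null or
// an empty string as zero (Jan 1, 1970) instead of returning an error.
func (t *unixTime) UnmarshalJSON(b []byte) error {
	s := strings.Trim(string(b), `"`)
	if s == "null" || s == "" {
		*t = 0
		return nil
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return err
	}
	*t = unixTime(n)
	return nil
}

// file holds only the fields needed for the example.
type file struct {
	ID        string   `json:"id"`
	Created   unixTime `json:"created"`
	Timestamp unixTime `json:"timestamp"`
}

// decodeFile parses one file object.
func decodeFile(raw string) (file, error) {
	var f file
	err := json.Unmarshal([]byte(raw), &f)
	return f, err
}

func main() {
	// Made-up sample: "timestamp" is null, "created" is a normal number.
	f, err := decodeFile(`{"id":"F123","created":1573757704,"timestamp":null}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(f.ID, f.Created, f.Timestamp) // prints: F123 1573757704 0
}
```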

rusq avatar Aug 27 '22 09:08 rusq

I have prepared a tool for #115, that shows the RAW output of the API - can I ask you to run it on that channel, and copy/paste the JSON for that file object with "ID": "FGHGC2XFG". Would be interesting to see what in the actual fuck is going on over there? rawoutput.zip

It uses the same auth as the slackdump, so you could run it like this:

rawoutput.exe channel_id

it will generate the slackdump_raw.log file which is a dump of headers and JSON output from the API - could you please search for FGHGC2XFG, and paste the surrounding json object in this thread? Most likely it will be empty, but if it contains some identifiable information, i.e. slack workspace name, it would make sense to obfuscate it, or replace with meaningless strings. I'd be keen to see what fields of that malformed file are populated and which are not.

rusq avatar Aug 27 '22 09:08 rusq

{
  "type": "message",
  "text": "ZZZZ)\n\nYYY\n\nHHH?",
  "files": [
    {
      "id": "FGHGC2XFG",
      "mode": "tombstone"
    }
  ],
  "upload": true,
  "user": "XXX7CK4GK",
  "display_as_bot": false,
  "ts": "1551261378.012600",
  "thread_ts": "1551261378.012600",
  "reply_count": 2,
  "reply_users_count": 2,
  "latest_reply": "1551262603.013800",
  "reply_users": [
    "XXX7CK4GK",
    "XXX1654PP"
  ],
  "is_locked": false,
  "subscribed": false
}

pawelgnatowski avatar Aug 27 '22 09:08 pawelgnatowski

Very interesting - it looks like it's a "deleted remote file" according to this doc

Probably they are so rare, that no one ever had this special case with the slack lib. I searched through their issues and was unable to find anything on this.

Thank you!

rusq avatar Aug 27 '22 09:08 rusq

TODO:

  • [ ] Handle tombstone files
  • [x] Open an issue with slack-go/slack on "tombstone" files.
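
One possible way to handle the first item is to filter such files out before attempting a download. The sketch below is only an assumption about how that might be done, not the slackdump implementation: file, downloadable and filterFiles are hypothetical names, while the JSON field names (id, mode, url_private_download) are the ones Slack returns.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// file holds only the fields needed to decide whether a download makes sense.
type file struct {
	ID                 string `json:"id"`
	Mode               string `json:"mode"`
	URLPrivateDownload string `json:"url_private_download"`
}

// message holds only the attached files.
type message struct {
	Files []file `json:"files"`
}

// downloadable reports whether the file is worth passing to the downloader:
// tombstones and files without a download URL are not.
func downloadable(f file) bool {
	return f.Mode != "tombstone" && f.URLPrivateDownload != ""
}

// filterFiles parses one raw message and returns its downloadable files.
func filterFiles(raw string) ([]file, error) {
	var m message
	if err := json.Unmarshal([]byte(raw), &m); err != nil {
		return nil, err
	}
	var out []file
	for _, f := range m.Files {
		if downloadable(f) {
			out = append(out, f)
		}
	}
	return out, nil
}

func main() {
	// The tombstone from the raw output above: an ID and a mode, nothing else.
	files, err := filterFiles(`{"files":[{"id":"FGHGC2XFG","mode":"tombstone"}]}`)
	fmt.Println(len(files), err) // prints: 0 <nil>
}
```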

rusq avatar Aug 27 '22 09:08 rusq

@pawelgnatowski I was trying to reproduce this the other day, the same way I did with #119 (the test code is in the issue I've opened with slack lib https://github.com/slack-go/slack/issues/1104), however I did not get the unmarshal error, until I've added a "timestamp":null piece to the file.

  1. If you still have the raw_output file that was generated, could you please search it for the string "null"?
  2. If it's there, could you please post it the way you did last time with the PII removed, so I could use it to open another issue with the slack lib?

Thank you!

rusq avatar Aug 30 '22 20:08 rusq

{
  "type": "message",
  "text": "We're starting a data science community ",
  "files": [
    {
      "id": "FSXXX79LN",
      "created": 1573757704,
      "timestamp": null,
      "name": "Data_Science_Community_of_Practice",
      "title": "Data Science Community",
      "mimetype": "application\/vnd.slack-docs",
      "filetype": "docs",
      "pretty_type": "Arugula",
      "user": "UXX617GTY",
      "editable": true,
      "size": 8886,
      "mode": "docs",
      "is_external": false,
      "external_type": "",
      "is_public": true,
      "public_url_shared": false,
      "display_as_bot": false,
      "username": "",
      "url_private": "https:\/\/files.slack.com\/files-pri\/T0XXX3EC-FSXXX79LN\/data_science_community_of_practice",
      "url_private_download": "https:\/\/files.slack.com\/files-pri\/T0XXX3EC-FSXXX79LN\/download\/data_science_community_of_practice",
      "permalink": "https:\/\/myteam.slack.com\/files\/T0XXX3EC\/FSXXX79LN",
      "permalink_public": "https:\/\/slack-files.com\/T0XXX3EC-FSXXX79LN-0733464a3f",
      "preview": "<p><br><br>We're staring a data science community of practicexxxxxxxxxxx<br><br><br><\/p>",
      "editor": null,
      "last_editor": null,
      "non_owner_editable": null,
      "updated": null,
      "is_starred": false,
      "has_rich_preview": false
    }
  ],
  "upload": true,
  "user": "UXX617GTY",
  "display_as_bot": false,
  "ts": "1561469761.006800",
  "thread_ts": "1561469761.006800",
  "reply_count": 12,
  "reply_users_count": 12,
  "latest_reply": "1562083633.018200",
  "reply_users": [
    "UDVYYY3CH",
    "UDQYYYHGB",
    "UEJYYYG5Q",
    "UDZYYYPJR",
    "UERYYYZ6H",
    "U20YYY4UB",
    "UF7YYYFG9",
    "UCRYYYLUB",
    "UDCYYYYV8",
    "UE4YYY3S7",
    "UCXYYYYR1",
    "UE2YYYYNA"
  ],
  "is_locked": false,
  "subscribed": false
}

pawelgnatowski avatar Aug 31 '22 07:08 pawelgnatowski

Excellent, thank you! Reproduced straight away!

_experiments/slack/bug109$ go run .
2022/08/31 17:26:07 strconv.Atoi: parsing "null": invalid syntax
exit status 1
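
A self-contained approximation of such a reproduction is sketched below. It is not the test code from the upstream issue: strictTime, file and decode are hypothetical names, and the assumption is simply that the timestamp type hands the raw JSON value to strconv.Atoi without checking for null first.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// strictTime is a hypothetical timestamp type that passes the raw JSON
// value straight to strconv.Atoi, with no special case for null.
type strictTime int64

func (t *strictTime) UnmarshalJSON(b []byte) error {
	n, err := strconv.Atoi(strings.Trim(string(b), `"`))
	if err != nil {
		return err
	}
	*t = strictTime(n)
	return nil
}

// file holds only the fields that matter for the reproduction.
type file struct {
	ID        string     `json:"id"`
	Timestamp strictTime `json:"timestamp"`
}

// decode parses one file object.
func decode(raw string) (file, error) {
	var f file
	err := json.Unmarshal([]byte(raw), &f)
	return f, err
}

func main() {
	// encoding/json calls UnmarshalJSON for a null value on a non-pointer
	// field, so the literal "null" reaches strconv.Atoi.
	_, err := decode(`{"id":"FSXXX79LN","timestamp":null}`)
	fmt.Println(err) // prints: strconv.Atoi: parsing "null": invalid syntax
}
```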

rusq avatar Aug 31 '22 07:08 rusq

Created an issue https://github.com/slack-go/slack/issues/1107 and PR https://github.com/slack-go/slack/pull/1106 for the upstream library.

rusq avatar Aug 31 '22 07:08 rusq

Btw, the stars and reactions APIs are super straightforward. Added team and user ids and got what i needed. Super easy! Thanks for the tip!

pawelgnatowski avatar Aug 31 '22 09:08 pawelgnatowski

Hey @pawelgnatowski , sorry, I was too focused on the API issue, and the reactions and bookmarks completely slipped my mind. I'll create a separate issue for those, not to lose track.

rusq avatar Sep 01 '22 08:09 rusq

No prob, like you said, you do it when you do it. I used your suggestions and just went to Slack api pages and voila. Anyway, maybe you know of a good way to browse and search the dump? Would be awesome to also get full text search, also with docs, ppt etc. Any suggestions/ideas?

pawelgnatowski avatar Sep 01 '22 09:09 pawelgnatowski