
Workflow stops if a single submission page has only gym submissions

Open • ngthanhtrung23 opened this issue 3 years ago • 5 comments

How to reproduce:

  • Set the CF handle to I_love_Hoang_Yen.
  • Run harwest codeforces -p 5

What happens: the crawler stops without crawling anything, even though I have 150+ pages of submissions.

I think the reason is that page 5 contains only my non-AC or gym submissions, so self.client.get_user_submissions returns an empty array, which stops the crawler.
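A minimal sketch of the failure mode described above, assuming the crawler filters out non-AC and gym submissions before deciding whether to stop. The names fetch_page, is_wanted and crawl_buggy, and the simplified submission fields, are hypothetical stand-ins, not Harwest's actual code:

```python
from typing import Callable, Dict, List

# Hypothetical, simplified submission record (not the real Codeforces schema).
Submission = Dict[str, object]


def is_wanted(sub: Submission) -> bool:
    # Assumption: only accepted ("OK"), non-gym submissions are kept.
    return sub["verdict"] == "OK" and not sub["gym"]


def crawl_buggy(fetch_page: Callable[[int], List[Submission]],
                start_page: int) -> List[Submission]:
    """Filters each page first, then stops on an empty result."""
    kept: List[Submission] = []
    page = start_page
    while True:
        response = [s for s in fetch_page(page) if is_wanted(s)]
        # A page holding only non-AC/gym submissions filters down to [],
        # which looks exactly like "no more submissions", so the loop ends.
        if not response:
            break
        kept.extend(response)
        page += 1
    return kept


# Page 2 holds only a wrong answer and a gym submission.
pages: Dict[int, List[Submission]] = {
    1: [{"verdict": "OK", "gym": False}],
    2: [{"verdict": "WRONG_ANSWER", "gym": False}, {"verdict": "OK", "gym": True}],
    3: [{"verdict": "OK", "gym": False}],
}


def fetch(page: int) -> List[Submission]:
    # Pages past the end of the history come back empty.
    return pages.get(page, [])


print(len(crawl_buggy(fetch, 1)))  # 1: page 3 is never reached
print(len(crawl_buggy(fetch, 2)))  # 0: nothing crawled, as in the report
```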

ngthanhtrung23 • Dec 21 '20 14:12

Hey @ngthanhtrung23! Thanks for bringing this up. I was partly aware that this situation could arise, but thought such cases would be fairly uncommon. Well, it turns out I was wrong.

I agree that this is an inefficiency in Harwest, though it can be worked around manually by restarting Harwest from the next page with the --start-page option. That workaround certainly won't scale if it happens often across 150+ pages of submissions.

Fixing it properly would take some effort, since the entire flow of the tool would have to change. For the moment, maybe we can adopt the approach recommended by @Mohammad-Yasser at https://codeforces.com/blog/entry/85788?#comment-735930 as a temporary solution?

nileshsah • Dec 21 '20 14:12

Yeah, I was able to make it work for me by commenting out this code in workflow.py :)

        # Ends the crawl as soon as a page returns no submissions
        if not len(response) or not any(response):
            break

I created this issue just to bring it to your attention, since other users may run into it too.
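Rather than removing the check outright, one possible alternative is to stop only when the unfiltered page is empty and to apply the filter afterwards. This is a hypothetical sketch under the same assumptions as before (fetch_page, is_wanted, crawl_fixed and the record fields are illustrative, not Harwest's API):

```python
from typing import Callable, Dict, List

# Hypothetical, simplified submission record (not the real Codeforces schema).
Submission = Dict[str, object]


def is_wanted(sub: Submission) -> bool:
    # Assumption: only accepted ("OK"), non-gym submissions are kept.
    return sub["verdict"] == "OK" and not sub["gym"]


def crawl_fixed(fetch_page: Callable[[int], List[Submission]],
                start_page: int) -> List[Submission]:
    """Stops only when the raw (unfiltered) page is empty."""
    kept: List[Submission] = []
    page = start_page
    while True:
        raw = fetch_page(page)
        # An empty raw page is the real end of the submission history.
        if not raw:
            break
        # Filtering afterwards lets a page of only gym/non-AC submissions
        # contribute nothing without ending the crawl.
        kept.extend(s for s in raw if is_wanted(s))
        page += 1
    return kept


# Page 2 holds only a wrong answer and a gym submission.
pages: Dict[int, List[Submission]] = {
    1: [{"verdict": "OK", "gym": False}],
    2: [{"verdict": "WRONG_ANSWER", "gym": False}, {"verdict": "OK", "gym": True}],
    3: [{"verdict": "OK", "gym": False}],
}


def fetch(page: int) -> List[Submission]:
    # Pages past the end of the history come back empty.
    return pages.get(page, [])


print(len(crawl_fixed(fetch, 1)))  # 2: pages 1 and 3
print(len(crawl_fixed(fetch, 2)))  # 1: page 3
```

The trade-off is that the loop now always makes one extra request to discover the empty page at the end, but a page full of filtered-out submissions no longer ends the crawl early.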

ngthanhtrung23 • Dec 21 '20 14:12

Way to go @ngthanhtrung23! It amazes me how quickly and easily you can hack on any code. I'll keep this issue open and keep an eye on it; if a lot of people complain about it, I'll definitely fix it right away. I have to admit I'm a bit lazy :D

nileshsah • Dec 21 '20 14:12

@nileshsah I would suggest increasing the page size to 1000 or some similarly large number.

> I was partly aware of the possibility of this situation arising though thought that cases like these would be fairly uncommon.

I think with such a large page size this would be very unlikely to occur, unless someone made 1000+ gym submissions in a row. It would also reduce the number of API calls.
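Rough arithmetic behind the API-call argument, as a sketch: assuming 150 full pages of 50 submissions (7,500 in total) and that the crawler pages through the Codeforces API method user.status using its 1-based "from" index and "count" parameters, the number of requests drops from 150 to 8. The helper names api_calls_needed and user_status_url are hypothetical:

```python
import math
from urllib.parse import urlencode


def api_calls_needed(total_submissions: int, page_size: int) -> int:
    # One request per page; a partially filled last page still costs a call.
    return math.ceil(total_submissions / page_size)


def user_status_url(handle: str, page: int, page_size: int) -> str:
    # user.status takes a 1-based "from" index and a "count" of submissions.
    params = {
        "handle": handle,
        "from": (page - 1) * page_size + 1,
        "count": page_size,
    }
    return "https://codeforces.com/api/user.status?" + urlencode(params)


print(api_calls_needed(7500, 50))    # 150
print(api_calls_needed(7500, 1000))  # 8
print(user_status_url("I_love_Hoang_Yen", 2, 1000))
```

A larger page also makes an all-filtered page much less likely, since every one of its 1000 submissions would have to be non-AC or gym for the crawler to stop early.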

s-i-d-d-i-s • Dec 22 '20 04:12

Great thinking there @s-i-d-d-i-s! That does seem like an idea we can use. As I recall, I originally went with a page size of 50 to keep it in parity with the submissions page on Codeforces for easy tracking, though that might not be strictly necessary. If more people request this, let's take up your approach as a first iteration at dealing with the problem. Hopefully it shouldn't hurt the user experience much.

nileshsah • Dec 22 '20 09:12