
Rewrite/Refactoring

prof79 opened this issue 2 years ago • 8 comments

Hi there,

this is the rewrite/refactoring I promised. It took a week of my vacation.

It's now modularized and avoids code duplication where possible.

A configuration object, representing config.ini, and a download state (per creator) are threaded through the whole application. The config object also handles synchronization to config.ini on save. Thus, multi-user scraping also becomes a reality.

There is also no need to pre-supply a config.ini anymore - if it is missing, a blank one will be generated automatically, and missing options are added with their defaults on save.
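A rough sketch of what that self-healing configuration behaviour amounts to; the class name, section, and option below are hypothetical stand-ins, not the actual rewrite code:

```python
from configparser import ConfigParser
from pathlib import Path

# Hypothetical defaults; the real config has more sections and options.
DEFAULTS = {'TargetedCreator': {'username': 'ReplaceMe'}}

class FanslyConfig:
    def __init__(self, path: Path = Path('config.ini')) -> None:
        self.path = path
        self.parser = ConfigParser()
        self.parser.read(path)  # silently tolerates a missing file

    def save(self) -> None:
        # Fill in any missing sections/options with their defaults,
        # then synchronize everything back to config.ini.
        for section, options in DEFAULTS.items():
            if not self.parser.has_section(section):
                self.parser.add_section(section)
            for option, default in options.items():
                if not self.parser.has_option(section, option):
                    self.parser.set(section, option, default)
        with self.path.open('w') as f:
            self.parser.write(f)
```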

You can fully operate it via command-line now (see fansly_downloader.py -h).

It also cleanly parses weird input like:

fansly_downloader.py -u here ,goes , nothing, @of,Course

which should have been, according to OS parsing semantics:

fansly_downloader.py -u here goes nothing @of Course

but will still be parsed as:

['here', 'goes', 'nothing', 'of', 'course']

Otherwise, user lists will be split by comma or space.
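A minimal sketch of how such normalization can be done; this is illustrative, not the actual code from the rewrite:

```python
import argparse
import re

def parse_user_names(raw_args: list[str]) -> list[str]:
    """Normalize messy -u input into a clean, lowercase user list."""
    users = []
    for token in raw_args:
        # Split each token on commas and whitespace, drop empty fragments.
        for name in re.split(r'[,\s]+', token):
            name = name.strip().lstrip('@').lower()
            if name:
                users.append(name)
    return users

parser = argparse.ArgumentParser()
# nargs='+' collects all space-separated tokens following -u.
parser.add_argument('-u', '--users', nargs='+', default=[])
args = parser.parse_args(['-u', 'here', ',goes', ',', 'nothing,', '@of,Course'])
print(parse_user_names(args.users))  # ['here', 'goes', 'nothing', 'of', 'course']
```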

The program can now also be operated non-interactively, e.g. in a scheduled task, avoiding input() prompts and sleep()ing instead. There is also a log file now, which helps in such headless scenarios.

I also introduced an option to omit the _fansly suffix because it felt redundant.

I took a few creators and compared the directory output with your 0.4.1, and that checked out. I tried several things but certainly could not test every variation and situation; testing on different platforms in particular fell short. I tried to refine the update logic but saw no way of testing it at all.

There are certainly things I forgot to mention, but see RewriteNotes.md for some stuff I noted down along the way.

-- Markus

prof79 · Aug 30 '23 21:08

Thank you for submitting this pull request!

I've had a brief look at it, but unfortunately, at the same time I was also making changes to the master branch of Fansly Downloader for version 0.4.2. These changes include:

  • Implementing metadata handling for the most common file formats.
  • Adding two new requirements, pyexiv2 and mutagen.
  • Removing the custom exit() function since the current versions of pyinstaller seem to work without it.
  • Introducing del_redudant_pyinstaller_files(), designed to delete the old MEI folders that pyinstaller creates every time the executable version of Fansly Downloader is launched and then leaves behind without clearing them.
  • Adjusting for the recent rate-limiting update by Fansly.

It would be great if you could incorporate these recent changes into your fork & pull request as well.

Furthermore, I'll be going on a month-long vacation, so I won't be able to review or interact with this pull request for a while. However, upon my return, I will thoroughly test these changes and may create a dev branch to fine-tune them before merging them into the master branch.

From my initial observation: I'm striving to maintain a clean initial file structure. When someone opens the repository, it can be overwhelming to see numerous folders and files. I suggest organizing them into subfolders, similar to what I previously did with the 'utils' folder, and possibly creating additional subfolders below it for better categorization. Regarding the .gitignore: while it's valuable for development purposes, I feel it might be somewhat unnecessary overall.

Edit: Actually, I am not sure if I can even keep these metadata changes, as pyexiv2 requires exiv2.dll, so I can't package it with pyinstaller without it erroring out ...

Avnsx · Sep 02 '23 04:09

The changes are there, with one exception: I'm a fan of clean code and encapsulation, and MetadataManager() isn't. You know that pyexiv2 has file-size limits of 1 GiB/2 GiB and do nothing about it - a typical case where MetadataManager() should raise an exception on too-large files, return False, or the like.
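For example, a size check at construction time would be enough; the names below are hypothetical and only illustrate the point:

```python
from pathlib import Path

# pyexiv2 cannot safely process files beyond its 1 GiB/2 GiB limits,
# so a clean wrapper refuses them up front instead of crashing later.
PYEXIV2_SIZE_LIMIT = 1 * 1024 ** 3  # conservative 1 GiB cap (assumed)

class MetadataFileTooLargeError(Exception):
    """Raised for files too large for pyexiv2 to handle."""

class MetadataManager:
    def __init__(self, file_path: Path) -> None:
        size = file_path.stat().st_size
        if size > PYEXIV2_SIZE_LIMIT:
            raise MetadataFileTooLargeError(
                f'{file_path}: {size} bytes exceeds the pyexiv2 limit'
            )
        self.file_path = file_path
```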

Also, I do not incorporate changes when I don't know where the journey is heading and what the intended use is - does this mean that file names get shorter then, with no id/hash info? Will they still be unique? What about existing repositories of users on their drives/NAS? Will existing files stay the same? Will their names be converted? Will users be able to opt in to file name conversion? There is enough multimedia software, like image libraries, that remembers last-used files, catalog files and so on.

Therefore I added the MetadataManager class but did not do anything with it in the file hashing code, as the path forward is not clear and stuff would also need to be rewritten for cleanliness and safety.

Btw, your rate-limiting changes worked for a couple of hours, or a day, but the second timeline request for a creator is empty when it shouldn't be. You can prove this using Postman or any other tool: a request yielding posts, when repeated after a few seconds, will come back empty.

prof79 · Sep 03 '23 13:09

You know that pyexiv2 has file-size limits of 1 GiB/2 GiB and do nothing about it - a typical case where MetadataManager() should raise an exception on too-large files, return False, or the like.

This is because I was planning to handle that within fansly-downloader's main code itself, until some testing showed that it's very unrealistic for images to be 1 GB or above in file size. That in turn downgraded the priority of adding the handling logic for this case to very low, to the point where I decided I'll add it later on, when I return from my vacation. 🌴

does this mean that file names get shorter then, with no id/hash info?

That is indeed the case; filenames are relatively short now.

Will they still be unique?

Yes, to make sure they're still unique, I just randomised the epoch timestamps that Fansly delivers for e.g. bundled posts by +/- 30 minutes. Later on I may change this to use the base post date instead of the scheduled post dates Fansly provides for each piece of content; this would ensure people could cross-find downloaded content on the Fansly website too.
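As a minimal sketch of that uniqueness trick (illustrative only, not the actual 0.4.2 code):

```python
import random

def jitter_timestamp(epoch_seconds: int) -> int:
    # Shift the Fansly-provided timestamp by up to +/- 30 minutes so
    # items from e.g. bundled posts don't collide on the same filename.
    return epoch_seconds + random.randint(-30 * 60, 30 * 60)
```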

What about existing repositories of users on their drives/NAS?

The late introduction of Exif metadata support has no effect on previously existing download folders; they just stay supported. This is handled within version 0.4.2 of fansly-downloader: it reads the hash and media id out of filenames if they were provided there, else they get read from the Exif metadata. The code for this is in extract_media_id() and extract_file_hash() of version 0.4.2.
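For illustration, the filename side of that lookup could be as simple as a regex; the '_id_' marker matches the filename pattern seen elsewhere in this thread, but the real extract_media_id() in 0.4.2 may well differ:

```python
import re

def extract_media_id(filename: str) -> str | None:
    # Filenames like '2023-11-23_at_04-10_id_583908461062922240.m3u8'
    # carry the media id after an '_id_' marker.
    match = re.search(r'_id_(\d+)', filename)
    return match.group(1) if match else None
```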

Will existing files stay the same?

Yes, that's already the case and implemented in fansly-downloader's main source code of version 0.4.2.

Will their names be converted?

No, they just stay the same; I didn't want to bother adding extra logic for that. Only future downloads receive Exif metadata.

Will the users be able to opt-in to file name conversion?

No, file name conversion will generally never be something I'll personally add, as I don't see the urgent necessity. Plus, even when opting in to metadata_handling "Advanced", not every file format is supported, so at the end of the day you'll still see some files, like the GIF-format ones, with media id and hash in their filename.

stuff would need to be rewritten for cleanliness and safety.

This is not true; everything you mentioned is already handled decently well, except the 1 GB / 2 GB stuff, which I'll add at some point. The only thing I'm unsure of is whether I've really utilised the most beneficial cross-platform-compatible pre-existing Exif metadata "placeholders", as some of these are either not visible to the end user at all or only visible using special tools, e.g. websites offering metadata-reading functionality. And of course the biggest issue, which I didn't notice until the very last second, is that pyexiv2 requires exiv2.dll, which might be too big in file size and cause fansly-downloader to either not be packageable at all or take literally minutes to open when packaged with pyinstaller's one-file mode. A way around this issue would be Nuitka, which is very promising and has completely convinced me, except that its packaging / compiling / performance-optimising techniques cause a lot of false positives with every anti-virus software in existence 😅😓

Btw, your rate-limiting changes worked for a couple of hours, or a day, but the second timeline request for a creator is empty when it shouldn't be.

I'm aware. Fansly noticed relatively quickly that I fell back to their old API endpoint for the timeline to bypass their rate-limiting, as at that point in time the old endpoint didn't have the rate-limiting applied to it yet. Shortly after the release of version 0.4.1-post1 this was no longer the case. It feels like they are now actively counter-patching me, which is kinda funny because it leads nowhere. I hope their core intention is just to reduce / balance the load on their servers and not to try to defeat scrapers, as that's in my opinion really pointless. Even if I ceased service of fansly-downloader, there would just be others remaking their own versions of it, and if they keep pushing annoying security updates, there are countless ways of countering that programmatically; it's just such a waste of time overall.

Also, could you try out this branch and see if it solves the rate-limiting issue again? Within that branch, fansly-downloader switches back to /timelinenew and is just artificially slowed down to avoid hitting the rate limit. If that doesn't work, raise this sleep timer even further, or add another sleep in between each iteration of the sort_download for loop so there is a delay between downloading the actual content too, or introduce proper logic, e.g. if the timeline cursor comes back empty, dynamically increase the wait timer and retry a couple of times before "giving up" - see the sketch below (pay attention that the last timeline cursor will always be empty, even in a successful scraping session). Maybe it has something to do with the new variables / tokens they added to each timeline request now.
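For illustration only, that self-increasing wait timer might look roughly like this; fetch_timeline_page is a hypothetical callable standing in for the actual timeline request, so this is a sketch of the retry idea, not code from either repository:

```python
import time

def fetch_with_backoff(fetch_timeline_page, cursor, base_delay=5.0, max_retries=3):
    # Retry an unexpectedly empty timeline page with a growing delay.
    # Caveat from above: the genuine last page of a timeline is also
    # empty, so the caller must decide when "empty" really means "done".
    delay = base_delay
    for _ in range(max_retries):
        posts = fetch_timeline_page(cursor)
        if posts:
            return posts
        time.sleep(delay)  # back off before the next attempt
        delay *= 2         # dynamically self-increase the wait timer
    return []              # give up after max_retries attempts
```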

I'm on vacation for a few weeks and don't have access to a Python environment (or a PC), so it would mean a lot if you helped out.

Avnsx · Sep 05 '23 03:09

@prof79 Never mind about that 0.4.2 version; I am honestly too lazy to figure out how to fix the metadata stuff so that it works with macOS and is also packageable with pyinstaller.

I would appreciate it a lot if you could revert your changes in this version of the rewrite back to how it was with version 0.4.1 (before I added the metadata stuff); otherwise just let me know and I'll do it myself and finally create a dev branch. I think I've kinda found the ambition to work on this open-source project again.

Avnsx · Nov 03 '23 03:11

Well, I had only added the metadata class - but tbh I never added the metadata code itself, since I have not had the time to properly learn about EXIF, third-party tagging, and the potential implications - thus, in doubt about the proper way forward, I refrained from doing it.

prof79 · Nov 10 '23 21:11

Hey @prof79, just so you know, I have tried your fork for quite some time now and I noticed this error keeps coming up:

```
Info | 12:27 || Downloading video '2023-11-23_at_04-10_id_583908461062922240.m3u8'
 [43]ERROR | 12:27 || Unexpected error during Timeline download:
Traceback (most recent call last):
  File "/fansly-downloader/fansly-downloader-rewrite/fansly_downloader/download/common.py", line 89, in process_download_accessible_media
    download_media(config, state, accessible_media)
  File "/fansly-downloader/fansly-downloader-rewrite/fansly_downloader/download/media.py", line 110, in download_media
    file_downloaded = download_m3u8(config, m3u8_url=media_item.download_url, save_path=file_save_path)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fansly-downloader/fansly-downloader-rewrite/fansly_downloader/download/m3u8.py", line 101, in download_m3u8
    audio_stream = input_container.streams.audio[0]
                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: tuple index out of range

Press <ENTER> to attempt to continue ...
```

This appears constantly and I have to press Enter every time to continue. Also, the fork feels slow. It could be due to this bug, I am not sure, but compared to the main repo it feels slower.

Joly0 · Dec 09 '23 11:12

Hey @Joly0, thanks - you should probably switch to my Fansly Downloader NG, which I've been using myself for months now:

https://github.com/prof79/fansly-downloader-ng

This is a bug from sloppy coding in the original code, which assumes that m3u8 streams always have both video and audio - which is not the case. This seems to be a media piece without audio. The issue has already cropped up and been fixed over there: https://github.com/prof79/fansly-downloader-ng/issues/2
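The guard boils down to not indexing the audio stream list unconditionally. A minimal sketch using PyAV, which the traceback suggests is in use; the function name and structure here are illustrative, not the actual NG fix:

```python
import av

def open_m3u8_streams(m3u8_url: str):
    # Open the HLS playlist; PyAV exposes the demuxed streams per type.
    input_container = av.open(m3u8_url)
    video_stream = input_container.streams.video[0]
    # Some media pieces ship without an audio track, so indexing
    # streams.audio[0] unconditionally raises IndexError.
    audio_stream = None
    if len(input_container.streams.audio) > 0:
        audio_stream = input_container.streams.audio[0]
    return input_container, video_stream, audio_stream
```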

Regarding speed: NG has intrinsic random delays - below one second per media item and 2 to 4 seconds per timeline page - to avoid triggering rate-limiting measures on the Fansly servers. There was a time when the Fansly servers forced you to wait around a full minute between timeline fetches or you would just get empty results. This last part, however, can be configured in NG using --timeline-delay-seconds (and --timeline-retries) on the command line, or as timeline_delay_seconds/timeline_retries under Options in the .ini. You may also like to run it with -ni to have no interactive prompts at all except the "finished" one.
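For illustration, those settings might look like this in the .ini; the key names are the ones mentioned above, while the section header and values are assumed examples:

```ini
[Options]
; wait between timeline page fetches (example value)
timeline_delay_seconds = 60
; retries for an unexpectedly empty timeline page (example value)
timeline_retries = 3
```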

Apart from that, it's difficult to judge and compare speeds, starting with the definition of what is slow and what is not.

prof79 · Dec 09 '23 12:09

Thanks @prof79, I'll give it a try.

Joly0 · Dec 09 '23 12:12