PatreonDownloader

Change In Attachments

Open emerladCoder opened this issue 1 year ago • 2 comments

It seems that Patreon has changed the API for attachments, so attachments are no longer being downloaded for posts.

Instead of attachments, the API call now needs to include attachments_media.

So the API call in PatreonPageCrawler.cs needs to be updated, and ParsePage needs to handle attachments as type Media instead of type Attachment: for the media type, the URL and Name properties are DownloadUrl and FileName, and the type check should look for "media" instead of "attachment".

Posts.cs also needs to be updated so that the Attachments property is mapped to attachments_media.

I don't know whether this only applies to new posts, or whether both types of attachments should be supported or everything has switched over to media attachments.

diff --git a/PatreonDownloader.Implementation/Models/JSONObjects/Posts.cs b/PatreonDownloader.Implementation/Models/JSONObjects/Posts.cs
index 776e494..23d491a 100644
--- a/PatreonDownloader.Implementation/Models/JSONObjects/Posts.cs
+++ b/PatreonDownloader.Implementation/Models/JSONObjects/Posts.cs
@@ -192,7 +192,7 @@ namespace PatreonDownloader.Implementation.Models.JSONObjects.Posts
     {
         [JsonProperty("access_rules")]
         public AccessRules AccessRules { get; set; }
-        [JsonProperty("attachments")]
+        [JsonProperty("attachments_media")]
         public Attachments Attachments { get; set; }
         [JsonProperty("audio")]
         public Audio Audio { get; set; }
diff --git a/PatreonDownloader.Implementation/PatreonPageCrawler.cs b/PatreonDownloader.Implementation/PatreonPageCrawler.cs
index 45fdd12..e9f9d4c 100644
--- a/PatreonDownloader.Implementation/PatreonPageCrawler.cs
+++ b/PatreonDownloader.Implementation/PatreonPageCrawler.cs
@@ -31,7 +31,7 @@ namespace PatreonDownloader.Implementation
 
         //TODO: Research possibility of not hardcoding this string
         private const string CrawlStartUrl = "https://www.patreon.com/api/posts?" +
-                                             "include=user%2Cattachments%2Ccampaign%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.choice%2Cpoll.current_user_responses.poll%2Caccess_rules.tier.null%2Cimages.null%2Caudio.null" +
+                                             "include=user%2Cattachments_media%2Ccampaign%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.choice%2Cpoll.current_user_responses.poll%2Caccess_rules.tier.null%2Cimages.null%2Caudio.null" +
                                              "&fields[post]=change_visibility_at%2Ccomment_count%2Ccontent%2Ccurrent_user_can_delete%2Ccurrent_user_can_view%2Ccurrent_user_has_liked%2Cembed%2Cimage%2Cis_paid%2Clike_count%2Cmin_cents_pledged_to_view%2Cpost_file%2Cpost_metadata%2Cpublished_at%2Cpatron_count%2Cpatreon_url%2Cpost_type%2Cpledge_url%2Cthumbnail_url%2Cteaser_text%2Ctitle%2Cupgrade_url%2Curl%2Cwas_posted_by_campaign_owner" +
                                              "&fields[user]=image_url%2Cfull_name%2Curl" +
                                              "&fields[campaign]=show_audio_post_download_links%2Cavatar_photo_url%2Cearnings_visibility%2Cis_nsfw%2Cis_monthly%2Cname%2Curl" +
@@ -230,7 +230,7 @@ namespace PatreonDownloader.Implementation
                     foreach (var attachment in jsonEntry.Relationships.Attachments.Data)
                     {
                         _logger.Debug($"[{jsonEntry.Id} A-{attachment.Id}] Scanning attachment");
-                        if (attachment.Type != "attachment") //sanity check 
+                        if (attachment.Type != "media") //sanity check 
                         {
                             string msg = $"Invalid attachment type for {attachment.Id}!!!";
                             _logger.Fatal($"[{jsonEntry.Id}] {msg}");
@@ -238,7 +238,7 @@ namespace PatreonDownloader.Implementation
                             continue;
                         }
 
-                        var attachmentData = jsonRoot.Included.FirstOrDefault(x => x.Type == "attachment" && x.Id == attachment.Id);
+                        var attachmentData = jsonRoot.Included.FirstOrDefault(x => x.Type == "media" && x.Id == attachment.Id);
 
                         if (attachmentData == null)
                         {
@@ -249,8 +249,8 @@ namespace PatreonDownloader.Implementation
                         }
 
                         PatreonCrawledUrl subEntry = (PatreonCrawledUrl)entry.Clone(); ;
-                        subEntry.Url = attachmentData.Attributes.Url;
-                        subEntry.Filename = attachmentData.Attributes.Name;
+                        subEntry.Url = attachmentData.Attributes.DownloadUrl;
+                        subEntry.Filename = attachmentData.Attributes.FileName;                       
                         subEntry.UrlType = PatreonCrawledUrlType.PostAttachment;
                         subEntry.FileId = attachmentData.Id;
                         crawledUrls.Add(subEntry);

emerladCoder • Sep 29 '24 19:09

It appears to affect posts after 2024-09-26, since they changed it to attachments_media like you mentioned. Old posts do not appear to have changed or been retroactively updated, so the old way of doing it also still needs to be included.
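Since both formats may need to be handled, here is a minimal sketch in JavaScript (the language of the scripts later in this thread) of resolving a post's attachments from a /api/posts JSON:API response. It prefers attachments_media whenever the post carries that relationship and falls back to the legacy attachments relationship only when it is absent. The attribute names download_url/file_name (media) and url/name (legacy attachment) are assumptions based on the patch above and the rename script later in the thread; resolveAttachments is a hypothetical helper, not part of PatreonDownloader.

```javascript
// Hypothetical helper: resolve downloadable attachments for one post of a
// Patreon /api/posts (JSON:API) response. ASSUMPTION: media resources carry
// attributes.download_url / attributes.file_name, legacy attachment resources
// carry attributes.url / attributes.name.
function resolveAttachments(post, included = []) {
  const rel = post.relationships ?? {};
  // Key included resources by "type:id" so ids of different types cannot collide
  const byKey = new Map(included.map(r => [`${r.type}:${r.id}`, r]));

  // Prefer attachments_media whenever the post carries that relationship
  // (even if it is empty); use the legacy array only when it is missing
  const media = rel.attachments_media;
  const refs = media ? (media.data ?? []) : (rel.attachments?.data ?? []);

  const result = [];
  for (const ref of refs) {
    const res = byKey.get(`${ref.type}:${ref.id}`);
    if (!res) continue; // referenced but not in `included`; a real crawler should log this
    if (res.type === 'media') {
      result.push({ id: res.id, type: res.type, url: res.attributes.download_url, name: res.attributes.file_name });
    } else if (res.type === 'attachment') {
      result.push({ id: res.id, type: res.type, url: res.attributes.url, name: res.attributes.name });
    }
  }
  return result;
}

// Usage with made-up data shaped like a migrated post
const samplePost = {
  id: '1',
  relationships: {
    attachments: { data: [{ id: '205', type: 'attachment' }] },
    attachments_media: { data: [{ id: '354', type: 'media' }] },
  },
};
const sampleIncluded = [
  { id: '205', type: 'attachment', attributes: { url: 'https://example.com/old', name: 'a.zip' } },
  { id: '354', type: 'media', attributes: { download_url: 'https://example.com/new', file_name: 'a.zip' } },
];
console.log(resolveAttachments(samplePost, sampleIncluded)); // only the media entry (id 354)
```

Treating an empty attachments_media as authoritative is a deliberate choice here, given the later observation in this thread that the legacy field can still list attachments the author has deleted.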

major-minor-patch • Oct 15 '24 20:10

I will try to do some research and release a new update with necessary fixes before the end of the month.

AlexCSDev • Oct 15 '24 20:10

Quoting @major-minor-patch: "It appears to affect posts after 2024-09-26, since they changed it to attachments_media like you mentioned. Old posts do not appear to have changed or been retroactively updated, so the old way of doing it also still needs to be included."

I think this is a significant data-field change, with some migration happening behind the scenes.

Based on my observations, /api/posts has deprecated the old attachments field in favour of attachments_media. New posts contain attachments_media only.

Querying attachments from the API appears to return outdated data. For example, a post whose author says the attachments were deleted still has entries in the attachments field, while attachments_media is empty:

"relationships": {
    "attachments": { // There are actually no attachments in this post; the author deleted them
        "data": [
            {
                "id": "206XXXXX",
                "type": "attachment"
            },
            {
                "id": "206XXXXX",
                "type": "attachment"
            }
        ]
    },
    "attachments_media": { // Only this field reflects the deletion; the browser also shows no attachments
        "data": []
    }
},
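The example above suggests a simple check for the "deleted but still listed" case. A minimal sketch, assuming posts are shaped like the JSON excerpts in this comment (the ids below are made up); findStaleLegacyAttachments is a hypothetical name, and whether Patreon always behaves this way is an assumption drawn from these observations only:

```javascript
// Flag posts whose legacy "attachments" relationship still lists entries while
// "attachments_media" is present but empty (the deleted-attachment pattern above).
function findStaleLegacyAttachments(posts) {
  const stale = [];
  for (const post of posts) {
    const legacy = post.relationships?.attachments?.data ?? [];
    const media = post.relationships?.attachments_media?.data;
    // Only judge posts where attachments_media was actually returned
    if (Array.isArray(media) && media.length === 0 && legacy.length > 0) {
      stale.push({ postId: post.id, legacyIds: legacy.map(ref => ref.id) });
    }
  }
  return stale;
}

// Usage with relationships shaped like the two examples in this comment
const posts = [
  { id: 'deleted', relationships: {
      attachments: { data: [{ id: '206A', type: 'attachment' }, { id: '206B', type: 'attachment' }] },
      attachments_media: { data: [] } } },
  { id: 'migrated', relationships: {
      attachments: { data: [{ id: '205A', type: 'attachment' }] },
      attachments_media: { data: [{ id: '354A', type: 'media' }] } } },
];
console.log(findStaleLegacyAttachments(posts)); // reports only the 'deleted' post
```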

For old posts with valid attachments, it seems they have migrated the files and given them a new id and type, for example:

"relationships": {
    "attachments": { // this seems to be the old, outdated field
        "data": [
            {
                "id": "205XXXXX",
                "type": "attachment"
            }
        ]
    },
    "attachments_media": { // this is the new one
        "data": [
            {
                "id": "354XXXXXX",
                "type": "media"
            }
        ]
    }
},

About the solution: @emerladCoder has provided a good one, though I think it needs some improvements. Specifically, I want to avoid downloading attachments again just because their id and type changed. In the example above, I already have attachment_205XXXXX_FILENAME.zip downloaded, but the change might lead to a duplicate download as media_354XXXXXX_FILENAME.zip, which would mess up the gallery system because of the duplication.

xWTF • Oct 21 '24 12:10

While I understand the possible frustration, it's unlikely that I will implement anything to prevent re-downloading files because of id changes. I am open to someone else implementing a fix with built-in detection of id changes, though. I can't stress enough that it should be very reliable.

It also seems worth researching whether deleted attachments can still be downloaded if they are still referenced in the old attachments array.

AlexCSDev • Oct 21 '24 13:10

Quoting @AlexCSDev: "While I understand the possible frustration, it's unlikely that I will implement anything to prevent re-downloading files because of id changes. I am open to someone else implementing a fix with built-in detection of id changes, though."

While this is a bit disappointing, it's totally understandable, since Patreon doesn't give us any useful metadata for identifying unique files other than the filename.

I'm planning to migrate my existing local files with some sort of script once your new version is released. Optimistically, it should simply be a matter of fetching the old and new fields, comparing filenames, and renaming.
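A minimal sketch of that filename comparison, assuming each post has already been reduced to { attachments, attachments_media } lists of { id, name } (the shape the Step 1 script later in this thread produces). Because the filename is the only shared metadata, this version refuses to guess when a name appears more than once on either side, in line with the reliability concern above; pairByFilename is a hypothetical helper.

```javascript
// Pair each new attachments_media entry with the legacy attachment of the same
// file name in the same post. Names that are not unique on both sides are
// reported as ambiguous instead of being paired.
function pairByFilename(post) {
  const count = (list) => {
    const m = new Map();
    for (const { name } of list) m.set(name, (m.get(name) ?? 0) + 1);
    return m;
  };
  const oldCounts = count(post.attachments);
  const newCounts = count(post.attachments_media);

  const pairs = [], ambiguous = [], unmatched = [];
  for (const entry of post.attachments_media) {
    const nOld = oldCounts.get(entry.name) ?? 0;
    if (nOld === 0) {
      unmatched.push(entry); // no legacy file with this name: nothing to rename
    } else if (nOld > 1 || newCounts.get(entry.name) > 1) {
      ambiguous.push(entry); // duplicate name: needs a manual decision
    } else {
      const old = post.attachments.find(a => a.name === entry.name);
      pairs.push({ oldId: old.id, newId: entry.id, name: entry.name });
    }
  }
  return { pairs, ambiguous, unmatched };
}

// Usage with made-up ids
const result = pairByFilename({
  attachments: [{ id: '205', name: 'pack.zip' }, { id: '206', name: 'dup.png' }, { id: '207', name: 'dup.png' }],
  attachments_media: [{ id: '354', name: 'pack.zip' }, { id: '355', name: 'dup.png' }, { id: '356', name: 'new.psd' }],
});
console.log(result.pairs);     // pack.zip: 205 -> 354
console.log(result.ambiguous); // dup.png (two legacy files share the name)
console.log(result.unmatched); // new.psd (no legacy counterpart)
```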

Quoting @AlexCSDev: "It also seems worth researching whether deleted attachments can still be downloaded if they are still referenced in the old attachments array."

Yes, it's possible. Based on my experiments, attachments deleted from the new storage are still accessible from the old storage: the API still returns a valid URL pointing to them, just like before.

But I'd argue against this, or at least it should be an opt-in feature behind a CLI flag, because:

  1. We don't know when Patreon will remove the old field and the related files, since there's no reason for them to keep them (existing files have already been migrated to the new storage)
  2. This would cause duplicate files to be downloaded (from both the old and the new storage)
  3. This is easy to detect, since the official apps and website don't request the old field, which may get our accounts flagged

xWTF • Oct 22 '24 00:10

This should be fixed in Release 28. Let me know if there are any issues.

I opted for completely switching to attachments_media instead of attempting to parse both old and new arrays.

AlexCSDev • Oct 22 '24 15:10

I'm providing my rename script here in case anyone needs it. @AlexCSDev, could you put a link to this comment on the release page?

A few IMPORTANT things to note before using it:

  • Make sure your artist provides attachments via Patreon. If they're always using Mega/Google, you're not affected and there's no need to migrate.
  • This script migrates purely based on filenames. Make sure your latest download is recent enough (ideally right before the API change), in case an attachment was re-uploaded with the same filename. This is very unlikely to happen; if you know the artist doesn't do that, you can ignore it.
  • Ideally, upgrade to the highest support level (one that can see every historical post) so the migration is as complete as possible.
  • Do it ASAP. We don't know when Patreon will delete the old metadata, after which migration will no longer be possible.

You'll have to repeat the steps below for each artist, which may be tedious if you support many of them. In that case, it's easy to wrap the Step 1 script in a for loop and crawl all of them at once.
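A sketch of that loop, assuming two helpers you supply: makeFirstPageUrl(campaignId), returning the Step 1 URL for a campaign (for example the long URL from the script below with CAMPAIGN_ID substituted), and fetchJson(url), returning the parsed response body. Both names, and collectForCampaigns itself, are hypothetical; injecting them keeps the loop testable outside the browser. Unlike the Step 1 script, this version skips references whose resource is missing from included instead of throwing.

```javascript
// Crawl several campaigns and collect { attachments, attachments_media }
// (lists of { id, name }) per post, following links.next for pagination.
async function collectForCampaigns(campaignIds, makeFirstPageUrl, fetchJson) {
  const byCampaign = {};
  for (const campaignId of campaignIds) {
    const posts = {};
    let url = makeFirstPageUrl(campaignId);
    while (url) {
      const { data = [], included = [], links } = await fetchJson(url);
      // Match on type and id so legacy attachments and media cannot be confused
      const find = (ref) => included.find(x => x.type === ref.type && x.id === ref.id);
      for (const { id, relationships } of data) {
        const oldRefs = relationships?.attachments?.data ?? [];
        const newRefs = relationships?.attachments_media?.data ?? [];
        const post = {
          attachments: oldRefs.map(find).filter(Boolean).map(x => ({ id: x.id, name: x.attributes.name })),
          attachments_media: newRefs.map(find).filter(Boolean).map(x => ({ id: x.id, name: x.attributes.file_name })),
        };
        if (post.attachments.length || post.attachments_media.length) posts[id] = post;
      }
      url = links?.next; // undefined on the last page, which ends the loop
    }
    byCampaign[campaignId] = posts;
  }
  return byCampaign;
}

// Possible use in the DevTools console on patreon.com (top-level await works there):
// const all = await collectForCampaigns(['FIRST_ID', 'SECOND_ID'],
//   id => `https://www.patreon.com/api/posts?...&filter[campaign_id]=${id}&...`,
//   async url => (await fetch(url)).json());
// console.info(JSON.stringify(all));
```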

Step 1: Get attachment metadata

Open Patreon in your browser, find the artist's campaign ID, and fill it in the following script:

// Change this to the target campaign ID
const CAMPAIGN_ID = FILL_THE_CAMPAIGN_ID_HERE;

let posts = {}, url = `https://www.patreon.com/api/posts?include=attachments_media%2Cattachments&fields[campaign]=currency%2Cshow_audio_post_download_links%2Cavatar_photo_url%2Cavatar_photo_image_urls%2Cearnings_visibility%2Cis_nsfw%2Cis_monthly%2Cname%2Curl&fields[post]=change_visibility_at%2Ccomment_count%2Ccommenter_count%2Ccontent%2Ccreated_at%2Ccurrent_user_can_comment%2Ccurrent_user_can_delete%2Ccurrent_user_can_report%2Ccurrent_user_can_view%2Ccurrent_user_comment_disallowed_reason%2Ccurrent_user_has_liked%2Cembed%2Cimage%2Cinsights_last_updated_at%2Cis_paid%2Clike_count%2Cmeta_image_url%2Cmin_cents_pledged_to_view%2Cmonetization_ineligibility_reason%2Cpost_file%2Cpost_metadata%2Cpublished_at%2Cpatreon_url%2Cpost_type%2Cpledge_url%2Cpreview_asset_type%2Cthumbnail%2Cthumbnail_url%2Cteaser_text%2Ctitle%2Cupgrade_url%2Curl%2Cwas_posted_by_campaign_owner%2Chas_ti_violation%2Cmoderation_status%2Cpost_level_suspension_removal_date%2Cpls_one_liners_by_category%2Cvideo%2Cvideo_preview%2Cview_count%2Ccontent_unlock_options%2Cis_new_to_current_user%2Cwatch_state&fields[media]=id%2Cfile_name&filter[campaign_id]=${CAMPAIGN_ID}&filter[is_draft]=false&filter[contains_exclusive_posts]=true&sort=-published_at&json-api-use-default-includes=false&json-api-version=1.0`, page = 0;
while (true) {
    console.info(`Fetching page ${++page}`);

    const response = await fetch(url);
    if (!response.ok) {
        throw new Error(`HTTP error! status: ${response.status}`);
    }

    // `included` may be absent from a page with no related resources, so default it
    const { data, included = [], links } = await response.json();
    for (const { id, relationships } of data) {
        if (!relationships) {
            console.warn(`Post ${id} has no relationships`);
            continue;
        }

        const post = { attachments: [], attachments_media: [] };
        // Match on type as well as id: legacy attachments and media are different resource types
        for (const entry of relationships.attachments?.data ?? []) {
            const inc = included.find(x => x.type === entry.type && x.id === entry.id);
            if (!inc) {
                throw new Error(`Attachment not found: ${entry.id}`);
            }
            post.attachments.push({ id: inc.id, name: inc.attributes.name });
        }
        for (const entry of relationships.attachments_media?.data ?? []) {
            const inc = included.find(x => x.type === entry.type && x.id === entry.id);
            if (!inc) {
                throw new Error(`Media not found: ${entry.id}`);
            }
            post.attachments_media.push({ id: inc.id, name: inc.attributes.file_name });
        }
        if (post.attachments.length || post.attachments_media.length) {
            posts[id] = post;
        }
    }

    if (!links?.next) {
        console.info('No more pages to fetch');
        break;
    }
    url = links.next;
}

console.info('All done!');
console.info(JSON.stringify(posts));

Press F12, paste the modified script into the Console, hit Enter, and wait for it to complete. You should see JSON output after the "All done!" line. Copy the whole output, including the { and } characters (in the example below it is empty, because this artist doesn't use Patreon attachments):

[Screenshot: console output of the script]
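On the long URL in the Step 1 script: hand-written %2C escapes are easy to get wrong (a doubled % is hard to spot in a string that long). As an alternative, the query can be built with the standard URLSearchParams API, which encodes commas as %2C and brackets as %5B/%5D. The sketch below keeps only a subset of the Step 1 parameters; whether Patreon returns the same relationships without the fields[campaign] and fields[post] lists is an assumption, so add them back if anything is missing. buildPostsUrl is a hypothetical name.

```javascript
// Build a Step 1 style /api/posts URL for one campaign.
// Only a subset of the original parameters is included (see lead-in).
function buildPostsUrl(campaignId) {
  const params = new URLSearchParams({
    'include': ['attachments_media', 'attachments'].join(','),
    'fields[media]': ['id', 'file_name'].join(','),
    'filter[campaign_id]': String(campaignId),
    'filter[is_draft]': 'false',
    'filter[contains_exclusive_posts]': 'true',
    'sort': '-published_at',
    'json-api-use-default-includes': 'false',
    'json-api-version': '1.0',
  });
  // URLSearchParams serializes as application/x-www-form-urlencoded: "," -> %2C, "[" -> %5B
  return `https://www.patreon.com/api/posts?${params}`;
}

console.log(buildPostsUrl(123));
```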

Step 2: Rename files

Install Node.js, copy the following script, and save it as <give it a name>.js. Fill in ROOT_DIR and paste in the JSON you copied in Step 1. Also, make sure makePath matches your PatreonDownloader output format.

const path = require('path'), fs = require('fs');

// Change the ROOT_DIR to the correct path
const ROOT_DIR = 'FILL_YOUR_STORAGE_ROOT';

// Paste the output of POSTS JSON here
const POSTS = PASTE_THE_JSON_FROM_STEP_1;

for (const id in POSTS) {
    // Depending on your PatreonDownloader configuration, modify the function below to make sure it constructs the correct path
    const makePath = (entry) => path.join(ROOT_DIR, id, `attachment_${entry.id}_${entry.name}`);
    // const makePath = (entry) => path.join(ROOT_DIR, `${id}_attachment_${entry.id}_${entry.name}`);

    for (const newAttachment of POSTS[id].attachments_media) {
        const old = POSTS[id].attachments.find(a => a.name === newAttachment.name);
        if (!old) {
            console.error(`Unable to find old attachment for #${newAttachment.id} ${newAttachment.name}`);
            continue;
        }

        const newPath = makePath(newAttachment);
        if (fs.existsSync(newPath)) {
            console.info(`New attachment already exists: ${newPath}`);
            continue;
        }

        const oldPath = makePath(old);
        if (!fs.existsSync(oldPath)) {
            console.error(`Old attachment does not exist: ${oldPath}`);
            continue;
        }

        console.info(`Renaming: ${oldPath} -> ${newPath}`);
        fs.renameSync(oldPath, newPath);
    }
}

Run the script with Node.js (execute node <your script>.js in a terminal) and check the output. Ideally, it should print lines like these:

Renaming: XXXX\YYYYY\attachment_205ZZZZ_XXXXX.zip -> XXXX\YYYYY\attachment_316ZZZZZ_XXX.zip
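The Step 2 script renames files as soon as it finds a match. If you'd rather review the changes first, the same checks can be split into a dry-run planning step. A sketch, where planRenames and its makePath(postId, entry) and exists(path) parameters are hypothetical and only mirror the logic of the script above; nothing is renamed until you apply the plan yourself.

```javascript
// Compute the renames the Step 2 script would perform, without touching disk.
// makePath(postId, entry) builds a local path; exists(path) reports whether it exists.
function planRenames(posts, makePath, exists) {
  const plan = [], problems = [];
  for (const postId in posts) {
    const { attachments, attachments_media } = posts[postId];
    for (const media of attachments_media) {
      const old = attachments.find(a => a.name === media.name);
      if (!old) {
        problems.push(`No old attachment for #${media.id} ${media.name}`);
        continue;
      }
      const from = makePath(postId, old);
      const to = makePath(postId, media);
      if (exists(to)) continue; // already migrated
      if (!exists(from)) {
        problems.push(`Old attachment does not exist: ${from}`);
        continue;
      }
      plan.push({ from, to });
    }
  }
  return { plan, problems };
}

// Possible use in the Step 2 script (review the printed plan before renaming):
// const makePath = (postId, e) => path.join(ROOT_DIR, postId, `attachment_${e.id}_${e.name}`);
// const { plan, problems } = planRenames(POSTS, makePath, fs.existsSync);
// problems.forEach(p => console.error(p));
// plan.forEach(({ from, to }) => console.info(`${from} -> ${to}`));
// // once satisfied: plan.forEach(({ from, to }) => fs.renameSync(from, to));
```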

xWTF • Oct 23 '24 03:10