Chief Delphi Migration Tracking Issue

Open bdaroz opened this issue 5 years ago • 8 comments

Chief Delphi is currently migrating from vBulletin to Discourse.

This does break all existing cdphotothread team media objects, but we did download and back the files up prior to the migration.

When the migration is complete we will need to:

  • [ ] Determine if we can readily scrape new CD posts for attachment information
  • [ ] Determine if we can map the old cdphotothread elements by old photo ID to new thread ID and file snippet
  • [ ] Update the URL prefixes in code: https://github.com/the-blue-alliance/the-blue-alliance/blob/7f0783285798349fd29ff42853b61ee4ed2fddc5/models/media.py#L100 if applicable
  • [ ] Or if the new CD site isn't readily parsable migrate the existing images to another location and relink them.

bdaroz avatar Dec 29 '18 17:12 bdaroz

Hotfixed CD media to link to archive.org images in https://github.com/the-blue-alliance/the-blue-alliance/pull/2364 until we figure out a longer term solution (I bet the apps are still broken though)

phil-lopreiato avatar Dec 31 '18 22:12 phil-lopreiato

Any progress on this @bdaroz?

JonathanLindsey avatar Sep 27 '20 02:09 JonathanLindsey

Looks like this is getting some new attention from this CD thread: https://www.chiefdelphi.com/t/posting-a-robot-image-on-chiefdelphi-and-linking-to-it-from-thebluealliance/388148

A few observations I noticed:

  • Media suggestion page still suggests CD photos - this should be removed until there is a solution implemented
  • Entering a link following the format of old CD-media links (e.g. https://www.chiefdelphi.com/media/photos/12345) returns a 500 error rather than the bad URL error. This can probably be fixed by removing some lines of code around here so that we don't attempt to parse a URL which we can't parse
  • If someone is willing and able to build this, the media URL is not hard to find. Basically, find the first post of a thread (via <div class='post' itemprop='articleBody'>) and then grab the first image out of a lightbox (via <div class="lightbox-wrapper"><a class="lightbox" href="https://www.chiefdelphi.com/uploads/default/original/...")
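As a rough illustration of the last bullet, here is a minimal sketch using only the Python standard library. It assumes the thread HTML really does use the `post`/`articleBody` div and the `lightbox` anchor quoted above; the class and function names (`FirstLightboxImage`, `first_lightbox_image`) are hypothetical and not part of the TBA codebase.

```python
from html.parser import HTMLParser


class FirstLightboxImage(HTMLParser):
    """Collects the href of the first <a class="lightbox"> inside the first post."""

    def __init__(self):
        super().__init__()
        self.post_depth = 0     # > 0 while inside the first post's <div>
        self.seen_post = False  # True once the first post <div> has been opened
        self.image_url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div":
            if self.post_depth:
                # Nested <div> inside the first post: track depth so we know when it ends.
                self.post_depth += 1
            elif (not self.seen_post and "post" in classes
                    and attrs.get("itemprop") == "articleBody"):
                self.seen_post = True
                self.post_depth = 1
        elif (tag == "a" and self.post_depth and self.image_url is None
                and "lightbox" in classes):
            self.image_url = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "div" and self.post_depth:
            self.post_depth -= 1


def first_lightbox_image(html):
    """Return the first lightbox image URL in the first post, or None."""
    parser = FirstLightboxImage()
    parser.feed(html)
    return parser.image_url
```

Images outside the first post (avatars, later replies) are ignored, and a thread whose first post has no lightbox image yields `None`.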

jaredhasenklein avatar Sep 27 '20 05:09 jaredhasenklein

I believe there was a conversation in slack at one point that because "new" CD Media are essentially regular posts with attachments (which can have 0 or more items, 0 or more of which can be images) and those attachments can be either in-line, or attached, or both, supporting "new" CD Media going forward was overly problematic.

bdaroz avatar Sep 27 '20 16:09 bdaroz

What about a different approach? I think most users are capable of getting the direct image URL pretty easily these days (e.g. right click, copy image address). We could accept image files ending with .jpg, .png, and whatever else CD supports if the URL begins with chiefdelphi.com since we know those URLs are stable.
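A minimal sketch of that check, assuming the accepted extensions are `.jpg`, `.jpeg`, and `.png` (the full list CD supports is not confirmed here) and that only the bare and `www` hostnames count. The function name `is_direct_cd_image` is hypothetical.

```python
from urllib.parse import urlparse

# Assumptions: the extension list and host list are illustrative, not confirmed.
ALLOWED_EXTENSIONS = (".jpg", ".jpeg", ".png")
ALLOWED_HOSTS = {"chiefdelphi.com", "www.chiefdelphi.com"}


def is_direct_cd_image(url):
    """Return True if url looks like a direct image link hosted on Chief Delphi."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    # Compare the parsed hostname rather than a string prefix, so that
    # look-alike hosts such as chiefdelphi.com.evil.example are rejected.
    if (parsed.hostname or "") not in ALLOWED_HOSTS:
        return False
    return parsed.path.lower().endswith(ALLOWED_EXTENSIONS)
```

Matching on the parsed hostname instead of "begins with chiefdelphi.com" avoids accepting URLs that merely contain the domain somewhere in the string.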

jaredhasenklein avatar Sep 27 '20 16:09 jaredhasenklein

I didn’t think this was blocked? @bdaroz had the mappings between the old -> new CD URLs and this just fell through? I haven’t seen code for any attempts at this anywhere either. Maybe I missed the conversation where we decided it was too much work.

It’d be a shame to have collected years' worth of CD Media only to say it’s too much work to support after the forum migrated.

ZachOrr avatar Sep 27 '20 19:09 ZachOrr

We had done the mappings, but supporting the new forum format for the multi-image posts was not anywhere near a simple regexp replacement.

bdaroz avatar Sep 27 '20 19:09 bdaroz

Coming back around to this issue. Apologies if it got dropped.

The issue at the time with the CD vBulletin->Discourse migration was twofold: (1) the old links were dead, and (2) the new links made it very difficult to distinguish the attachment image from a poster's avatar image.

While we did have a way to map old threads to new threads, the parsing tests we ran against the new threads left much to be desired, and this issue was back-burnered. (Admittedly way, way back-burnered.)

In revisiting this now, there appears to be a far more reliable way to find the intended "attached" image in the thread. This seems to work on both new threads and old threads, but only for the first attached image in the thread. The HTML returned for a thread now includes a meta tag with the property og:image whose content is a direct URL to the first attachment.

This doesn't solve multi-image threads, but perhaps there is another way. My current line of thinking is below, and this is what I would propose we implement soon(TM) to cover new CD media, using new media keys.

  • For URLs in the form https://www.chiefdelphi.com/t/... we can retrieve the URL, parse it for the meta property og:image and use the content for the image deeplink. We can then use the original /t/... URL to link to the thread, if so desired.
  • Allow users to give us direct deep-link image URLs in the form https://www.chiefdelphi.com/uploads/... that do not require any HTML parsing and can be used directly, but without a link to the CD Thread.
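The two cases above could be sketched roughly as follows, using only the standard library. This is an illustration under assumptions, not the TBA implementation: the names `classify_cd_url`, `OgImageParser`, and `og_image_from_html` are hypothetical, and fetching the thread HTML is deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def classify_cd_url(url):
    """Return 'thread' for /t/... URLs, 'image' for /uploads/... URLs, else None."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname != "www.chiefdelphi.com":
        return None
    if parsed.path.startswith("/t/"):
        return "thread"
    if parsed.path.startswith("/uploads/"):
        return "image"
    return None


class OgImageParser(HTMLParser):
    """Records the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.og_image = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and self.og_image is None:
            attrs = dict(attrs)
            if attrs.get("property") == "og:image":
                self.og_image = attrs.get("content")


def og_image_from_html(html):
    """Return the og:image URL found in already-fetched thread HTML, or None."""
    parser = OgImageParser()
    parser.feed(html)
    return parser.og_image
```

For a "thread" URL, the caller would fetch the page, pass the HTML to `og_image_from_html`, and keep the original `/t/...` URL as the link back to the thread. For an "image" URL, no parsing is needed and there is no thread link.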

Once support is reestablished for /t/... URLs we can use that foundation to migrate the old media. I'm attaching the mapping file for that next step when we're ready.

media.csv

From my notes, the first column is the cdmphoto or cdmpaper ID that we are storing, and the second column is the new thread ID {ID}, used in the form https://www.chiefdelphi.com/t/{ID}.
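As a sketch of how such a mapping file might be consumed, assuming it has two comma-separated columns and no header row (neither is confirmed here), the following builds a lookup from old ID to new thread URL. The function name `load_cd_mapping` is hypothetical.

```python
import csv
import io

THREAD_URL = "https://www.chiefdelphi.com/t/{}"


def load_cd_mapping(csv_text):
    """Map each old cdmphoto/cdmpaper ID to its new thread URL.

    Assumes two columns (old ID, new thread ID) and no header row.
    """
    mapping = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        old_id, thread_id = row[0].strip(), row[1].strip()
        if old_id and thread_id:
            mapping[old_id] = THREAD_URL.format(thread_id)
    return mapping
```

IDs are kept as strings so they can be compared directly against stored media keys without worrying about leading zeros or integer conversion.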

bdaroz avatar Apr 06 '22 02:04 bdaroz