AlgoliaAPI release vs modified date
Are you scraping a scene, gallery, movie, or performer?
- Movie
Scrape with URL? If so, what URLs have you tried?
- https://www.genderxfilms.com/en/movie/Transsexual-Hitchhikers-4/123876
- https://www.genderxfilms.com/en/movie/Family-Transformation/77977
- https://www.genderxfilms.com/en/movie/Genderx-Initiations/121227 (Can provide more if needed)
Scraper is scraping incorrect dates.
This scraper uses the Algolia API, which has different information than is present on the page. In the case of the dates, the API provides three different dates: "Created", "Upcoming" and "Last Modified". The Algolia scraper selects the "created" date, which might be the most appropriate, as this is, hopefully, when the movie was first published or released. The site is probably showing the last modified date.
Discussion may be required to determine the desired path to take.
2025-04-13 13:34:36 Debug
[Scrape / GenderX Films] Dates available: upcoming 2025-03-13 - created 2024-08-05 - last modified 2025-03-13
2025-04-13 13:34:36 Debug
[Scrape / GenderX Films] Scraping movie
2025-04-13 13:34:36 Debug
[Scrape / GenderX Films] URL Scraping: https://www.genderxfilms.com/en/movie/Transsexual-Hitchhikers-4/123876
Not a bug; will wait for admin decision and discussion.
The created date may be when the movie (an empty collection, or perhaps with the first scene) is first published. Then, after all the scenes are individually published (often on a weekly basis), the last modified date might be updated to match the latest scene. Not sure about upcoming; possibly that's a forecasted date of the final scene's publish date.
It would probably be useful to look at a new movie release that doesn't yet have all its scenes published. The scenes are normally listed on the movie page with future dates. Then perhaps the three date values could be observed throughout the scene publishing lifecycle of the movie.
I don't think Last Modified can be our answer. The question kind of lies in how they consider it distributed: is it distributed when the first scene hits the web or the last one, or is it completely separate, with web vs physical releases?
Ok, let's look at some examples in Algolia
Evil Angel
upcoming movies
https://www.evilangel.com/en/movie/Kimber-James--Chris-Epic/129147
"date_created": "2025-04-14",
"nb_of_scenes": 3,
"last_modified": null,
"upcoming": "2025-04-24",
- all 3 scenes have future date 2025-04-24
https://www.evilangel.com/en/movie/Pleasure-Vixens-03/129007
"date_created": "2025-04-07",
"nb_of_scenes": 4,
"last_modified": null,
"upcoming": "2025-04-24",
- scene 1: 2025-04-24
- scene 2: 2025-04-26
- scene 3: 2025-04-28
- scene 4: 2025-04-30
so this upcoming movie currently has an upcoming date of the first scene
latest movies
https://www.evilangel.com/en/movie/Crossing-Borders-02/128218
"date_created": "2025-02-25",
"nb_of_scenes": 12,
"last_modified": "2025-04-22",
"upcoming": "2025-04-22",
- scene 1: 2025-03-25
- scene 12: 2025-04-22
here we can see the upcoming date is the date of the last scene
theory
for one of the above upcoming examples, https://www.evilangel.com/en/movie/Pleasure-Vixens-03/129007, it currently has:
"date_created": "2025-04-07",
"nb_of_scenes": 4,
"last_modified": null,
"upcoming": "2025-04-24",
as the scenes are released on the dates:
- scene 1: 2025-04-24
- scene 2: 2025-04-26
- scene 3: 2025-04-28
- scene 4: 2025-04-30
the last_modified and upcoming values can be observed and noted.
I would guess that the upcoming is initially the date of the first scene, and then is updated on either:
- each scene release, or
- the final scene release to end up being the same value as the date of the final release
Also, I would guess the last_modified to behave as it sounds, with it being initially null when the movie is created (but with all scenes yet to be published), and then updated when each scene is published.
GenderX Films
latest movies
https://www.genderxfilms.com/en/movie/Couples-Loving-Trans-2/126333
"date_created": "2024-11-18",
"nb_of_scenes": 4,
"last_modified": "2025-04-17",
"upcoming": "2025-04-24",
date shown on page: 2025-04-17
scenes:
- 1: 2025-04-17
- 2: 2025-04-24
- 3: 2025-05-01
- 4: 2025-05-08
Here we can see that as of today (2025-04-22), only scene 1 has been published (on 2025-04-17), and the last_modified date is, as you would expect, the date of the most recently published scene, 2025-04-17.
The upcoming date value is the date of the next scene to be published, scene 2, which makes sense in a way, in that the movie's next upcoming date is the date of the upcoming publishing of the next scene.
This shows that the upcoming date will:
- have a starting value of the first scene publishing date
- be updated to the next scene publishing date
- end up as the final scene publishing date
https://www.genderxfilms.com/en/movie/Trans-Campers/118787
"date_created": "2024-01-10",
"nb_of_scenes": 4,
"last_modified": "2024-06-13",
"upcoming": "2024-06-13",
date shown on page: 2024-06-13
- scene 1: 2024-05-16
- scene 4: 2024-06-13
Conclusion
Algolia appears to use the date fields date_created, last_modified, and upcoming in a fairly logical way.
API date field usage
date_created: this appears to be when the movie is first added to the API
last_modified: this is initially null, until the first scene is published, at which point the value matches the most recently published scene
upcoming: this is initially the date of the first scene, until the first scene is published, at which point it matches the next scene (unless, obviously, there is no next scene, so it would then just remain matching the final scene)
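The field behaviour described above can be written down as a small model and checked against the examples in this thread. This is only a sketch of the observed pattern, not Algolia's actual logic: the function name `expected_movie_dates` is made up for illustration, and it assumes a scene counts as published on its release date.

```python
from datetime import date
from typing import Optional


def expected_movie_dates(scene_dates: list[date], today: date) -> dict[str, Optional[date]]:
    """Hypothetical model of the observed last_modified / upcoming behaviour."""
    ordered = sorted(scene_dates)
    if not ordered:
        return {"last_modified": None, "upcoming": None}

    # Assumption: a scene is treated as published on its release date.
    published = [d for d in ordered if d <= today]
    pending = [d for d in ordered if d > today]

    # last_modified: null until the first scene is out, then the latest published scene.
    last_modified = published[-1] if published else None
    # upcoming: the next unpublished scene, or the final scene once all are out.
    upcoming = pending[0] if pending else ordered[-1]
    return {"last_modified": last_modified, "upcoming": upcoming}


# Couples Loving Trans 2, as observed on 2025-04-22
scenes = [date(2025, 4, 17), date(2025, 4, 24), date(2025, 5, 1), date(2025, 5, 8)]
print(expected_movie_dates(scenes, date(2025, 4, 22)))
```

For Couples Loving Trans 2 this gives last_modified 2025-04-17 and upcoming 2025-04-24, matching the API values quoted earlier. Observing a movie through its full release schedule would confirm or refute the model.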
page date usage
This appears to use the upcoming date.
When the movie does not yet have any scenes published, this will be the date of the first scene. When the movie is partway published, it will be the date of the next (upcoming, future publish date) scene. When the movie has all scenes published, it will be the date of the final scene.
scraping date implications
Personally, I would agree with the page's usage of the upcoming date, as I would consider a movie to be truly published only when all of its scenes have been published.
This means that when a movie has some scenes in the future, the date will be "in flux" and tracking the next scene, so you would have to be mindful of this and rescrape the movie after the final scene has been published.
A possible (rather convoluted) solution would be to determine the scenes of a movie, and look at the final scene's release_date, and use that as the movie's scraped date.
Even though a scene has, e.g.
"release_date": "2025-04-24",
"upcoming": 1,
"movie_id": 126333,
"movie_title": "Couples Loving Trans 2",
"movie_desc": "",
"movie_date_created": "2024-11-18",
The movie was "created" on 2024-11-18, but scene 2 is not even available yet. I would say the movie_date_created (in an API scene, which is the same as an API movie's date_created) is just the date that the movie was added to the API, almost like a placeholder, until the scenes are all published.
You can see in the examples above that a movie is often created in the API (with all scenes with future publishing dates) several months before the scene release schedule begins.
I would say that the Algolia scraper(s) (I currently have one in a branch) should be updated to use upcoming for the movie date. In my opinion it is more closely related to publishing: once all the scenes are published, it matches the final scene date, which is when the movie is fully available to watch.
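As a rough sketch of what that change could look like, the date could be picked from an API movie record like this. The helper name `pick_movie_date` and the fallback order (last_modified, then date_created, when upcoming is missing) are assumptions for illustration, not what the existing scraper does.

```python
from typing import Optional


def pick_movie_date(movie: dict) -> Optional[str]:
    """Hypothetical helper: prefer `upcoming`, fall back to other date fields."""
    # Assumed fallback order; only the preference for `upcoming` comes from the discussion.
    for key in ("upcoming", "last_modified", "date_created"):
        value = movie.get(key)
        if value:  # skips missing keys, null, and empty strings
            return value
    return None


movie = {
    "date_created": "2024-11-18",
    "nb_of_scenes": 4,
    "last_modified": "2025-04-17",
    "upcoming": "2025-04-24",
}
print(pick_movie_date(movie))  # 2025-04-24
```

While a movie still has unpublished scenes, this value will keep moving, so a rescrape after the final scene is still needed.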
As for the "convoluted" solution mentioned above for determining a movie's final publishing date, the scenes of a movie can be looked up in the Algolia API like this:
e.g. for https://www.genderxfilms.com/en/movie/Couples-Loving-Trans-2/126333
scenes for movie id 126333:
curl --location 'https://TSMKFA364Q.algolia.net/1/indexes/all_scenes/query' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'x-algolia-api-key: ****' \
--header 'x-algolia-application-id: ****' \
--data '{
"params": "hitsPerPage=20&page=0&query=",
"facetFilters": ["movie_id:126333"]
}'
response (edited for brevity):
{
"hits": [
{
"clip_id": 255847,
"title": "Couples Loving Trans 2 - Scene 4",
"release_date": "2025-05-08",
"upcoming": 1
},
{
"clip_id": 255846,
"title": "Couples Loving Trans 2 - Scene 3",
"release_date": "2025-05-01",
"upcoming": 1
},
{
"clip_id": 255845,
"title": "Couples Loving Trans 2 - Scene 2",
"release_date": "2025-04-24",
"upcoming": 1
},
{
"clip_id": 255844,
"title": "Couples Loving Trans 2 - Scene 1",
"release_date": "2025-04-17",
"upcoming": 0
}
],
"nbHits": 4
}
This list of scenes could easily be processed to find the final scene, e.g. in python:
# ISO 8601 date strings (YYYY-MM-DD) sort chronologically, so max() works on the raw strings
final_scene_date = max(scene["release_date"] for scene in api_response["hits"])
# 2025-05-08
The final scene date can be known throughout the movie lifecycle (pre-release, during scene publishing, and after the final scene is published). The final_scene_date value extracted in the example code above could therefore be used as the movie's date: it is the date that will show on the movie's web page once the final scene has been published, even if the movie's scenes have not yet been published.
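A slightly more defensive version of that lookup might also report whether the movie still has unpublished scenes, which would tell you if the page date is still "in flux". This is a sketch only: the function name `final_scene_date_and_flux` is hypothetical, and it assumes the response shape shown above, with `upcoming` being 1 for unpublished scenes.

```python
from datetime import date
from typing import Optional


def final_scene_date_and_flux(api_response: dict) -> tuple[Optional[date], bool]:
    """Return (final scene date, whether any scene is still unpublished)."""
    hits = api_response.get("hits", [])
    # Skip hits without a release_date rather than failing on them.
    dates = [date.fromisoformat(h["release_date"]) for h in hits if h.get("release_date")]
    final = max(dates) if dates else None
    # Assumption: "upcoming": 1 marks a scene that is not yet published.
    in_flux = any(h.get("upcoming") == 1 for h in hits)
    return final, in_flux


api_response = {
    "hits": [
        {"clip_id": 255847, "release_date": "2025-05-08", "upcoming": 1},
        {"clip_id": 255846, "release_date": "2025-05-01", "upcoming": 1},
        {"clip_id": 255845, "release_date": "2025-04-24", "upcoming": 1},
        {"clip_id": 255844, "release_date": "2025-04-17", "upcoming": 0},
    ],
    "nbHits": 4,
}
print(final_scene_date_and_flux(api_response))
```

The trade-off is an extra API query per movie, which is why this was described as the convoluted option.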
my personal opinion:
We should take the earliest date, just for ease. It's the easiest to pull, it will be the most consistent, and it makes the most sense logically when grouping up scenes: instead of going back to front with releases, you can go front to back.
Moving discussion to discourse https://discourse.stashapp.cc/t/algolia-release-modified-dates/1951