mem0
Unable to capture precise publication time for YouTube video metadata
🐛 Describe the bug
Embedchain currently captures the publish date of YouTube videos but omits the precise publication time. The metadata stored for each video holds the date in the format YYYY-MM-DD 00:00:00; the time component is always 00:00:00, which indicates that the time is not being processed or stored. A date without a time is a problem because of time zones: the same moment can fall on different calendar dates depending on where the viewer is.
Example of the issue: for a video published at 3 PM on April 15, 2024, Embedchain stores the publish date as 2024-04-15 00:00:00, whereas it should be stored in ISO 8601 format (example: 2024-04-15T14:30:00Z). This format allows the publication time to be used globally without confusion about time zone differences.
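To make the difference concrete, here is a minimal sketch using only the Python standard library. The helper name to_iso8601_utc and the sample publication time are made up for illustration and are not part of Embedchain.

```python
from datetime import datetime, timedelta, timezone


def to_iso8601_utc(dt: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 in UTC with a trailing 'Z'."""
    if dt.tzinfo is None:
        # A naive datetime has no zone, so it cannot be converted safely.
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Hypothetical publication moment, matching the format example in the report.
published = datetime(2024, 4, 15, 14, 30, 0, tzinfo=timezone.utc)

# What is stored today: the date survives, the time is zeroed out.
date_only = published.strftime("%Y-%m-%d 00:00:00")

# What the report asks for: the full moment, unambiguous in any time zone.
full = to_iso8601_utc(published)

# The same moment expressed in a UTC-7 local zone still maps to the same string.
local = published.astimezone(timezone(timedelta(hours=-7)))
same = to_iso8601_utc(local)

print(date_only)
print(full)
print(same)
```

Because the value is normalised to UTC before formatting, two loaders running in different time zones would store the same string for the same video.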
Can I work on this?
@Dev-Khant Has the migration from pytube to yt-dlp been merged? Because if so, yt-dlp returns metadata such as publication time and this will be a very easy fix then. I could do it
No, sorry, it has not been merged yet. But please feel free to open a PR for this using yt-dlp.
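As a rough sketch of what such a fix could look like: yt-dlp's info dict documents a timestamp field (UNIX time) and an upload_date field (YYYYMMDD), either of which may be missing for a given video. The helper below is hypothetical, not code from the pending migration, and it works on a plain dict so it can be tried without network access.

```python
from datetime import datetime, timezone
from typing import Optional


def publish_time_iso(info: dict) -> Optional[str]:
    """Derive an ISO 8601 publish time from a yt-dlp style info dict.

    Prefers the UNIX 'timestamp' field; falls back to the date-only
    'upload_date' field; returns None when neither is present.
    """
    ts = info.get("timestamp")
    if ts is not None:
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        return moment.strftime("%Y-%m-%dT%H:%M:%SZ")

    upload_date = info.get("upload_date")
    if upload_date:
        # Only the day is known here, so no time is invented.
        return datetime.strptime(upload_date, "%Y%m%d").strftime("%Y-%m-%d")

    return None


# Hypothetical sample values standing in for a real extraction result.
with_time = {"timestamp": 1713191400, "upload_date": "20240415"}
date_only = {"upload_date": "20240415"}

print(publish_time_iso(with_time))
print(publish_time_iso(date_only))
print(publish_time_iso({}))
```

In a real loader the dict would come from something like yt_dlp.YoutubeDL(...).extract_info(url, download=False); that call is left out here because it needs network access.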
@Dev-Khant, I was writing the script for the yt-dlp migration and realized that although the YouTube data loader does return metadata for videos, the LLM does not seem to have access to that metadata. It only seems to have access to the video context.
I ran these 4 queries:
1- What is the publication date of the video?
2- Do you have any information regarding this video's metadata?
3- What is the context you were provided with?
4- What is the name of the YouTube channel and the length of the video?
here are the results:
I was wondering if this is by design or an issue?
@MoizKhuzema If you use citations=True in app.query(), you will get back all the metadata for that video along with the answer. It will include publish date, author, title etc.
So it's by design that you get metadata separately.
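A small sketch of reading that metadata once it comes back. It assumes that app.query(..., citations=True) returns the answer together with a list of (chunk_text, metadata_dict) pairs, and that the metadata keys are named title, author and publish_date; both points are assumptions to verify against the Embedchain version in use. The helper collect_metadata and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Assumed shape of one citation: (chunk text, metadata dict).
Citation = Tuple[str, Dict[str, Any]]


def collect_metadata(citations: List[Citation], keys: List[str]) -> List[Dict[str, Any]]:
    """Return only the requested metadata keys for each cited chunk."""
    return [{key: meta.get(key) for key in keys} for _chunk, meta in citations]


# With Embedchain installed and configured, the input would come from:
#   answer, citations = app.query("What is the video about?", citations=True)
# Hypothetical sample standing in for that result:
sample: List[Citation] = [
    (
        "transcript chunk ...",
        {
            "title": "Example title",
            "author": "Example channel",
            "publish_date": "2024-04-15 00:00:00",
        },
    )
]

wanted = collect_metadata(sample, ["title", "author", "publish_date"])
print(wanted)
```

Missing keys come back as None rather than raising, which keeps the helper usable while the exact metadata fields are still changing.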
Understood, thanks
@Dev-Khant is this issue open for contribution?
@MoizKhuzema Please let us know if you are working on this or else @02shanks would like to pick this up. Thanks.
I would like to pick this up as I see it's still open.