[Question]: JSON serialization inquiries
Question
Is there a particular reason why, when saving articles to JSON (for example by specifying save_to_file when crawling or by calling the Article.to_json() method), things like the article URL and/or the publisher the article came from are not included? Or are they included and I am mistaken?
I was also wondering whether the Image attribute is already supported by the serialization, because I got errors when trying to serialize articles that contain it. I am not filing this as a bug because I am not sure whether the problem comes from my development environment.
@ruggsea Thanks for catching that. While the Image object itself is serializable, we missed a bug in the Article.to_json() method that causes these errors. I will work on a fix.
@ruggsea I opened a "quick" fix for the image serialization within articles.
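For anyone hitting the same error before the fix lands, here is a minimal sketch of the general idea: a `to_json`-style method that converts nested objects (like an image) through their own serialization instead of handing them to `json.dumps` as-is. The `Image` and `Article` classes, the `serialize()` method, and the helper `_to_jsonable` are hypothetical stand-ins, not the library's actual classes or the code in the fix; only the standard-library `json` and `dataclasses` modules are used.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


# Hypothetical stand-in for an image attribute (not the library's real Image class).
@dataclass
class Image:
    url: str
    caption: Optional[str] = None

    def serialize(self) -> Dict[str, Any]:
        # The nested object knows how to turn itself into JSON-compatible data.
        return {"url": self.url, "caption": self.caption}


def _to_jsonable(value: Any) -> Any:
    """Recursively convert a value into JSON-compatible types (hypothetical helper)."""
    if hasattr(value, "serialize"):
        return value.serialize()
    if isinstance(value, list):
        return [_to_jsonable(v) for v in value]
    if isinstance(value, dict):
        return {key: _to_jsonable(v) for key, v in value.items()}
    return value


# Hypothetical, simplified article holding a list of images.
@dataclass
class Article:
    title: str
    body: str
    images: List[Image] = field(default_factory=list)

    def to_json(self) -> Dict[str, Any]:
        # Returns a JSON-compatible dict (an assumption of this sketch);
        # passing raw Image objects to json.dumps would raise a TypeError.
        return {f.name: _to_jsonable(getattr(self, f.name)) for f in fields(self)}


article = Article(title="Example", body="Text", images=[Image(url="https://example.com/a.jpg")])
print(json.dumps(article.to_json()))
```

Delegating to each nested object keeps the article-level code unaware of how an image is stored internally, so adding new attribute types only requires giving them a `serialize()` method in this sketch.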
> Is there a particular reason why, when saving articles to JSON (for example by specifying save_to_file when crawling or by calling the Article.to_json() method), things like the article URL and/or the publisher the article came from are not included? Or are they included and I am mistaken?
You're right, they are not included: the HTML object is currently not serialized. While spending some time on this issue, I realized the best approach would be to rewrite the serialization, possibly switching to pydantic. I will keep working on it, but for now the HTML object is not serializable.
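Until the serialization is rewritten, a workaround is to flatten the few metadata fields you need into the article's JSON yourself. The sketch below assumes a hypothetical `HTML` container exposing `requested_url`, `publisher`, `crawl_date`, and `content`; these names, and the helper `article_to_json`, are illustrative assumptions rather than the library's API or the planned pydantic-based rewrite.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict


# Hypothetical container for crawl metadata; all field names are assumptions.
@dataclass
class HTML:
    requested_url: str
    publisher: str
    crawl_date: datetime
    content: str  # raw page HTML, usually too large to keep in the JSON output


# Hypothetical, simplified article that keeps a reference to its HTML object.
@dataclass
class Article:
    title: str
    body: str
    html: HTML


def article_to_json(article: Article) -> Dict[str, Any]:
    """Return a JSON-compatible dict that includes URL and publisher metadata."""
    return {
        "title": article.title,
        "body": article.body,
        # Copy selected HTML metadata instead of dumping the whole object.
        "url": article.html.requested_url,
        "publisher": article.html.publisher,
        # datetime is not JSON-serializable, so store it as an ISO 8601 string.
        "crawl_date": article.html.crawl_date.isoformat(),
    }


article = Article(
    title="Example",
    body="Text",
    html=HTML(
        requested_url="https://example.com/story",
        publisher="Example News",
        crawl_date=datetime(2024, 1, 2, 3, 4, 5),
        content="<html>...</html>",
    ),
)
print(json.dumps(article_to_json(article), indent=2))
```

Leaving out the raw `content` is a deliberate choice in this sketch: the URL and publisher are small and useful for downstream filtering, while the full page markup would dominate the file size.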