[Question]: Json serialization inquiries

Open ruggsea opened this issue 11 months ago • 2 comments

Question

Is there a particular reason why, when saving articles to JSON (for example by specifying save_to_file in the crawl or by using the Article.to_json() method), things like the URL of the article and/or the specific publisher it came from are not included? Or maybe they are and I am mistaken?

In any case, I was also wondering whether the Image attribute is already supported in the serialization, because I got some errors when trying to serialize articles containing that attribute (I am not filing it as a bug because I am not sure whether it was a mistake in my dev environment).

ruggsea, Feb 05 '25 14:02

@ruggsea Thanks for catching that. While the Image object is serializable, we missed a bug in the articles' to_json method causing these issues. I will work on a fix for this.

MaxDall, Feb 05 '25 14:02

@ruggsea I opened a "quick" fix for the image serialization within articles.

> Is there a particular reason why, when saving articles to JSON (for example by specifying save_to_file in the crawl or by using the Article.to_json() method), things like the URL of the article and/or the specific publisher it came from are not included? Or maybe they are and I am mistaken?

No, you're right, they are not included. The HTML object is currently not serialized. While spending some time with this issue, I realized the best approach would be to rewrite the serialization and maybe switch to pydantic for it. I will continue working on this issue, but for now the HTML object is not serializable.

MaxDall, Feb 11 '25 18:02
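
For readers who want to see what serializing the URL and publisher alongside the parsed fields could look like, here is a minimal sketch using only the standard library. It does not use fundus itself: `HtmlInfo`, `ArticleRecord`, `article_to_json`, and all field names are hypothetical stand-ins invented for illustration, not the library's real classes or a description of the planned rewrite.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class HtmlInfo:
    """Hypothetical stand-in for the HTML object; not fundus' real class."""
    requested_url: str
    publisher: str


@dataclass
class ArticleRecord:
    """Hypothetical stand-in for an article; field names are assumptions."""
    title: str
    body: str
    html: HtmlInfo


def article_to_json(article: ArticleRecord) -> str:
    # asdict() recurses into nested dataclasses, so the URL and publisher
    # held by the HTML stand-in end up in the output as well.
    return json.dumps(asdict(article), ensure_ascii=False, indent=2)


record = ArticleRecord(
    title="Example headline",
    body="Example body text.",
    html=HtmlInfo(
        requested_url="https://example.com/article",
        publisher="ExamplePublisher",
    ),
)
serialized = article_to_json(record)
print(serialized)
```

The point of the sketch is only that nesting the source metadata in a serializable structure keeps it attached to each saved article; a pydantic-based rewrite, as mentioned in the reply, could follow the same shape with model classes instead of dataclasses.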