Tweet serialization & deserialization loses (at least) data about media

Open M4rtinK opened this issue 7 years ago • 2 comments

I've recently implemented tweet caching in my Twitter app, based on the AsDict() for serialization & NewFromJsonDict() for deserialization.

But I've noticed an inconsistency - if the serialized tweet has media, the media description is correctly serialized to the dict by AsDict(), but when the tweet is deserialized with NewFromJsonDict() the media property of the new Status instance is None.

I've also tried passing the dict to the Status constructor as kwargs. In that case media is not None, but it contains a list of dicts describing the media, not the expected list of Media instances.

This is a short reproducer demonstrating the issue:

#!/usr/bin/python3

import twitter

as_dict_output = {
    'id_str': '874353511328169984',
    'media': [
        {'media_url': 'http://pbs.twimg.com/media/DCJTgZ_UIAAwFnw.jpg', 'url': 'https://t.co/JDv3Iz9L54', 'type': 'photo', 'display_url': 'pic.twitter.com/JDv3Iz9L54',
         'expanded_url': 'https://twitter.com/Space_Station/status/874353511328169984/photo/1', 'media_url_https': 'https://pbs.twimg.com/media/DCJTgZ_UIAAwFnw.jpg',
         'sizes': {'medium': {'resize': 'fit', 'w': 1200, 'h': 1149}, 'large': {'resize': 'fit', 'w': 2048, 'h': 1960}, 'thumb': {'resize': 'crop', 'w': 150, 'h': 150}, 'small': {'resize': 'fit', 'w': 680, 'h': 651}},
         'id': 874353093860663296},
        {'media_url': 'http://pbs.twimg.com/media/DCJThr7UwAE9rCo.jpg', 'url': 'https://t.co/JDv3Iz9L54', 'type': 'photo', 'display_url': 'pic.twitter.com/JDv3Iz9L54',
         'expanded_url': 'https://twitter.com/Space_Station/status/874353511328169984/photo/1', 'media_url_https': 'https://pbs.twimg.com/media/DCJThr7UwAE9rCo.jpg',
         'sizes': {'medium': {'resize': 'fit', 'w': 1200, 'h': 1114}, 'large': {'resize': 'fit', 'w': 2048, 'h': 1901}, 'thumb': {'resize': 'crop', 'w': 150, 'h': 150}, 'small': {'resize': 'fit', 'w': 680, 'h': 631}},
         'id': 874353115855634433},
        {'media_url': 'http://pbs.twimg.com/media/DCJTi68UQAA1ZPE.jpg', 'url': 'https://t.co/JDv3Iz9L54', 'type': 'photo', 'display_url': 'pic.twitter.com/JDv3Iz9L54',
         'expanded_url': 'https://twitter.com/Space_Station/status/874353511328169984/photo/1', 'media_url_https': 'https://pbs.twimg.com/media/DCJTi68UQAA1ZPE.jpg',
         'sizes': {'medium': {'resize': 'fit', 'w': 1134, 'h': 1200}, 'small': {'resize': 'fit', 'w': 643, 'h': 680}, 'large': {'resize': 'fit', 'w': 1935, 'h': 2048}, 'thumb': {'resize': 'crop', 'w': 150, 'h': 150}},
         'id': 874353137066196992},
        {'media_url': 'http://pbs.twimg.com/media/DCJTjY3UwAQbErC.jpg', 'url': 'https://t.co/JDv3Iz9L54', 'type': 'photo', 'display_url': 'pic.twitter.com/JDv3Iz9L54',
         'expanded_url': 'https://twitter.com/Space_Station/status/874353511328169984/photo/1', 'media_url_https': 'https://pbs.twimg.com/media/DCJTjY3UwAQbErC.jpg',
         'sizes': {'medium': {'resize': 'fit', 'w': 1200, 'h': 1149}, 'large': {'resize': 'fit', 'w': 2048, 'h': 1960}, 'thumb': {'resize': 'crop', 'w': 150, 'h': 150}, 'small': {'resize': 'fit', 'w': 680, 'h': 651}},
         'id': 874353145098321924},
    ],
    'urls': [], 'favorited': True,
    'full_text': '19 years ago today on June 12, 1998, Shuttle-Mir program ended when space shuttle Discovery landed with Mir crew member Andy Thomas. https://t.co/JDv3Iz9L54',
    'source': '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
    'user': {'following': True, 'profile_background_image_url': 'http://pbs.twimg.com/profile_background_images/517439388741931008/iRbQw1ch.jpeg', 'location': 'Low Earth Orbit',
             'utc_offset': -18000, 'profile_background_color': 'C0DEED',
             'description': "NASA's page for updates from the International Space Station, the world-class lab orbiting Earth 250 miles above. For the latest research, follow @ISS_Research.",
             'favourites_count': 5099, 'id': 1451773004, 'profile_image_url': 'http://pbs.twimg.com/profile_images/822552192875892737/zO1pmxzw_normal.jpg',
             'profile_sidebar_fill_color': 'DDEEF6', 'screen_name': 'Space_Station', 'time_zone': 'Central Time (US & Canada)',
             'profile_banner_url': 'https://pbs.twimg.com/profile_banners/1451773004/1497549511', 'friends_count': 234, 'verified': True, 'profile_text_color': '333333',
             'profile_link_color': '0084B4', 'url': 'https://t.co/9Gk2H0gekn', 'created_at': 'Thu May 23 15:25:28 +0000 2013', 'name': 'Intl. Space Station',
             'listed_count': 7602, 'followers_count': 1635947, 'statuses_count': 6976, 'lang': 'en'},
    'favorite_count': 1260, 'created_at': 'Mon Jun 12 19:51:36 +0000 2017', 'hashtags': [], 'user_mentions': [], 'retweet_count': 415, 'retweeted': True, 'lang': 'en',
    'id': 874353511328169984,
}

print("tweet dict from AsDict() has media:")
print(as_dict_output["media"])

# Round-trip through NewFromJsonDict(): the media list is lost.
new_from_json_tweet = twitter.models.Status.NewFromJsonDict(as_dict_output)
print("tweet deserialized by NewFromJsonDict() doesn't have media:")
print(new_from_json_tweet.media)

# Pass the dict to the constructor as kwargs: media survives, but as raw dicts.
kwargs_tweet = twitter.models.Status(**as_dict_output)
print("tweet created by passing the dict as kwargs has media:")
print(kwargs_tweet.media)
print("but it's just a list of dicts, not a list of twitter.models.Media instances")

Looking at the serialization/deserialization code in models.py, the issue seems to be caused by a key mismatch: AsDict() puts the dicts representing the Media instances under a top-level key called "media", but NewFromJsonDict() looks for media only inside the "entities" or "extended_entities" dict, not in the top-level dict namespace:

        if 'entities' in data:
            if 'urls' in data['entities']:
                urls = [Url.NewFromJsonDict(u) for u in data['entities']['urls']]
            if 'user_mentions' in data['entities']:
                user_mentions = [User.NewFromJsonDict(u) for u in data['entities']['user_mentions']]
            if 'hashtags' in data['entities']:
                hashtags = [Hashtag.NewFromJsonDict(h) for h in data['entities']['hashtags']]
            if 'media' in data['entities']:
                media = [Media.NewFromJsonDict(m) for m in data['entities']['media']]

        # the new extended entities
        if 'extended_entities' in data:
            if 'media' in data['extended_entities']:
                media = [Media.NewFromJsonDict(m) for m in data['extended_entities']['media']]

So these possible solutions come to mind:

  • AsDict() should place the list of media dicts under ["entities"]["media"] or ["extended_entities"]["media"] instead of the top-level "media" key
  • NewFromJsonDict() should also look for media in the top-level "media" key
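
Until one of these is fixed in the library, a caller-side workaround along the lines of the first option might look like the sketch below. The helper name move_media_to_entities is hypothetical and not part of python-twitter; it only rearranges the dict so that the media list sits where the NewFromJsonDict() code quoted above looks for it. I haven't verified it against every python-twitter version.

```python
import copy


def move_media_to_entities(as_dict_output):
    """Return a copy of an AsDict()-style dict with the top-level 'media'
    list nested under 'extended_entities'.

    Hypothetical helper, not part of python-twitter.
    """
    data = copy.deepcopy(as_dict_output)  # leave the caller's dict untouched
    media = data.pop('media', None)
    if media:
        # NewFromJsonDict() reads data['extended_entities']['media'].
        data.setdefault('extended_entities', {})['media'] = media
    return data
```

With the reproducer above, the call would then be twitter.models.Status.NewFromJsonDict(move_media_to_entities(as_dict_output)). The same mismatch presumably affects the top-level "urls", "hashtags" and "user_mentions" keys, which NewFromJsonDict() reads from "entities".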

M4rtinK, Jun 17 '17 21:06

For storage and caching, I'd use the twitter.Status()._json attribute, since that is straight from Twitter. I'll look into this, but I don't think it will ever be a true one-to-one round trip from dict -> dict: things like empty lists (hashtags etc.) are present (even if they're empty) in the original AsDict() output but don't get set by NewFromJsonDict(), if that makes sense.
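
A rough sketch of that caching approach, assuming the Status instance was built by NewFromJsonDict() and so carries the raw API payload in _json. The function names dump_status and load_status are made up for illustration, and factory stands in for twitter.Status.NewFromJsonDict:

```python
import json


def dump_status(status):
    """Serialize the raw payload stored on the model's `_json` attribute."""
    return json.dumps(status._json)


def load_status(text, factory):
    """Rebuild a model from cached text.

    `factory` is whatever turns the raw dict back into a model,
    e.g. twitter.Status.NewFromJsonDict (passed in by the caller).
    """
    return factory(json.loads(text))
```

Because the raw payload keeps media under "entities" / "extended_entities", this round trip should not hit the key mismatch described above.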

jeremylow, Mar 08 '18 12:03

I'm also seeing no media being populated in the Status. Three years after this was opened, is there a way forward?

lifenautjoe, Oct 04 '20 20:10