flask-restless icon indicating copy to clipboard operation
flask-restless copied to clipboard

Painfully slow responses with large dataset and relationships (?)

Open AlmightyOatmeal opened this issue 8 years ago • 19 comments

I'm running into a serious performance problem with flask-restless that is making me look at alternatives and that breaks my heart. The posts table contains almost 700k "posts" for a blog with one-to-one mappings for raw JSON blobs and a many-to-many for tags; with getting only 10 results for each page, the request is taking 1-2 minutes!

I've already been through the database thinking that was an issue but it most certainly isn't. MySQL returns results in a second or two. Then I thought maybe it was SQLAlchemy so I went through that with a fine tooth comb but that wasn't the problem either; I can use flask-restful, flask-potion, or sqlalchemy by itself to generate the exact same result set in roughly 5 seconds or less.

The blog posts model:

tumbleweed_post_tags = db.Table('tumbleweed_post_tags',
    db.Column('id', db.Integer, primary_key=True, autoincrement=True),
    db.Column('post_id', db.Integer, db.ForeignKey('tumbleweed_post.id', ondelete='cascade'), index=True),
    db.Column('tag_id', db.Integer, db.ForeignKey('tumbleweed_tag.id', ondelete='cascade'), index=True),
    info={'bind_key': 'tumbleweed'}
)


class TumbleweedPost(db.Model, CustomCommon):
    __bind_key__ = 'tumbleweed'
    __tablename__ = 'tumbleweed_post'

    id = db.Column(db.Integer, primary_key=True)
    post_id = db.Column(db.String(convert_unicode=True, length=255, collation='utf8mb4_unicode_ci'), index=True, nullable=False, unique=True)
    # <...snip...>

    file_id = db.Column(db.Integer, db.ForeignKey('tumbleweed_post_file.id'))
    file = db.relationship(
        'TumbleweedPostFile',
        backref=db.backref('posts', lazy='dynamic'),
        foreign_keys=file_id
    )

    thumbnail_id = db.Column(db.Integer, db.ForeignKey('tumbleweed_post_thumbnail.id'))
    thumbnail = db.relationship(
        'TumbleweedPostThumbnail',
        backref=db.backref('post', lazy='dynamic'),
        foreign_keys=thumbnail_id
    )

    json_id = db.Column(db.Integer, db.ForeignKey('tumbleweed_post_json.id'))
    json = db.relationship(
        'TumbleweedPostJson',
        backref=db.backref('post', lazy='dynamic'),
        foreign_keys=json_id
    )

    tags = db.relationship(
        'TumbleweedTag',
        secondary=tumbleweed_post_tags,
        backref=db.backref('posts', lazy='dynamic'),
        lazy='dynamic'
    )

    def __init__(self, **kwargs):
        for c in self.__table__.columns:
            c_data = kwargs.pop(c.key, None)
            if c_data:
                setattr(self, c.key, c_data)

    def __repr__(self):
        return str('<{} id={}, post_id={}, blog_found={}>').format(self.__class__.__name__, self.id, self.post_id, self.blog_found)

The endpoint (all model endpoints are essentially the same for the time being due to testing):

restless_manager.create_api(
    tumbleweed_db.TumbleweedPost,
    url_prefix='/api/v1',
    collection_name='post',
    methods=['GET', 'POST', 'PUT', 'PATCH']
)

The request:

╭─jamie@BeefCurtainMoses  ~
╰─$ time curl -X GET -H 'Accept: application/vnd.api+json' 'http://0.0.0.0:7777/api/v1/post'
{
  "data": [
    <...snip...>
    {
      "attributes": {
        "adult": false,
        "blog_found": "<snip>",
        "blog_source": "<snip>",
        "downloaded": true,
        "post_id": "<snip>"
      },
      "id": "10",
      "links": {
        "self": "http://0.0.0.0:7777/api/v1/post/10"
      },
      "relationships": {
        "file": {
          "data": {
            "id": "10",
            "type": "post_file"
          },
          "links": {
            "related": "/api/v1/post/10/file",
            "self": "/api/v1/post/10/relationships/file"
          }
        },
        "json": {
          "data": {
            "id": "10",
            "type": "post_json"
          },
          "links": {
            "related": "/api/v1/post/10/json",
            "self": "/api/v1/post/10/relationships/json"
          }
        },
        "tags": {
          "data": [],
          "links": {
            "related": "/api/v1/post/10/tags",
            "self": "/api/v1/post/10/relationships/tags"
          }
        },
        "thumbnail": {
          "data": {
            "id": "9",
            "type": "post_thumbnail"
          },
          "links": {
            "related": "/api/v1/post/10/thumbnail",
            "self": "/api/v1/post/10/relationships/thumbnail"
          }
        }
      },
      "type": "post"
    }
  ],
  "included": [],
  "jsonapi": {
    "version": "1.0"
  },
  "links": {
    "first": "http://0.0.0.0:7777/api/v1/post?page[number]=1&page[size]=10",
    "last": "http://0.0.0.0:7777/api/v1/post?page[number]=68368&page[size]=10",
    "next": "http://0.0.0.0:7777/api/v1/post?page[number]=2&page[size]=10",
    "prev": "http://0.0.0.0:7777/api/v1/post?page[number]=0&page[size]=10",
    "self": "/api/v1/post"
  },
  "meta": {
    "total": 683672
  }
}
curl -X GET -H 'Accept: application/vnd.api+json'   0.00s user 0.01s system 0% cpu 1:06.46 total

And the logs:

2016-07-06 10:22:38,403 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2016-07-06 10:22:38,404 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post.id AS tumbleweed_post_id, tumbleweed_post.post_id AS tumbleweed_post_post_id, tumbleweed_post.blog_source AS tumbleweed_post_blog_source, tumbleweed_post.blog_found AS tumbleweed_post_blog_found, tumbleweed_post.downloaded AS tumbleweed_post_downloaded, tumbleweed_post.adult AS tumbleweed_post_adult, tumbleweed_post.file_id AS tumbleweed_post_file_id, tumbleweed_post.thumbnail_id AS tumbleweed_post_thumbnail_id, tumbleweed_post.json_id AS tumbleweed_post_json_id 
FROM tumbleweed_post ORDER BY tumbleweed_post.id ASC 
 LIMIT %s, %s
2016-07-06 10:22:38,404 INFO sqlalchemy.engine.base.Engine (0, 10)
2016-07-06 10:22:38,448 INFO sqlalchemy.engine.base.Engine SELECT count(*) AS count_1 
FROM (SELECT tumbleweed_post.id AS tumbleweed_post_id, tumbleweed_post.post_id AS tumbleweed_post_post_id, tumbleweed_post.blog_source AS tumbleweed_post_blog_source, tumbleweed_post.blog_found AS tumbleweed_post_blog_found, tumbleweed_post.downloaded AS tumbleweed_post_downloaded, tumbleweed_post.adult AS tumbleweed_post_adult, tumbleweed_post.file_id AS tumbleweed_post_file_id, tumbleweed_post.thumbnail_id AS tumbleweed_post_thumbnail_id, tumbleweed_post.json_id AS tumbleweed_post_json_id 
FROM tumbleweed_post) AS anon_1
2016-07-06 10:22:38,448 INFO sqlalchemy.engine.base.Engine ()
2016-07-06 10:22:38,745 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_file.id AS tumbleweed_post_file_id, tumbleweed_post_file.name AS tumbleweed_post_file_name, tumbleweed_post_file.adult AS tumbleweed_post_file_adult, tumbleweed_post_file.full_path AS tumbleweed_post_file_full_path, tumbleweed_post_file.new_path AS tumbleweed_post_file_new_path, tumbleweed_post_file.source_url AS tumbleweed_post_file_source_url, tumbleweed_post_file.content_type AS tumbleweed_post_file_content_type, tumbleweed_post_file.downloaded AS tumbleweed_post_file_downloaded, tumbleweed_post_file.size AS tumbleweed_post_file_size, tumbleweed_post_file.error AS tumbleweed_post_file_error, tumbleweed_post_file.error_code AS tumbleweed_post_file_error_code, tumbleweed_post_file.json_id AS tumbleweed_post_file_json_id 
FROM tumbleweed_post_file 
WHERE tumbleweed_post_file.id = %s
2016-07-06 10:22:38,745 INFO sqlalchemy.engine.base.Engine (1L,)
2016-07-06 10:22:38,786 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_thumbnail.id AS tumbleweed_post_thumbnail_id, tumbleweed_post_thumbnail.name AS tumbleweed_post_thumbnail_name, tumbleweed_post_thumbnail.data_base64 AS tumbleweed_post_thumbnail_data_base64, tumbleweed_post_thumbnail.source_url AS tumbleweed_post_thumbnail_source_url, tumbleweed_post_thumbnail.content_type AS tumbleweed_post_thumbnail_content_type, tumbleweed_post_thumbnail.size AS tumbleweed_post_thumbnail_size, tumbleweed_post_thumbnail.error AS tumbleweed_post_thumbnail_error, tumbleweed_post_thumbnail.error_code AS tumbleweed_post_thumbnail_error_code 
FROM tumbleweed_post_thumbnail 
WHERE tumbleweed_post_thumbnail.id = %s
2016-07-06 10:22:38,786 INFO sqlalchemy.engine.base.Engine (1L,)
2016-07-06 10:22:38,861 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_json.id AS tumbleweed_post_json_id, tumbleweed_post_json.json AS tumbleweed_post_json_json 
FROM tumbleweed_post_json 
WHERE tumbleweed_post_json.id = %s
2016-07-06 10:22:38,861 INFO sqlalchemy.engine.base.Engine (1L,)
2016-07-06 10:22:38,900 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_tag.id AS tumbleweed_tag_id, tumbleweed_tag.name AS tumbleweed_tag_name 
FROM tumbleweed_tag, tumbleweed_post_tags 
WHERE %s = tumbleweed_post_tags.post_id AND tumbleweed_tag.id = tumbleweed_post_tags.tag_id
2016-07-06 10:22:38,901 INFO sqlalchemy.engine.base.Engine (1L,)
2016-07-06 10:22:38,938 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_file.id AS tumbleweed_post_file_id, tumbleweed_post_file.name AS tumbleweed_post_file_name, tumbleweed_post_file.adult AS tumbleweed_post_file_adult, tumbleweed_post_file.full_path AS tumbleweed_post_file_full_path, tumbleweed_post_file.new_path AS tumbleweed_post_file_new_path, tumbleweed_post_file.source_url AS tumbleweed_post_file_source_url, tumbleweed_post_file.content_type AS tumbleweed_post_file_content_type, tumbleweed_post_file.downloaded AS tumbleweed_post_file_downloaded, tumbleweed_post_file.size AS tumbleweed_post_file_size, tumbleweed_post_file.error AS tumbleweed_post_file_error, tumbleweed_post_file.error_code AS tumbleweed_post_file_error_code, tumbleweed_post_file.json_id AS tumbleweed_post_file_json_id 
FROM tumbleweed_post_file 
WHERE tumbleweed_post_file.id = %s
2016-07-06 10:22:38,938 INFO sqlalchemy.engine.base.Engine (2L,)
2016-07-06 10:22:38,973 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_thumbnail.id AS tumbleweed_post_thumbnail_id, tumbleweed_post_thumbnail.name AS tumbleweed_post_thumbnail_name, tumbleweed_post_thumbnail.data_base64 AS tumbleweed_post_thumbnail_data_base64, tumbleweed_post_thumbnail.source_url AS tumbleweed_post_thumbnail_source_url, tumbleweed_post_thumbnail.content_type AS tumbleweed_post_thumbnail_content_type, tumbleweed_post_thumbnail.size AS tumbleweed_post_thumbnail_size, tumbleweed_post_thumbnail.error AS tumbleweed_post_thumbnail_error, tumbleweed_post_thumbnail.error_code AS tumbleweed_post_thumbnail_error_code 
FROM tumbleweed_post_thumbnail 
WHERE tumbleweed_post_thumbnail.id = %s
2016-07-06 10:22:38,973 INFO sqlalchemy.engine.base.Engine (3L,)
2016-07-06 10:22:39,017 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_json.id AS tumbleweed_post_json_id, tumbleweed_post_json.json AS tumbleweed_post_json_json 
FROM tumbleweed_post_json 
WHERE tumbleweed_post_json.id = %s
2016-07-06 10:22:39,017 INFO sqlalchemy.engine.base.Engine (2L,)
2016-07-06 10:22:39,052 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_tag.id AS tumbleweed_tag_id, tumbleweed_tag.name AS tumbleweed_tag_name 
FROM tumbleweed_tag, tumbleweed_post_tags 
WHERE %s = tumbleweed_post_tags.post_id AND tumbleweed_tag.id = tumbleweed_post_tags.tag_id
2016-07-06 10:22:39,052 INFO sqlalchemy.engine.base.Engine (2L,)
2016-07-06 10:22:39,086 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_file.id AS tumbleweed_post_file_id, tumbleweed_post_file.name AS tumbleweed_post_file_name, tumbleweed_post_file.adult AS tumbleweed_post_file_adult, tumbleweed_post_file.full_path AS tumbleweed_post_file_full_path, tumbleweed_post_file.new_path AS tumbleweed_post_file_new_path, tumbleweed_post_file.source_url AS tumbleweed_post_file_source_url, tumbleweed_post_file.content_type AS tumbleweed_post_file_content_type, tumbleweed_post_file.downloaded AS tumbleweed_post_file_downloaded, tumbleweed_post_file.size AS tumbleweed_post_file_size, tumbleweed_post_file.error AS tumbleweed_post_file_error, tumbleweed_post_file.error_code AS tumbleweed_post_file_error_code, tumbleweed_post_file.json_id AS tumbleweed_post_file_json_id 
FROM tumbleweed_post_file 
WHERE tumbleweed_post_file.id = %s
2016-07-06 10:22:39,086 INFO sqlalchemy.engine.base.Engine (3L,)
2016-07-06 10:22:39,122 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_thumbnail.id AS tumbleweed_post_thumbnail_id, tumbleweed_post_thumbnail.name AS tumbleweed_post_thumbnail_name, tumbleweed_post_thumbnail.data_base64 AS tumbleweed_post_thumbnail_data_base64, tumbleweed_post_thumbnail.source_url AS tumbleweed_post_thumbnail_source_url, tumbleweed_post_thumbnail.content_type AS tumbleweed_post_thumbnail_content_type, tumbleweed_post_thumbnail.size AS tumbleweed_post_thumbnail_size, tumbleweed_post_thumbnail.error AS tumbleweed_post_thumbnail_error, tumbleweed_post_thumbnail.error_code AS tumbleweed_post_thumbnail_error_code 
FROM tumbleweed_post_thumbnail 
WHERE tumbleweed_post_thumbnail.id = %s
2016-07-06 10:22:39,122 INFO sqlalchemy.engine.base.Engine (4L,)
2016-07-06 10:22:39,200 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_json.id AS tumbleweed_post_json_id, tumbleweed_post_json.json AS tumbleweed_post_json_json 
FROM tumbleweed_post_json 
WHERE tumbleweed_post_json.id = %s
2016-07-06 10:22:39,201 INFO sqlalchemy.engine.base.Engine (3L,)
2016-07-06 10:22:39,244 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_tag.id AS tumbleweed_tag_id, tumbleweed_tag.name AS tumbleweed_tag_name 
FROM tumbleweed_tag, tumbleweed_post_tags 
WHERE %s = tumbleweed_post_tags.post_id AND tumbleweed_tag.id = tumbleweed_post_tags.tag_id
2016-07-06 10:22:39,244 INFO sqlalchemy.engine.base.Engine (3L,)
2016-07-06 10:22:39,278 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_file.id AS tumbleweed_post_file_id, tumbleweed_post_file.name AS tumbleweed_post_file_name, tumbleweed_post_file.adult AS tumbleweed_post_file_adult, tumbleweed_post_file.full_path AS tumbleweed_post_file_full_path, tumbleweed_post_file.new_path AS tumbleweed_post_file_new_path, tumbleweed_post_file.source_url AS tumbleweed_post_file_source_url, tumbleweed_post_file.content_type AS tumbleweed_post_file_content_type, tumbleweed_post_file.downloaded AS tumbleweed_post_file_downloaded, tumbleweed_post_file.size AS tumbleweed_post_file_size, tumbleweed_post_file.error AS tumbleweed_post_file_error, tumbleweed_post_file.error_code AS tumbleweed_post_file_error_code, tumbleweed_post_file.json_id AS tumbleweed_post_file_json_id 
FROM tumbleweed_post_file 
WHERE tumbleweed_post_file.id = %s
2016-07-06 10:22:39,278 INFO sqlalchemy.engine.base.Engine (4L,)
2016-07-06 10:22:39,321 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_thumbnail.id AS tumbleweed_post_thumbnail_id, tumbleweed_post_thumbnail.name AS tumbleweed_post_thumbnail_name, tumbleweed_post_thumbnail.data_base64 AS tumbleweed_post_thumbnail_data_base64, tumbleweed_post_thumbnail.source_url AS tumbleweed_post_thumbnail_source_url, tumbleweed_post_thumbnail.content_type AS tumbleweed_post_thumbnail_content_type, tumbleweed_post_thumbnail.size AS tumbleweed_post_thumbnail_size, tumbleweed_post_thumbnail.error AS tumbleweed_post_thumbnail_error, tumbleweed_post_thumbnail.error_code AS tumbleweed_post_thumbnail_error_code 
FROM tumbleweed_post_thumbnail 
WHERE tumbleweed_post_thumbnail.id = %s
2016-07-06 10:22:39,321 INFO sqlalchemy.engine.base.Engine (2L,)
2016-07-06 10:22:39,366 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_json.id AS tumbleweed_post_json_id, tumbleweed_post_json.json AS tumbleweed_post_json_json 
FROM tumbleweed_post_json 
WHERE tumbleweed_post_json.id = %s
2016-07-06 10:22:39,366 INFO sqlalchemy.engine.base.Engine (4L,)
2016-07-06 10:22:39,410 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_tag.id AS tumbleweed_tag_id, tumbleweed_tag.name AS tumbleweed_tag_name 
FROM tumbleweed_tag, tumbleweed_post_tags 
WHERE %s = tumbleweed_post_tags.post_id AND tumbleweed_tag.id = tumbleweed_post_tags.tag_id
2016-07-06 10:22:39,410 INFO sqlalchemy.engine.base.Engine (4L,)
2016-07-06 10:22:39,453 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_file.id AS tumbleweed_post_file_id, tumbleweed_post_file.name AS tumbleweed_post_file_name, tumbleweed_post_file.adult AS tumbleweed_post_file_adult, tumbleweed_post_file.full_path AS tumbleweed_post_file_full_path, tumbleweed_post_file.new_path AS tumbleweed_post_file_new_path, tumbleweed_post_file.source_url AS tumbleweed_post_file_source_url, tumbleweed_post_file.content_type AS tumbleweed_post_file_content_type, tumbleweed_post_file.downloaded AS tumbleweed_post_file_downloaded, tumbleweed_post_file.size AS tumbleweed_post_file_size, tumbleweed_post_file.error AS tumbleweed_post_file_error, tumbleweed_post_file.error_code AS tumbleweed_post_file_error_code, tumbleweed_post_file.json_id AS tumbleweed_post_file_json_id 
FROM tumbleweed_post_file 
WHERE tumbleweed_post_file.id = %s
2016-07-06 10:22:39,453 INFO sqlalchemy.engine.base.Engine (5L,)
2016-07-06 10:22:39,489 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_thumbnail.id AS tumbleweed_post_thumbnail_id, tumbleweed_post_thumbnail.name AS tumbleweed_post_thumbnail_name, tumbleweed_post_thumbnail.data_base64 AS tumbleweed_post_thumbnail_data_base64, tumbleweed_post_thumbnail.source_url AS tumbleweed_post_thumbnail_source_url, tumbleweed_post_thumbnail.content_type AS tumbleweed_post_thumbnail_content_type, tumbleweed_post_thumbnail.size AS tumbleweed_post_thumbnail_size, tumbleweed_post_thumbnail.error AS tumbleweed_post_thumbnail_error, tumbleweed_post_thumbnail.error_code AS tumbleweed_post_thumbnail_error_code 
FROM tumbleweed_post_thumbnail 
WHERE tumbleweed_post_thumbnail.id = %s
2016-07-06 10:22:39,489 INFO sqlalchemy.engine.base.Engine (5L,)
2016-07-06 10:22:39,543 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_json.id AS tumbleweed_post_json_id, tumbleweed_post_json.json AS tumbleweed_post_json_json 
FROM tumbleweed_post_json 
WHERE tumbleweed_post_json.id = %s
2016-07-06 10:22:39,543 INFO sqlalchemy.engine.base.Engine (5L,)
2016-07-06 10:22:39,583 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_tag.id AS tumbleweed_tag_id, tumbleweed_tag.name AS tumbleweed_tag_name 
FROM tumbleweed_tag, tumbleweed_post_tags 
WHERE %s = tumbleweed_post_tags.post_id AND tumbleweed_tag.id = tumbleweed_post_tags.tag_id
2016-07-06 10:22:39,583 INFO sqlalchemy.engine.base.Engine (5L,)
2016-07-06 10:22:39,621 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_file.id AS tumbleweed_post_file_id, tumbleweed_post_file.name AS tumbleweed_post_file_name, tumbleweed_post_file.adult AS tumbleweed_post_file_adult, tumbleweed_post_file.full_path AS tumbleweed_post_file_full_path, tumbleweed_post_file.new_path AS tumbleweed_post_file_new_path, tumbleweed_post_file.source_url AS tumbleweed_post_file_source_url, tumbleweed_post_file.content_type AS tumbleweed_post_file_content_type, tumbleweed_post_file.downloaded AS tumbleweed_post_file_downloaded, tumbleweed_post_file.size AS tumbleweed_post_file_size, tumbleweed_post_file.error AS tumbleweed_post_file_error, tumbleweed_post_file.error_code AS tumbleweed_post_file_error_code, tumbleweed_post_file.json_id AS tumbleweed_post_file_json_id 
FROM tumbleweed_post_file 
WHERE tumbleweed_post_file.id = %s
2016-07-06 10:22:39,621 INFO sqlalchemy.engine.base.Engine (7L,)
2016-07-06 10:22:39,656 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_json.id AS tumbleweed_post_json_id, tumbleweed_post_json.json AS tumbleweed_post_json_json 
FROM tumbleweed_post_json 
WHERE tumbleweed_post_json.id = %s
2016-07-06 10:22:39,656 INFO sqlalchemy.engine.base.Engine (7L,)
2016-07-06 10:22:39,692 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_tag.id AS tumbleweed_tag_id, tumbleweed_tag.name AS tumbleweed_tag_name 
FROM tumbleweed_tag, tumbleweed_post_tags 
WHERE %s = tumbleweed_post_tags.post_id AND tumbleweed_tag.id = tumbleweed_post_tags.tag_id
2016-07-06 10:22:39,692 INFO sqlalchemy.engine.base.Engine (6L,)
2016-07-06 10:22:39,728 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_file.id AS tumbleweed_post_file_id, tumbleweed_post_file.name AS tumbleweed_post_file_name, tumbleweed_post_file.adult AS tumbleweed_post_file_adult, tumbleweed_post_file.full_path AS tumbleweed_post_file_full_path, tumbleweed_post_file.new_path AS tumbleweed_post_file_new_path, tumbleweed_post_file.source_url AS tumbleweed_post_file_source_url, tumbleweed_post_file.content_type AS tumbleweed_post_file_content_type, tumbleweed_post_file.downloaded AS tumbleweed_post_file_downloaded, tumbleweed_post_file.size AS tumbleweed_post_file_size, tumbleweed_post_file.error AS tumbleweed_post_file_error, tumbleweed_post_file.error_code AS tumbleweed_post_file_error_code, tumbleweed_post_file.json_id AS tumbleweed_post_file_json_id 
FROM tumbleweed_post_file 
WHERE tumbleweed_post_file.id = %s
2016-07-06 10:22:39,728 INFO sqlalchemy.engine.base.Engine (6L,)
2016-07-06 10:22:39,765 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_thumbnail.id AS tumbleweed_post_thumbnail_id, tumbleweed_post_thumbnail.name AS tumbleweed_post_thumbnail_name, tumbleweed_post_thumbnail.data_base64 AS tumbleweed_post_thumbnail_data_base64, tumbleweed_post_thumbnail.source_url AS tumbleweed_post_thumbnail_source_url, tumbleweed_post_thumbnail.content_type AS tumbleweed_post_thumbnail_content_type, tumbleweed_post_thumbnail.size AS tumbleweed_post_thumbnail_size, tumbleweed_post_thumbnail.error AS tumbleweed_post_thumbnail_error, tumbleweed_post_thumbnail.error_code AS tumbleweed_post_thumbnail_error_code 
FROM tumbleweed_post_thumbnail 
WHERE tumbleweed_post_thumbnail.id = %s
2016-07-06 10:22:39,765 INFO sqlalchemy.engine.base.Engine (6L,)
2016-07-06 10:22:39,811 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_json.id AS tumbleweed_post_json_id, tumbleweed_post_json.json AS tumbleweed_post_json_json 
FROM tumbleweed_post_json 
WHERE tumbleweed_post_json.id = %s
2016-07-06 10:22:39,811 INFO sqlalchemy.engine.base.Engine (6L,)
2016-07-06 10:22:39,864 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_tag.id AS tumbleweed_tag_id, tumbleweed_tag.name AS tumbleweed_tag_name 
FROM tumbleweed_tag, tumbleweed_post_tags 
WHERE %s = tumbleweed_post_tags.post_id AND tumbleweed_tag.id = tumbleweed_post_tags.tag_id
2016-07-06 10:22:39,864 INFO sqlalchemy.engine.base.Engine (7L,)
2016-07-06 10:22:39,905 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_file.id AS tumbleweed_post_file_id, tumbleweed_post_file.name AS tumbleweed_post_file_name, tumbleweed_post_file.adult AS tumbleweed_post_file_adult, tumbleweed_post_file.full_path AS tumbleweed_post_file_full_path, tumbleweed_post_file.new_path AS tumbleweed_post_file_new_path, tumbleweed_post_file.source_url AS tumbleweed_post_file_source_url, tumbleweed_post_file.content_type AS tumbleweed_post_file_content_type, tumbleweed_post_file.downloaded AS tumbleweed_post_file_downloaded, tumbleweed_post_file.size AS tumbleweed_post_file_size, tumbleweed_post_file.error AS tumbleweed_post_file_error, tumbleweed_post_file.error_code AS tumbleweed_post_file_error_code, tumbleweed_post_file.json_id AS tumbleweed_post_file_json_id 
FROM tumbleweed_post_file 
WHERE tumbleweed_post_file.id = %s
2016-07-06 10:22:39,905 INFO sqlalchemy.engine.base.Engine (8L,)
2016-07-06 10:22:39,944 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_thumbnail.id AS tumbleweed_post_thumbnail_id, tumbleweed_post_thumbnail.name AS tumbleweed_post_thumbnail_name, tumbleweed_post_thumbnail.data_base64 AS tumbleweed_post_thumbnail_data_base64, tumbleweed_post_thumbnail.source_url AS tumbleweed_post_thumbnail_source_url, tumbleweed_post_thumbnail.content_type AS tumbleweed_post_thumbnail_content_type, tumbleweed_post_thumbnail.size AS tumbleweed_post_thumbnail_size, tumbleweed_post_thumbnail.error AS tumbleweed_post_thumbnail_error, tumbleweed_post_thumbnail.error_code AS tumbleweed_post_thumbnail_error_code 
FROM tumbleweed_post_thumbnail 
WHERE tumbleweed_post_thumbnail.id = %s
2016-07-06 10:22:39,944 INFO sqlalchemy.engine.base.Engine (7L,)
2016-07-06 10:22:40,003 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_json.id AS tumbleweed_post_json_id, tumbleweed_post_json.json AS tumbleweed_post_json_json 
FROM tumbleweed_post_json 
WHERE tumbleweed_post_json.id = %s
2016-07-06 10:22:40,003 INFO sqlalchemy.engine.base.Engine (8L,)
2016-07-06 10:22:40,046 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_tag.id AS tumbleweed_tag_id, tumbleweed_tag.name AS tumbleweed_tag_name 
FROM tumbleweed_tag, tumbleweed_post_tags 
WHERE %s = tumbleweed_post_tags.post_id AND tumbleweed_tag.id = tumbleweed_post_tags.tag_id
2016-07-06 10:22:40,047 INFO sqlalchemy.engine.base.Engine (8L,)
2016-07-06 10:22:40,091 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_file.id AS tumbleweed_post_file_id, tumbleweed_post_file.name AS tumbleweed_post_file_name, tumbleweed_post_file.adult AS tumbleweed_post_file_adult, tumbleweed_post_file.full_path AS tumbleweed_post_file_full_path, tumbleweed_post_file.new_path AS tumbleweed_post_file_new_path, tumbleweed_post_file.source_url AS tumbleweed_post_file_source_url, tumbleweed_post_file.content_type AS tumbleweed_post_file_content_type, tumbleweed_post_file.downloaded AS tumbleweed_post_file_downloaded, tumbleweed_post_file.size AS tumbleweed_post_file_size, tumbleweed_post_file.error AS tumbleweed_post_file_error, tumbleweed_post_file.error_code AS tumbleweed_post_file_error_code, tumbleweed_post_file.json_id AS tumbleweed_post_file_json_id 
FROM tumbleweed_post_file 
WHERE tumbleweed_post_file.id = %s
2016-07-06 10:22:40,091 INFO sqlalchemy.engine.base.Engine (9L,)
2016-07-06 10:22:40,129 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_thumbnail.id AS tumbleweed_post_thumbnail_id, tumbleweed_post_thumbnail.name AS tumbleweed_post_thumbnail_name, tumbleweed_post_thumbnail.data_base64 AS tumbleweed_post_thumbnail_data_base64, tumbleweed_post_thumbnail.source_url AS tumbleweed_post_thumbnail_source_url, tumbleweed_post_thumbnail.content_type AS tumbleweed_post_thumbnail_content_type, tumbleweed_post_thumbnail.size AS tumbleweed_post_thumbnail_size, tumbleweed_post_thumbnail.error AS tumbleweed_post_thumbnail_error, tumbleweed_post_thumbnail.error_code AS tumbleweed_post_thumbnail_error_code 
FROM tumbleweed_post_thumbnail 
WHERE tumbleweed_post_thumbnail.id = %s
2016-07-06 10:22:40,129 INFO sqlalchemy.engine.base.Engine (8L,)
2016-07-06 10:22:40,171 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_json.id AS tumbleweed_post_json_id, tumbleweed_post_json.json AS tumbleweed_post_json_json 
FROM tumbleweed_post_json 
WHERE tumbleweed_post_json.id = %s
2016-07-06 10:22:40,171 INFO sqlalchemy.engine.base.Engine (9L,)
2016-07-06 10:22:40,211 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_tag.id AS tumbleweed_tag_id, tumbleweed_tag.name AS tumbleweed_tag_name 
FROM tumbleweed_tag, tumbleweed_post_tags 
WHERE %s = tumbleweed_post_tags.post_id AND tumbleweed_tag.id = tumbleweed_post_tags.tag_id
2016-07-06 10:22:40,212 INFO sqlalchemy.engine.base.Engine (9L,)
2016-07-06 10:22:40,250 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_file.id AS tumbleweed_post_file_id, tumbleweed_post_file.name AS tumbleweed_post_file_name, tumbleweed_post_file.adult AS tumbleweed_post_file_adult, tumbleweed_post_file.full_path AS tumbleweed_post_file_full_path, tumbleweed_post_file.new_path AS tumbleweed_post_file_new_path, tumbleweed_post_file.source_url AS tumbleweed_post_file_source_url, tumbleweed_post_file.content_type AS tumbleweed_post_file_content_type, tumbleweed_post_file.downloaded AS tumbleweed_post_file_downloaded, tumbleweed_post_file.size AS tumbleweed_post_file_size, tumbleweed_post_file.error AS tumbleweed_post_file_error, tumbleweed_post_file.error_code AS tumbleweed_post_file_error_code, tumbleweed_post_file.json_id AS tumbleweed_post_file_json_id 
FROM tumbleweed_post_file 
WHERE tumbleweed_post_file.id = %s
2016-07-06 10:22:40,250 INFO sqlalchemy.engine.base.Engine (10L,)
2016-07-06 10:22:40,286 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_thumbnail.id AS tumbleweed_post_thumbnail_id, tumbleweed_post_thumbnail.name AS tumbleweed_post_thumbnail_name, tumbleweed_post_thumbnail.data_base64 AS tumbleweed_post_thumbnail_data_base64, tumbleweed_post_thumbnail.source_url AS tumbleweed_post_thumbnail_source_url, tumbleweed_post_thumbnail.content_type AS tumbleweed_post_thumbnail_content_type, tumbleweed_post_thumbnail.size AS tumbleweed_post_thumbnail_size, tumbleweed_post_thumbnail.error AS tumbleweed_post_thumbnail_error, tumbleweed_post_thumbnail.error_code AS tumbleweed_post_thumbnail_error_code 
FROM tumbleweed_post_thumbnail 
WHERE tumbleweed_post_thumbnail.id = %s
2016-07-06 10:22:40,286 INFO sqlalchemy.engine.base.Engine (9L,)
2016-07-06 10:22:40,335 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post_json.id AS tumbleweed_post_json_id, tumbleweed_post_json.json AS tumbleweed_post_json_json 
FROM tumbleweed_post_json 
WHERE tumbleweed_post_json.id = %s
2016-07-06 10:22:40,335 INFO sqlalchemy.engine.base.Engine (10L,)
2016-07-06 10:22:40,371 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_tag.id AS tumbleweed_tag_id, tumbleweed_tag.name AS tumbleweed_tag_name 
FROM tumbleweed_tag, tumbleweed_post_tags 
WHERE %s = tumbleweed_post_tags.post_id AND tumbleweed_tag.id = tumbleweed_post_tags.tag_id
2016-07-06 10:22:40,371 INFO sqlalchemy.engine.base.Engine (10L,)
2016-07-06 10:22:40,405 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post.id AS tumbleweed_post_id, tumbleweed_post.post_id AS tumbleweed_post_post_id, tumbleweed_post.blog_source AS tumbleweed_post_blog_source, tumbleweed_post.blog_found AS tumbleweed_post_blog_found, tumbleweed_post.downloaded AS tumbleweed_post_downloaded, tumbleweed_post.adult AS tumbleweed_post_adult, tumbleweed_post.file_id AS tumbleweed_post_file_id, tumbleweed_post.thumbnail_id AS tumbleweed_post_thumbnail_id, tumbleweed_post.json_id AS tumbleweed_post_json_id 
FROM tumbleweed_post ORDER BY tumbleweed_post.id ASC
2016-07-06 10:22:40,405 INFO sqlalchemy.engine.base.Engine ()
2016-07-06 10:23:44,436 INFO sqlalchemy.engine.base.Engine ROLLBACK
127.0.0.1 - - [06/Jul/2016 10:23:44] "GET /api/v1/post HTTP/1.1" 200 -
--------------------------------------------------------------------------------
PATH: '/api/v1/post'
         21979628 function calls (21967844 primitive calls) in 66.391 seconds

   Ordered by: internal time, call count
   List reduced from 1018 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       42   35.331    0.841   35.331    0.841 {method 'store_result' of '_mysql.connection' objects}
   683764    7.267    0.000    7.267    0.000 /Library/Python/2.7/site-packages/sqlalchemy/orm/loading.py:478(_populate_full)
   683774    3.025    0.000   14.321    0.000 /Library/Python/2.7/site-packages/sqlalchemy/orm/loading.py:360(_instance)
       42    2.262    0.054    2.262    0.054 {built-in method fetch_row}
       19    1.869    0.098   64.289    3.384 {map}
       42    1.750    0.042    1.750    0.042 {method 'query' of '_mysql.connection' objects}
   683672    1.376    0.000    7.705    0.000 /Library/Python/2.7/site-packages/Flask_Restless-1.0.0b2.dev0-py2.7.egg/flask_restless/views/base.py:1810(resources_to_include)
   683764    1.084    0.000    1.084    0.000 /Library/Python/2.7/site-packages/sqlalchemy/orm/state.py:75(__init__)
   683794    1.004    0.000    1.004    0.000 {method 'issuperset' of 'frozenset' objects}
   683764    0.913    0.000    2.449    0.000 /Library/Python/2.7/site-packages/sqlalchemy/orm/instrumentation.py:311(new_instance)
   683680    0.878    0.000    2.572    0.000 /Library/Python/2.7/site-packages/werkzeug/datastructures.py:276(get)
   683764    0.750    0.000    1.257    0.000 /Library/Python/2.7/site-packages/sqlalchemy/orm/state.py:251(_cleanup)
   684202    0.686    0.000    1.069    0.000 /Library/Python/2.7/site-packages/werkzeug/local.py:160(top)
   683679    0.617    0.000    1.695    0.000 /Library/Python/2.7/site-packages/werkzeug/datastructures.py:393(__getitem__)
   683679    0.589    0.000    1.078    0.000 /Library/Python/2.7/site-packages/werkzeug/exceptions.py:99(__init__)
   683848    0.559    0.000    2.633    0.000 /Library/Python/2.7/site-packages/werkzeug/local.py:296(_get_current_object)
   683817    0.513    0.000   15.624    0.000 /Library/Python/2.7/site-packages/sqlalchemy/orm/loading.py:30(instances)
   683835    0.501    0.000    1.665    0.000 /Library/Python/2.7/site-packages/flask/globals.py:17(_lookup_req_object)
   683679    0.488    0.000    0.488    0.000 /Library/Python/2.7/site-packages/werkzeug/exceptions.py:86(__init__)
   683845    0.424    0.000    3.759    0.000 /Library/Python/2.7/site-packages/werkzeug/local.py:340(__getattr__)
      129    0.410    0.003    0.600    0.005 /Library/Python/2.7/site-packages/sqlalchemy/util/_collections.py:748(unique_list)
   684268    0.409    0.000    0.410    0.000 {hasattr}
   683696    0.369    0.000    0.443    0.000 /Library/Python/2.7/site-packages/werkzeug/utils.py:68(__get__)
1369988/1368751    0.357    0.000    2.056    0.000 {getattr}
   683764    0.342    0.000    0.342    0.000 <string>:1(set)
   684206    0.336    0.000    0.384    0.000 /Library/Python/2.7/site-packages/werkzeug/local.py:68(__getattr__)
   683764    0.314    0.000    0.507    0.000 /Library/Python/2.7/site-packages/sqlalchemy/orm/identity.py:190(_fast_discard)
   683803    0.292    0.000    0.292    0.000 /Library/Python/2.7/site-packages/sqlalchemy/orm/identity.py:145(get)
   683764    0.283    0.000    0.283    0.000 /Library/Python/2.7/site-packages/sqlalchemy/orm/identity.py:140(_add_unpresent)
        1    0.255    0.255    0.318    0.318 /Library/Python/2.7/site-packages/MySQLdb/connections.py:62(__init__)


--------------------------------------------------------------------------------

I've tried dozens of permutations of model relationship adjustments which have had a little impact on the issue at hand. I'm really curious to where "{map}" is being called from; map() is used in a number of locations within flask-restless so I may need to do a little more performance profiling there.

But:

  • 1-2 minutes is proposterous for a query of only 10 results. I'm a firm believer, and advocate, of flask-restless so I will do anything within my power to help investigate.
  • The includes and excludes don't modify the query at all so what's the point of specifying what columns you want when flask-restless queries everything under the sun?
  • Along the lines of the aforementioned, how can I disable traversing relationships? Pre/post processing isn't going to help much here and includes/excludes are useless for modifying the query but modifying the database layout isn't going to help much considering there will still be a relationship present that will be followed (at least one level deep).
  • If one is going to follow the relationship regardless, why only post the object ID and reference URL and not return any other data for the relationship object? If posts are going to have a many-to-many relationship and have a "tags" array of tags, then why not just allow me to return the tag.name? Otherwise I will need to use REQ * NUM_TAGS calls back to the API or use the tag endpoint and search based on parameters of each post in the results -- neither are good when the work has already been half-way done.

If someone has some ideas then I am most certainly all ears! I'm hoping to get this knocked out relatively soon! Cheers!

AlmightyOatmeal avatar Jul 06 '16 15:07 AlmightyOatmeal

Thank you for your thorough issue report. The problem is entirely with Flask-Restless, not your database configuration.

1-2 minutes is proposterous for a query of only 10 results. I'm a firm believer, and advocate, of flask-restless so I will do anything within my power to help investigate.

The performance is atrocious because I had no idea what I was doing when I started this project, and before long people it had become used by way more people than I had ever anticipated :).

The includes and excludes don't modify the query at all so what's the point of specifying what columns you want when flask-restless queries everything under the sun?

This is a long-standing issue with Flask-Restless. The includes/excludes are applied at serialization time, not at query time. If you see an easy way to fix this, it would be a welcome pull request.

Along the lines of the aforementioned, how can I disable traversing relationships? Pre/post processing isn't going to help much here and includes/excludes are useless for modifying the query but modifying the database layout isn't going to help much considering there will still be a relationship present that will be followed (at least one level deep).

This is sort of the same as the problem in the first bullet point above: you want to specify includes/excludes at the database level. Is that right?

If one is going to follow the relationship regardless, why only post the object ID and reference URL and not return any other data for the relationship object?

This is so that we meet the requirements of the JSON API specification.

If posts are going to have a many-to-many relationship and have a "tags" array of tags, then why not just allow me to return the tag.name? Otherwise I will need to use REQ * NUM_TAGS calls back to the API or use the tag endpoint and search based on parameters of each post in the results -- neither are good when the work has already been half-way done.

This is certainly a tradeoff that comes with the structure provided by the JSON API specification. Maybe what you're looking for is Flask-Restless to treat the many-to-many relationship for tags via an association proxy as a list attribute of the post model instead of as a relationship. I would like to support that (see the failing test https://github.com/jfinkels/flask-restless/blob/master/tests/test_fetching.py#L1615) but don't know enough about database configuration to be able to write the tests for it. See issue #480.


I would like to be able to work on applying includes/excludes at the database query level instead of at the serialization level, but it would take a fair bit of work and I don't have the time at the moment. But this is definitely a high priority issue. Let me know how difficult you think it would be to solve this.

jfinkels avatar Jul 06 '16 17:07 jfinkels

@jfinkels, thank you for the in-depth response. I would have responded sooner but I wanted my response to be equally as useful.

The performance is atrocious because I had no idea what I was doing when I started this project, and before long people it had become used by way more people than I had ever anticipated :).

Kudos to you for starting something that has become a favorite amongst people like me! :)

I would like to support that (see the failing test https://github.com/jfinkels/flask-restless/blob/master/tests/test_fetching.py#L1615)

Scalar is invoking .one() and raising an exception if more than one result are found so maybe the name of the test is wrong? I see what is trying to be done there and it will fail from what i can see in helpers.py and it seems that any relationship is treated as a relationship regardless of relationship type.

I think there should be some configurable logic added here to:

  • Skip relationships altogether.
  • Follow all the relationship or just link to it (like is currently being done)
  • Follow certain types of relationships or follow explicit relationships (such as ignoring all of my relationships but following the 'tags' relationship)

And that shouldn't be too difficult to do, with or without using SQLAlchemy's inspector (I'm trying to think of any overhead reduction).

I would like to be able to work on applying includes/excludes at the database query level instead of at the serialization level, but it would take a fair bit of work and I don't have the time at the moment. But this is definitely a high priority issue.

I think performance would be a higher priority, but that's just my humble opinion. Using includes and excludes to modify a query should be relatively simple; for example, I wrote this POC that takes a list of strings, as column names, to exclude form my query:

# -*- coding: utf-8 -*-

from app import db
from app.mod_tumbleweed import tumbleweed_db


def primary_keys(obj):
    return [k.name for k in obj.__mapper__.primary_key]


def col_map(obj):
    return obj.__mapper__.c.keys()


def generate_query(obj, excludes=None, includes=None):
    pri_keys = primary_keys(obj)
    queryable_attrs = col_map(obj)

    if not excludes and not includes:
        return db.session.query(obj).limit(10)

    attrs_to_query = includes if isinstance(includes, list) else []

    for query_attr in queryable_attrs:
        in_exc = (True if query_attr in excludes else False) if excludes else False
        in_inc = (True if query_attr in includes else False) if includes else False
        in_atq = True if query_attr in attrs_to_query else False

        if in_exc and in_atq:
            raise AttributeError('{} is both excluded and included, you make sad panda cry.'.format(query_attr))

        # Keep the primary key(s) no matter what (?)
        if in_exc and query_attr not in pri_keys:
            continue

        attrs_to_query.append(query_attr)

    mmm = [(obj.__mapper__.c.get(c) if obj.__mapper__.c.get(c) is not None else obj.__mapper__._props.get(c))
           for c in attrs_to_query]

    ### A little more human-readable version of the aforementioned:
    # mmm = []
    # for c in attrs_to_query:
    #     print('c: {}'.format(c))
    #     if obj.__mapper__.c.get(c) is not None:
    #         mmm.append(
    #             obj.__mapper__.c.get(c)
    #         )
    #         continue
    #     mmm.append(
    #         # obj.__mapper__.class_manager.get(c)
    #         obj.__mapper__._props.get(c)
    #     )

    # return db.session.query(*mmm).limit(10)
    return generate_results(db.session.query(*mmm).limit(10), attrs_to_query)


def generate_results(query, cols):
    for result in query:
        yield dict(zip(cols, result))


def test_excludes(test_obj, excludes=None):
    print('=> Excludes')
    for result in generate_query(test_obj, excludes):
        print(result)
    print('')
    return None


def test_normal(test_obj):
    print('=> Normal')
    for result in generate_query(test_obj):
        print(result)
    print('')
    return None


def main():
    test_obj = tumbleweed_db.TumbleweedPost

    test_normal(test_obj)
    test_excludes(test_obj, ['thumbnail', 'thumbnail_id', 'tags'])


if __name__ == '__main__':
    main()

(NOTE: I was planning to incorporate "includes" as well but I haven't gotten around to that yet)

The query for test_normal() was:

2016-07-07 10:20:50,267 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2016-07-07 10:20:50,268 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post.id AS tumbleweed_post_id, tumbleweed_post.post_id AS tumbleweed_post_post_id, tumbleweed_post.blog_source AS tumbleweed_post_blog_source, tumbleweed_post.blog_found AS tumbleweed_post_blog_found, tumbleweed_post.downloaded AS tumbleweed_post_downloaded, tumbleweed_post.adult AS tumbleweed_post_adult, tumbleweed_post.file_id AS tumbleweed_post_file_id, tumbleweed_post.thumbnail_id AS tumbleweed_post_thumbnail_id, tumbleweed_post.json_id AS tumbleweed_post_json_id 
FROM tumbleweed_post 
 LIMIT %s
2016-07-07 10:20:50,268 INFO sqlalchemy.engine.base.Engine (10,)

And the query for test_excludes() was:

2016-07-07 10:20:50,301 INFO sqlalchemy.engine.base.Engine SELECT tumbleweed_post.id AS tumbleweed_post_id, tumbleweed_post.post_id AS tumbleweed_post_post_id, tumbleweed_post.blog_source AS tumbleweed_post_blog_source, tumbleweed_post.blog_found AS tumbleweed_post_blog_found, tumbleweed_post.downloaded AS tumbleweed_post_downloaded, tumbleweed_post.adult AS tumbleweed_post_adult, tumbleweed_post.file_id AS tumbleweed_post_file_id, tumbleweed_post.json_id AS tumbleweed_post_json_id 
FROM tumbleweed_post 
 LIMIT %s
2016-07-07 10:20:50,301 INFO sqlalchemy.engine.base.Engine (10,)

While the results, with excludes, won't necessarily be the objects themselves, I did try to preserve the primary keys so they could be referenced by key instead of trying to do any other kind of table scan (or an index scan on a non-pkey index).


As far as the overall performance, this is really the first time I've looked under the skirt of flask-restless but a weekend and a copious supply of Red Bull (blueberry++, mmm) could yield some benefit. :)

Feel free to reach out to me via email or google chat/hangouts at amFtaWUuaXZhbm92QGdtYWlsLmNvbQo= (base64) or amFtaWUuaXZhbm92QGljbG91ZC5jb20K (base64) (both work with iMessage) and my GitHub handle is where you can find me on Freenode as well.

AlmightyOatmeal avatar Jul 07 '16 16:07 AlmightyOatmeal

What I'm trying to figure out is why I'm getting 683,672 calls to views/base.py->get_all_inclusions() because line 1465 is where my problem is:

to_include = set(chain(map(self.resources_to_include, instances)))

Which happens to be the number of rows in the table:

mysql> select count(*) from tumbleweed_post;
+----------+
| count(*) |
+----------+
|   683672 |
+----------+
1 row in set (0.26 sec)

When there is a default pagination, why is flask-restless querying ALL objects in my table when, by default, it returns only 10 results? What should be happening is querying the first 10 results, then when the next page is queried then both an offset and limit are used.

I'm still rooting through the code to map everything out so I figured I would bring that to @jfinkels attention to see if someone who knows the code layout would have a better idea of why that's going on until I finish figuring out the code layout.

AlmightyOatmeal avatar Jul 08 '16 15:07 AlmightyOatmeal

The performance is atrocious because I had no idea what I was doing when I started this project, and before long people it had become used by way more people than I had ever anticipated :).

:+1: Yes kudos indeed, I've looked at a few flask-api packages and this one looked like the easiest to configure and use. I do agree with @AlmightyOatmeal I think the performance is a high priority. I have a few tables of about 27000 records and it takes around 5-8 seconds for the 10 item paginated result to be retreived to a web browser, meanwhile the server cpu is consumed at 100% processing (single core) :(

I really don't know that much about python to contribute but I am learning so perhaps I will dive in one of these weekends. :)

green3g avatar Jul 08 '16 18:07 green3g

@roemhildtg, welcome to Python :)

If you're interested, you can do what I did with profiling my Flask application take a look at: http://werkzeug.pocoo.org/docs/0.11/contrib/profiler/

There are limitations to that and it's not as detailed as something like line_profiler. I'm working on a module that can hook up to requests in a similar fashion as the aforementioned so that one can use any profiler they want (pprofile looks promising).

Feel free to use my contact information from above if you want to reach me out-of-band (seeing a bug is not the best place for casual chat).

AlmightyOatmeal avatar Jul 08 '16 18:07 AlmightyOatmeal

@roemhildtg, @AlmightyOatmeal

Sorry to derail this issue, but have you seen flask-debugtoolbar? It's quite nice, profile in your browser and you can see all SQL queries.

On a slightly more on-topic note, +1 @jfinkels for starting this project, it's great. I may have some time to help out with this issue.

uSpike avatar Jul 08 '16 22:07 uSpike

@uSpike, I've heard of it but the information on using it is sparse and doesn't seem to apply if I'm not using a jinja template at any endpoint (it's purely a restful API).

AlmightyOatmeal avatar Jul 09 '16 01:07 AlmightyOatmeal

@jfinkels @roemhildtg I created an add-on for werkzeug that uses pprofile to profile requests line-by-line. I'm also making a large number of changes to pprofile as well to better support integrating and mimc cProfile methods for added support of the stats module. I attached the full output (~9MB) of my pprofile add-on and I should have the pull requests for the other module's updates done this weekend.

CONSOLE.PROFILE.OUT.txt

AlmightyOatmeal avatar Jul 09 '16 01:07 AlmightyOatmeal

@jfinkels, I'm thinking the problem could lay base.py/within _get_collection_helper(). I noticed that when I hit this line:

included = self.get_all_inclusions(instances)

That 'instances' is a base query to the model so when self.get_all_inclusions(instances) is called, it's executing:

to_include = set(chain(map(self.resources_to_include, instances)))

(by the way, it's kind of defeating to chain() a map() in a set())

Against a query that is no .limit() applied to it, ergo all items in the database are queried even though there are only supposed to be 10 (by default).

There are a number of times where objects are renamed so it's a little difficult to follow but I think a lot of those methods could/should be broken down to separate functions or methods, if applicable, to help clean-up the code, make it easier to test, and by far easier to read.

Due to the complexity of base.py/within _get_collection_helper(), I'm not sure when I'll hit the nail on the head.

Oh, and the driver has no default limit for the get() function.

AlmightyOatmeal avatar Jul 09 '16 20:07 AlmightyOatmeal

in base.py/get_collection_helper(), I commented out the following lines:

        # Include any requested resources in a compound document.
        # try:
        #     included = self.get_all_inclusions(instances)
        # except MultipleExceptions as e:
        #     # By the way we defined `get_all_inclusions()`, we are
        #     # guaranteed that each of the underlying exceptions is a
        #     # `SerializationException`. Thus we can use
        #     # `errors_from_serialization_exception()`.
        #     return errors_from_serialization_exceptions(e.exceptions, included=True)
        if 'included' not in result:
            result['included'] = []
        # result['included'].extend(included)

I get a response that takes approximately 536ms which is certainly better than 1-5 minutes. My question is: what is the purpose of this and why is it trying to map my entire table?

AlmightyOatmeal avatar Jul 10 '16 01:07 AlmightyOatmeal

My question is: what is the purpose of this and why is it trying to map my entire table?

See http://flask-restless.readthedocs.io/en/latest/includes.html. Basically, if a user requests GET /comments?include=author then the author of each comment will be included in the included element of a compound JSON API document.

jfinkels avatar Jul 13 '16 01:07 jfinkels

@jfinkels but with no includes specified, there should be no reason that the logic is being used. Also, it should not be scanning the entire table especially when there is nothing more to include than what was already in the model. This is where the performance hit is coming from.

I would rather not have that functionality then cause unnecessary delays. I think there is a problem in how the logic is being applied.

AlmightyOatmeal avatar Jul 13 '16 15:07 AlmightyOatmeal

I tested out the code change, and while it may not be the correct solution to the problem, since it removes some of the json api specification and functionality, it does appear that therein lies the issue. My queries on big tables went from 5 seconds to miliseconds when I commented out the section that @AlmightyOatmeal did in the pull-request.

green3g avatar Jul 13 '16 19:07 green3g

I'm having an issue similar to @AlmightyOatmeal's. I'm querying a large table (call it "posts") for a very tiny subset of records based on a many-to-many relationship with another table (call it "tags"). To be exact, I'm using this filter:

{
    name: "tags",
    op: "any",
    val: {
        name: "name",
        op: "eq",
        val: "tag_name"
    }
}

From the database perspective, this restless query results in the following db query to count the results:

SELECT count(*) AS count_1
FROM posts
WHERE EXISTS (SELECT 1
FROM posts_tag_mapping, posts_tags
WHERE posts.id = posts_tag_mapping.post_id AND posts_tags.id = posts_tag_map.tag_id AND posts_tags.name = 'tag_name')

I haven't dug into the code to figure out what is generating this query, but changing the subquery to a join speeds things up significantly.

spight-zz avatar Aug 25 '16 04:08 spight-zz

Hi @spight! Thanks for the report, but that seems to be a separate issue, unrelated to the one being discussed here. Please open a new bug with the info above along with the SQL query you expect to see when filtering on a relation.

As for the original issue, (i.e. scanning the whole database for includes), I believe there should be a relatively straightforward way fix for getting the included resources more efficiently, and I'd like to include a fix for it before releasing version 1.0.0b2. Is it as simple as disabling a check for included resources when none are requested?

jfinkels avatar Sep 09 '16 05:09 jfinkels

Anything going on here? We are hitting the same issue…

Natureshadow avatar Jun 09 '17 13:06 Natureshadow

same here

stephen-bunn avatar Aug 07 '17 16:08 stephen-bunn

We migrated all codebases to Flask-REST-JSONAPI by now. It's worth the effort.

Natureshadow avatar Aug 07 '17 19:08 Natureshadow

@Natureshadow, what about the performance? Can you describe the issue with Flask-Restless and how it's going now with Flask-REST-JSONAPI?

I had a look to Flask-REST-JSONAPI, it seems very nice. Although flexible, I think JSON API introduces a lot of overhead for programming the UI.

averri avatar Dec 26 '17 17:12 averri