pg icon indicating copy to clipboard operation
pg copied to clipboard

Is there a way to optimize loading large IN queries with has_many?

Open sonhnguyen opened this issue 4 years ago • 2 comments

I am trying to load nested has_many from products table but it seems with large list of IDs the query generated huge IN list is not ideal.

Can I do a manual JOIN and somehow map it to my model? I checked the generated queries and only see bunch of select IN from variants, options, attributes, values... Does go-pg offering auto joining from those associations?

type Product struct {
	Attributes                []Attribute    `json:"attributes"`
	Variants                  []Variant      `json:"variants"`
}
type Variant struct {
	Options     []Option       `json:"options"`
	ProductID   uint           `gorm:"index"`
}
type Attribute struct {
	AttributeValues []AttributeValue `json:"values"`
	ProductID       uint             `gorm:"index"`
}
type Option struct {
	VariantID           uint      `gorm:"index"`
	AttributeExternalID string    `json:"attribute" gorm:"index"`
	Attribute           Attribute
}
type AttributeValue struct {
	AttributeID uint           `gorm:"index"`
}

The code:

			err := ecrawler.DB2.Model(&products).
				Column("product.*").
				Relation("Variants").
				Relation("Variants.Options.*").
				Relation("Attributes").
				Relation("Attributes.AttributeValues").
				Where("product.id IN (?)", gopg.In(chunk)).
				Select()

generated query of the above:

SELECT "product".* FROM "products" AS "product" WHERE ((product.id = ANY('{1}'::int[]))) AND "product"."deleted_at" IS NULL

SELECT "variant"."id", "variant"."created_at", "variant"."updated_at", "variant"."deleted_at", "variant"."external_id", "variant"."currency", "variant"."is_default", "variant"."is_taxable", "variant"."product", "variant"."retail_price", "variant"."sides", "variant"."title", "variant"."weight", "variant"."images", "variant"."product_id" FROM "variants" AS "variant" WHERE (("variant"."product_id" IN (1))) AND "variant"."deleted_at" IS NULL

SELECT "option".* FROM "options" AS "option" WHERE (("option"."variant_id" IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16))) AND "option"."deleted_at" IS NULL

SELECT "attribute"."id", "attribute"."created_at", "attribute"."updated_at", "attribute"."deleted_at", "attribute"."external_id", "attribute"."is_preselected", "attribute"."name", "attribute"."position", "attribute"."slug", "attribute"."value_type", "attribute"."product_id" FROM "attributes" AS "attribute" WHERE (("attribute"."product_id" IN (1))) AND "attribute"."deleted_at" IS NULL


SELECT "attribute_value"."id", "attribute_value"."created_at", "attribute_value"."updated_at", "attribute_value"."deleted_at", "attribute_value"."image", "attribute_value"."images", "attribute_value"."key", "attribute_value"."name", "attribute_value"."position", "attribute_value"."slug", "attribute_value"."value", "attribute_value"."attribute_id" FROM "attribute_values" AS "attribute_value" WHERE (("attribute_value"."attribute_id" IN (1))) AND "attribute_value"."deleted_at" IS NULL <nil>

sonhnguyen avatar Mar 08 '20 22:03 sonhnguyen

Your best bet with go-pg is to try to modify this code - https://github.com/go-pg/pg/blob/v9.1.3/orm/join.go#L46-L87 . TBH I don't see how to improve / make it faster.

vmihailenco avatar Mar 10 '20 12:03 vmihailenco

Your best bet with go-pg is to try to modify this code - https://github.com/go-pg/pg/blob/v9.1.3/orm/join.go#L46-L87 . TBH I don't see how to improve / make it faster.

Will I be able to make my own Joins query and somehow go-pg can map the fields to the corresponding model?

			err := ecrawler.DB2.Model(&products).
				Column("product.*").
				Joins("variants on ....").
				Joins("options on ...").
				Joins("attributes on ...").
				Joins("attribute_values on ....").
				Where("product.id IN (?)", gopg.In(chunk)).
				Select()

I tried but my products having empty associations (products[0].Variants = nil, products[0].Attributes = nil)

sonhnguyenn avatar Mar 11 '20 09:03 sonhnguyenn