FOSElasticaBundle

Postgres: Populate command "skips" entities

Open • backbone87 opened this issue 7 years ago • 6 comments

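The query used by the populate command's pager does not specify an ORDER BY clause. This causes "randomly" filled pages: without an ORDER BY, PostgreSQL does not guarantee the same row order across the successive LIMIT/OFFSET queries, so some rows can end up on more than one page and others on none. See https://www.postgresql.org/docs/9.4/static/queries-limit.html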
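
The effect can be reproduced without a database. The following is a minimal PHP simulation. It assumes, purely for illustration, that a query without ORDER BY returns the rows rotated by a different amount on each execution. `fetchPage()` and `populate()` are hypothetical helpers, not bundle or Pagerfanta APIs. Paging through ten ids three at a time then fetches some ids twice and never fetches others, while sorting first (the equivalent of ORDER BY id) visits every id exactly once.

```php
<?php

/**
 * Hypothetical helper: returns one page of ids, mimicking
 * "SELECT id FROM t [ORDER BY id] LIMIT $limit OFFSET $offset".
 */
function fetchPage(array $ids, int $page, int $limit, bool $ordered): array
{
    if ($ordered) {
        sort($ids); // ORDER BY id: the same order on every execution
    } else {
        // No ORDER BY: assume (for illustration only) that the database
        // returns the rows rotated differently on each execution.
        $shift = ($page * 2) % count($ids);
        $ids = array_merge(array_slice($ids, $shift), array_slice($ids, 0, $shift));
    }

    return array_slice($ids, $page * $limit, $limit);
}

/** Hypothetical helper: walks every page the way a populate command would. */
function populate(array $ids, int $limit, bool $ordered): array
{
    $seen = [];
    $pages = (int) ceil(count($ids) / $limit);
    for ($page = 0; $page < $pages; $page++) {
        foreach (fetchPage($ids, $page, $limit, $ordered) as $id) {
            $seen[] = $id;
        }
    }

    return $seen;
}

$ids = range(1, 10);
$unordered = populate($ids, 3, false);
$ordered = populate($ids, 3, true);

printf(
    "unordered: %d rows fetched, %d distinct, missing: %s\n",
    count($unordered),
    count(array_unique($unordered)),
    implode(',', array_diff($ids, $unordered))
);
printf(
    "ordered:   %d rows fetched, %d distinct\n",
    count($ordered),
    count(array_unique($ordered))
);
// unordered: 10 rows fetched, 6 distinct, missing: 4,5,9,10
// ordered:   10 rows fetched, 10 distinct
```

The row count is the same in both runs, so nothing looks wrong until you compare the distinct ids. This matches the "no errors or logs" symptom described later in the thread.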

Currently, the only ways to provide an ORDER BY are to override the provider or to override EntityRepository's createQueryBuilder method in a custom repository. Both are pretty "heavy" just to apply an ORDER BY.

backbone87, Feb 06 '18 13:02

Isn't this something that should be fixed on the Pagerfanta side?

makasim, Feb 06 '18 13:02

Pagerfanta just consumes whatever provider you give it, no?

backbone87, Feb 06 '18 14:02

This issue happens with Doctrine too. For anyone looking for a quick fix, the solution @backbone87 suggested (overriding the repository's createQueryBuilder method) solved it for me. I was populating the index and ended up 100,000 documents short out of 500,000, with no errors or logs to tell me what was happening. Frustrating as hell, so hopefully this helps someone until I (or someone else) gets time to fix it.

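<?php

namespace AppBundle\YourBundle\Entity;

use Doctrine\ORM\EntityRepository;

class YourRepository extends EntityRepository
{
    // Overrides Doctrine's default builder, so EVERY query built from this
    // repository (including the one the populate pager uses) is sorted by
    // the primary key, giving the LIMIT/OFFSET pages a stable order.
    public function createQueryBuilder($alias, $indexBy = null)
    {
        return $this->_em->createQueryBuilder()
            ->select($alias)
            ->from($this->_entityName, $alias, $indexBy)
            ->orderBy($alias . '.id'); // assumes the identifier field is "id"
    }
}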

jmwill86, Jun 11 '18 08:06

@jmwill86 @backbone87 did you have time to fix it?

jamalo, Jul 23 '18 13:07

If you don't want to override your entity repository's default query builder, you can implement a separate method and then configure the bundle to use it, e.g.:

persistence:
    driver: orm
    model: AppBundle\YourBundle\Entity\YourEntity
    listener: ~
    provider:
        query_builder_method: createSortedQueryBuilder
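
The repository side of that configuration might look like the sketch below. It assumes Doctrine ORM is installed, that the entity's identifier field is named `id`, and that the bundle calls the configured method with the query alias as its argument, the same way it calls the default `createQueryBuilder`. `createSortedQueryBuilder` is simply the name used in the config above, and `YourRepository` is a placeholder.

```php
<?php

namespace AppBundle\YourBundle\Entity;

use Doctrine\ORM\EntityRepository;
use Doctrine\ORM\QueryBuilder;

class YourRepository extends EntityRepository
{
    /**
     * Query builder intended only for the populate command; the default
     * createQueryBuilder() keeps its normal behaviour everywhere else.
     */
    public function createSortedQueryBuilder(string $alias): QueryBuilder
    {
        // Reuse the stock builder and add a unique, stable sort key
        // (assumed to be "id") so LIMIT/OFFSET pages never overlap or skip rows.
        return $this->createQueryBuilder($alias)
            ->orderBy($alias . '.id', 'ASC');
    }
}
```

Compared with overriding `createQueryBuilder()` itself, this keeps the extra ORDER BY out of every other query that uses the repository.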

It's hard to say where the responsibility for fixing this issue lies, because:

  1. The bundle could enforce a default sort order since it already knows the identity field to sync the ES document to, but...
  2. Pagerfanta (and KNP Paginator) need consistently ordered results from the DB to paginate "correctly", yet they accept unordered queries; this affects more than just this project. However...
  3. We're ultimately in control of the queries used, and we should remember that there are no guarantees when it comes to the default ordering of results returned by an RDBMS.

There's definitely room for improvement regarding the developer experience because this will have caught a bunch of people out (me included), but apart from more documentation I'm not sure what a correct solution would be to prevent this from happening 🤔

lushc, Oct 16 '18 14:10

Note that, at least in my case, the data was not skipped but rather incorrectly duplicated, with an "updated" response from ES. I'm adding this in case people search primarily for "duplicated" and similar keywords, like I did earlier before wasting a lot of time on manual debugging.

Destroy666x, Oct 31 '18 14:10