django-mysql icon indicating copy to clipboard operation
django-mysql copied to clipboard

GroupConcat deserialize to lists or sets

Open adamchainz opened this issue 9 years ago • 12 comments

It's kinda annoying that GroupConcat doesn't automatically split back into lists (or maybe sets when distinct=True) on the python side - 90% of the use cases require the user to do this manually with str.split on the result before being able to use this code.

I've looked into using the list and set fields from #10 and #11 but it's a bit complicated - they don't support the custom separators, the datatypes aren't so neat to convert, it requires a bit of trickery. It might still be possible, otherwise we might have to just write custom code or some dynamic field type to achieve this - so that it ties in with the rest of the ORM.

adamchainz avatar Mar 27 '15 14:03 adamchainz

This would be a really nice feature. currently i am working around this by doing

Subquery(Coalesce(qs.annotate(annotation=GroupConcat('related_pk')).values('annotation'), Value(''), output_field=ListCharField(models.CharField())))

but this is buggy, when you want to do __contains on the subquery. With the Coalesce above it searches with the Value('') in the subquery expression and if you remove the Coalesce and replace it with an ExpressionWrapper to do the conversion to ListCharField, then you end up with

  File "/var/env/lib/python3.6/site-packages/django_mysql/models/lookups.py", line 175, in as_sql
    params = lhs_params + rhs_params
TypeError: can only concatenate tuple (not "list") to tuple

edit: Using django 2.2

edit 2: currently i am monkey patching SetContains.as_sql to feature

    params = list(lhs_params) + list(rhs_params)

tuky avatar May 20 '19 13:05 tuky

Re: edit 2, that seems like it shouldn't be necessary as both vars should be lists. Can you give the full traceback? Where did the tuple come from? Transforms inside Django itself expect to receive lists for both params variables, e.g. https://github.com/django/django/blob/master/django/contrib/postgres/lookups.py#L12

adamchainz avatar May 27 '19 21:05 adamchainz

Here is the full traceback:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
    City.objects.annotate_region_slugs().filter(region_slugs__contains='foo-slug')
  File "/var/env/lib/python3.6/site-packages/django/db/models/query.py", line 250, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE + 1])
  File "/var/env/lib/python3.6/site-packages/django/db/models/query.py", line 274, in __iter__
    self._fetch_all()
  File "/var/env/lib/python3.6/site-packages/django/db/models/query.py", line 1242, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/var/env/lib/python3.6/site-packages/django/db/models/query.py", line 55, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/var/env/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1087, in execute_sql
    sql, params = self.as_sql()
  File "/var/env/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 489, in as_sql
    where, w_params = self.compile(self.where) if self.where is not None else ("", [])
  File "/var/env/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 405, in compile
    sql, params = node.as_sql(self, self.connection)
  File "/var/env/lib/python3.6/site-packages/django/db/models/sql/where.py", line 81, in as_sql
    sql, params = compiler.compile(child)
  File "/var/env/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 405, in compile
    sql, params = node.as_sql(self, self.connection)
  File "/code/src/core/fields.py", line 65, in as_sql
    params = lhs_params + rhs_params
TypeError: can only concatenate tuple (not "list") to tuple

i monkey patched the method like so to print some outputs:

def as_sql(self, qn, connection):
    lhs, lhs_params = self.process_lhs(qn, connection)
    rhs, rhs_params = self.process_rhs(qn, connection)

    print(repr(lhs))
    print(repr(lhs_params))
    print(repr(rhs))
    print(repr(rhs_params))

    params = lhs_params + rhs_params
    # Put rhs on the left since that's the order FIND_IN_SET uses
    return 'FIND_IN_SET(%s, %s)' % (rhs, lhs), params

it prints this:

'(SELECT GROUP_CONCAT(U0.`slug`) AS `slugs` FROM `core_region` U0 WHERE ST_Contains(U0.`polygon`, (`core_city`.`point`)))'
()
'%s'
['foo-slug']

tuky avatar May 28 '19 08:05 tuky

Huh. It looks like this might be a bug in django itself, the subquery on the left hand side is what's returning its params in a tuple. And indeed it seems that the SQLCompiler class forces tuples: https://github.com/django/django/blob/master/django/db/models/sql/compiler.py#L614 . Can you raise this on https://code.djangoproject.com/ , CC me, and even look into a patch? 😇 https://docs.djangoproject.com/en/dev/internals/contributing/

adamchainz avatar May 28 '19 19:05 adamchainz

Tuples seem the way to go, so this might require fixing in here, i guess.

tuky avatar May 29 '19 14:05 tuky

Version 3.2.0 treats everything as tuples, try that.

adamchainz avatar Jun 14 '19 15:06 adamchainz

Thank you very much, don't have to monkey patch it anymore :-)

tuky avatar Jun 14 '19 16:06 tuky

Not completelly sure whehther is related to this particular issue, but with 1.11.17/3.2.0 for django/django-mysql, I've need to patch following https://jira.mongodb.org/browse/PYMODM-77.
With that change, I can successfully run

[...].annotate(members=GroupConcat('user_id', output_field=SetCharField(CharField))).[...]

I also needed to raise value of group_concat_max_len mysql variable, but that's just because my output lists are very long and got truncated in the mysql side

I'm still in the process to review this, because I get a discrepancy in the number of entries reported by count() and group_concat() with the same group_by/annotation

EDIT: Forget about the mismatch. It just happened that my sample record seemed to be completed with the extended length (not clearly ending with , as with original sizes). Setting max_len to x100 (instead of x10 as originally), produced the right result.

javiplx avatar Apr 26 '20 09:04 javiplx

@javiplx I am not sure how that MongoDB ORM issue aligns with here. Can yuo make a PR for your patch, including a test that fails before and passes after?

adamchainz avatar Apr 26 '20 10:04 adamchainz

I wasn't even aware that it was mongodb for django. Just finding matches for the exception I got. I'll fork and submit the changes.

javiplx avatar Apr 26 '20 10:04 javiplx

The PR is there (adamchainz/django-mysql#646). It cannot be merged because i started from tag v3.2.0 but you can see changes there.

And it is even more funnier.
With the original 3.2.0 from PyPI, I got

TypeError: to_python() missing 1 required positional argument: 'value'

which is the text what lead me to the "solution" (which is the first commit on the PR), that enabled me to give explicit output_field argument. Then, my laziness suggested me to supply default fields on code (as already suggested there), and I found that when output_field is not explicitly given I get a similar exception , but complaining about too many positional arguments (our python is 3.7.4 in case that matters).

So, I'm somewhat happy because second commit is what I actually wanted, but also a bit disliked. Anyway, I'll make shortly a second PR with that second commit

javiplx avatar Apr 26 '20 13:04 javiplx

I wasn't even aware that it was mongodb for django. It's not even for Django. They just happened to copy the Django ORM for their MongoDB ORM.

adamchainz avatar Apr 26 '20 15:04 adamchainz