django-pandas

read_frame bug: ForeignKey lookup fails if any Null values present

Open odoublewen opened this issue 4 years ago • 2 comments

ForeignKey lookup works as long as no null values are present. However, I also have some models where null values are allowed, for example:

class Foo(models.Model):
    name = models.CharField(max_length=64, unique=True)
    description = models.CharField(max_length=128, unique=True)

    def __str__(self):
        return self.name


class Sample(models.Model):
    foo = models.ForeignKey(Foo, on_delete=models.PROTECT, null=True, blank=True)

read_frame returns the expected results as long as foo is non-null for every object in the queryset:

In [25]: read_frame(Sample.objects.filter(Q(id=637)), fieldnames=['id', 'foo'])
Out[25]: 
    id          foo
0  637           XY

...but if a single null is present, foo becomes None for every row in the DataFrame, including rows that do have a value:

In [26]: read_frame(Sample.objects.filter(Q(id=637)|Q(id=241)), fieldnames=['id', 'foo'])
Out[26]: 
    id          foo
0  241         None
1  637         None

Somewhat similar to #93?

odoublewen, Dec 29 '20 18:12

@odoublewen Please let me know whether the following workaround is useful to you.

MyModel (parent) in models.py

class MyModel(models.Model):
    MyModelId = models.BigAutoField(
        _('Id'),
        primary_key=True
    )
    Name = models.CharField(
        _('Name'),
        max_length=225,
        null=True,
        blank=True
    )
    Date = models.DateField(
        _('Date'),
        null=True,
        blank=True
    )
    DateTime = models.DateTimeField(
        _('DateTime'),
        null=True,
        blank=True
    )
    Integer = models.IntegerField(
        _('Integer'),
        null=True,
        blank=True
    )
    Float = models.FloatField(
        _('Float'),
        null=True,
        blank=True,
    )

    def __str__(self):
        return f'{self.pk} - {self.Name}'

    class Meta:
        db_table = 'MyModel'
        verbose_name = 'My Model Object'
        verbose_name_plural = 'My Model Objects'

    @classmethod
    def get_dataframe(cls, instance=None):
        if not instance:
            qs = cls.objects.all()
            return read_frame(qs, ('MyModelId',
                                   'MyForeignKeyModels__Name',
                                   'MyForeignKeyModels__pk',
                                   'Name',
                                   'Date',
                                   'DateTime',
                                   'Integer',
                                   'Float',))

MyForeignKeyModel is defined as

class MyForeignKeyModel(models.Model):
    MyForeignKeyModelId = models.BigAutoField(
        _('Id'),
        primary_key=True
    )
    MyModel = models.ForeignKey(
        'MyModel',
        on_delete=models.CASCADE,
        related_name='MyForeignKeyModels',
        null=True,
        blank=True
    )
    Name = models.CharField(
        _('Name'),
        max_length=225,
        null=True,
        blank=True
    )
    Date = models.DateField(
        _('Date'),
        null=True,
        blank=True
    )
    DateTime = models.DateTimeField(
        _('DateTime'),
        null=True,
        blank=True
    )
    Integer = models.IntegerField(
        _('Integer'),
        null=True,
        blank=True
    )
    Float = models.FloatField(
        _('Float'),
        null=True,
        blank=True,
    )

    def __str__(self):
        return f'{self.pk} - {self.Name}'

    @classmethod
    def get_dataframe(cls, instance=None, *args, **kwargs):
        if not instance:
            qs = cls.objects.all()
            # Collect the model's concrete field names (kept for reference; not used below).
            fields_list = [field.name for field in cls._meta.fields]

            return read_frame(qs, ('MyForeignKeyModelId',
                                   'Name',
                                   'Date',
                                   'DateTime',
                                   'Integer',
                                   'Float',
                                   'MyModel__Name'))

    class Meta:
        db_table = 'MyForeignKeyModel'
        verbose_name = 'My Foreign Key Model'
        verbose_name_plural = 'My Foreign Key Models'

 

In [1]: from MyApp1.models import *

In [2]: df = MyModel.get_dataframe()
MyApp1.MyModel.MyModelId
<ManyToOneRel: MyApp1.myforeignkeymodel>
MyApp1.MyForeignKeyModel.Name
<ManyToOneRel: MyApp1.myforeignkeymodel>

FieldDoesNotExist => MyForeignKeyModel has no field named 'pk'
'Options' object has no attribute 'get_all_related_objects_with_model' (this method is deprecated and was removed in Django 1.10; see the Django docs on deprecation)


The result I get from calling the classmethod of MyModel is:

   MyModelId MyForeignKeyModels__Name  MyForeignKeyModels__pk       Name        Date                  DateTime  Integer    Float
0          1                 huyjgjhm                       1  masdfasdf  2023-04-28 2023-04-28 07:14:16+00:00        4    3.122
1          1                 bvnbvnvb                       2  masdfasdf  2023-04-28 2023-04-28 07:14:16+00:00        4    3.122
2          2                   321321                       5    ppolkjm  2023-04-28                       NaT     3423  123.000
3          2                   ij9045                       6    ppolkjm  2023-04-28                       NaT     3423  123.000
 


and the result of calling the classmethod of MyForeignKeyModel is:

In [3]: df
Out[3]: 
   MyForeignKeyModelId      Name        Date                  DateTime  Integer     Float MyModel__Name
0                    1  huyjgjhm  2023-04-28 2023-04-28 07:14:28+00:00      NaN       NaN     masdfasdf
1                    2  bvnbvnvb  2023-04-28                       NaT     45.0    34.000     masdfasdf
2                    3  sdfgdsfg  2023-04-28 2023-04-28 07:15:10+00:00    333.0  2342.000          None
3                    4  gfhju767  2023-04-28 2023-04-28 07:15:31+00:00  12323.0       NaN          None
4                    5    321321  2023-04-28 2023-04-28 07:16:06+00:00      NaN   112.000       ppolkjm
5                    6    ij9045  2023-04-28                       NaT   7744.0     1.025       ppolkjm
 

NOTE: There were 2 MyModel objects and 6 MyForeignKeyModel objects.
2 of the MyForeignKeyModel objects (ids 3 and 4) had a null value for the foreign key.

TheAbhilash23, Apr 28 '23 09:04

Good afternoon. I ran into the same problem. When reading from a model with a ForeignKey column, if one record has a value and another does not, both come back as None. If every record has its ForeignKey filled in, everything works.

I did a little research and traced the behavior to this function in django_pandas/utils.py:

  def replace_pk(model):
      base_cache_key = get_base_cache_key(model)
  
      def get_cache_key_from_pk(pk):
          return None if pk is None else base_cache_key % str(pk)

If any of the ForeignKey values arrives empty, the pk parameter passed to get_cache_key_from_pk is a float for every filled record and NoneType for the empty ones. If all records have a non-null ForeignKey, pk arrives as an int. As a result, str(pk) on a float appends '.0' to the identifier, and the resulting cache key is then not found.
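The mismatch described above can be reproduced without Django or pandas. The following is only a minimal sketch: the key template "myapp.foo.%s" and the plain dict standing in for the cache are assumptions made for illustration, not the actual django-pandas internals.

```python
# Hypothetical cache-key template. In django-pandas the real template comes
# from get_base_cache_key(model), which is not shown in this thread.
base_cache_key = "myapp.foo.%s"


def key_with_str(pk):
    # Same shape as the original get_cache_key_from_pk.
    return None if pk is None else base_cache_key % str(pk)


# A plain dict stands in for the cache; the entry is stored under the key
# built from the integer primary key.
cache = {key_with_str(637): "XY"}

# All rows non-null: pk arrives as an int and the lookup succeeds.
print(cache.get(key_with_str(637)))    # XY

# A null elsewhere in the column: pk arrives as a float, the key gains a
# trailing ".0", and the lookup misses.
print(key_with_str(637.0))             # myapp.foo.637.0
print(cache.get(key_with_str(637.0)))  # None
```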

If you change str(pk) to int(pk), everything works:

  def replace_pk(model):
      base_cache_key = get_base_cache_key(model)
  
      def get_cache_key_from_pk(pk):
          return None if pk is None else base_cache_key % int(pk)
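One caveat with int(pk) is that it raises an error for primary keys that are not numeric (for example string or UUID keys), and it does not handle NaN, which is another way a missing value can show up in a float column. A more defensive variant might normalize the value first. This is only a sketch; normalize_pk is a hypothetical helper, not part of django-pandas.

```python
import math


def normalize_pk(pk):
    """Return whole-number floats as int, missing values as None,
    and anything else unchanged."""
    if pk is None:
        return None
    if isinstance(pk, float):
        if math.isnan(pk):
            # NaN is treated as a missing foreign key.
            return None
        if pk.is_integer():
            # 637.0 -> 637, so str() no longer appends ".0".
            return int(pk)
    # Non-float keys (int, str, UUID, ...) pass through untouched.
    return pk


print(normalize_pk(637.0))         # 637
print(normalize_pk(float("nan")))  # None
print(normalize_pk("abc"))         # abc
```

The cache key could then be built as base_cache_key % str(normalize_pk(pk)) whenever the normalized value is not None.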

Dear developers, please check this and, if it is the right solution, include it in a future release.

avtrosty, Aug 29 '23 04:08