scyllapy icon indicating copy to clipboard operation
scyllapy copied to clipboard

Auto cast types

Open caniko opened this issue 2 years ago • 13 comments
trafficstars

Types should be cast automatically this should be possible with prepared statements where the DB responds with the type of each column (does it?). For statements that are not prepared, maybe allow the user to define the data types by enum?

I am getting errors like float not being cast to columns that are double, and int failing to cast to smallint.

caniko avatar Oct 16 '23 06:10 caniko

Errors Writing python float to scylla double:

marshaling error: Expected 8 bytes for a floating type, got 4

Can write Python float to scylla Float, but can't read it:

Cannot parse value of column <column> as Float.

Types that are OK

  • int -> Integer
  • uuid, but have to prepare it with uuid.bytes raw python uuid object does not work.

I am open to do a quick zoom call to debug this together!

caniko avatar Oct 16 '23 07:10 caniko

For such ambiguous types we have special wrappers in scyllapy.extra_types. And python UUIDs should work, since we test it here.

Actually, autocast is possible for prepared statements by using metadata. We will try to implement this.

s3rius avatar Oct 16 '23 10:10 s3rius

Something you could also do is to define custom types towards fields/attributes on Pydantic.BaseModels for type casting. I don't have time for this, but it fits really nicely IMHO.

These models could be used to define tables, and even materialized views.

caniko avatar Oct 16 '23 14:10 caniko

I'm currently working on a small ORM, but don't have much time to complete it.

s3rius avatar Oct 16 '23 14:10 s3rius

I understand, thanks a lot! This is a great project. I think the most important are following features:

  • Have a dtype kwarg in execute and batch, for us to define dtypes with respect to index:
scylla.execute(statement, (5.0, 1000000, 255), (Double, Integer, TinyInt))
  • Auto cast for prepared statements, and give client side errors on fail.

caniko avatar Oct 16 '23 15:10 caniko

The suggested implementation for non-prepared queries seems a little bit too complex for me.

Because currently you can do it like:

from scyllapy.extra_types import Double,  TinyInt

scylla.execute(statement, (Double(5.0), 100000, TinyInt(255))

Also, it allows you to easily use dicts as arguments.

from scyllapy.extra_types import Double,  TinyInt

scylla.execute(statement, {"a": Double(5.0), "b": 100000, "c": TinyInt(255})

s3rius avatar Oct 17 '23 06:10 s3rius

I should've provided the motivation for implementing it for execute. It comes from the need to implement it for scylla.batch:

tuple_with_thousand_tuples = ((1, "hello", 5.0), ...)

my_batch = Batch()
for _ in len(tuple_with_thousand_tuples):
    batch.add_query(prepared)

await scylla.batch(batch, tuple_with_thousand_tuples, dtypes=(SmallInt, None, Double))

None means do nothing.

I think it makes a lot more sense to have this functionality here! It is just natural to also add it to execute. This is especially useful if you only know the types at runtime, meaning the prepared statement might change; it just makes the code overall more readable.

caniko avatar Oct 18 '23 06:10 caniko

I don't like the idea of another parameter in execute. Since you can wrap your values in extra_types. I've made an implemetation that stores information about types of query inside the query. Now if you use prepared statement, then it is going to try to cast arguments to known types. For all other cases, please wrap your values in extra types. It makes it explicit and nice. Also, about batches. There's another type of batches, which can make your code more readable.

import scyllapy
from scyllapy import extra_types

batch = scyllapy.InlineBatch()

for i in range(100):
    batch.add_query("INSERT INTO table (id, val) VALUES (?, ?)", (i, extra_types.Double(1.0)))

scylla.batch(batch)

s3rius avatar Nov 03 '23 10:11 s3rius

Will this work?

import scyllapy
from scyllapy import extra_types

batch = scyllapy.InlineBatch()

prepared = await scylla.prepare("INSERT INTO table (id, val) VALUES (?, ?)")

for i in range(100):
    # NOTE: I am expecting prepared to cast 1.0 to Double
    batch.add_query(prepared, (i, 1.0))

scylla.batch(batch)

caniko avatar Nov 03 '23 11:11 caniko

No. Because in batches we don't have access to queries. I wanted to implement it, but couldn't do it now.

s3rius avatar Nov 03 '23 12:11 s3rius

No. Because in batches we don't have access to queries. I wanted to implement it, but couldn't do it now.

Maybe keep the issue open until you do that? @s3rius

caniko avatar Nov 07 '23 08:11 caniko

Sure. But I'm not sure if it's possible. Anyway. Maybe we will be able to implement it later.

s3rius avatar Nov 09 '23 13:11 s3rius

Errors Writing python float to scylla double:

marshaling error: Expected 8 bytes for a floating type, got 4

Can write Python float to scylla Float, but can't read it:

Cannot parse value of column <column> as Float.

Types that are OK

* `int` -> `Integer`

* uuid, but have to prepare it with `uuid.bytes` raw python `uuid` object does not work.

I am open to do a quick zoom call to debug this together!

One of the issues is that we "can" push Float data into the Scylla Table, but as mentioned, we cannot retrieve it ;

Do we absolutely need to convert all floats to doubles to make this work?

Edit: the issue arises from a typo:

https://github.com/Intreecom/scyllapy/blob/d8d01c1866985121073a83e44d8fb305baf6d42d/src/utils.rs#L371C1-L374C56

the cast should be .as_float()

Best regards,

diviyank avatar Jun 27 '24 09:06 diviyank