scyllapy
scyllapy copied to clipboard
Auto cast types
Types should be cast automatically this should be possible with prepared statements where the DB responds with the type of each column (does it?). For statements that are not prepared, maybe allow the user to define the data types by enum?
I am getting errors like float not being cast to columns that are double, and int failing to cast to smallint.
Errors
Writing python float to scylla double:
marshaling error: Expected 8 bytes for a floating type, got 4
Can write Python float to scylla Float, but can't read it:
Cannot parse value of column <column> as Float.
Types that are OK
int->Integer- uuid, but have to prepare it with
uuid.bytesraw pythonuuidobject does not work.
I am open to do a quick zoom call to debug this together!
For such ambiguous types we have special wrappers in scyllapy.extra_types. And python UUIDs should work, since we test it here.
Actually, autocast is possible for prepared statements by using metadata. We will try to implement this.
Something you could also do is to define custom types towards fields/attributes on Pydantic.BaseModels for type casting. I don't have time for this, but it fits really nicely IMHO.
These models could be used to define tables, and even materialized views.
I'm currently working on a small ORM, but don't have much time to complete it.
I understand, thanks a lot! This is a great project. I think the most important are following features:
- Have a dtype kwarg in execute and batch, for us to define dtypes with respect to index:
scylla.execute(statement, (5.0, 1000000, 255), (Double, Integer, TinyInt))
- Auto cast for prepared statements, and give client side errors on fail.
The suggested implementation for non-prepared queries seems a little bit too complex for me.
Because currently you can do it like:
from scyllapy.extra_types import Double, TinyInt
scylla.execute(statement, (Double(5.0), 100000, TinyInt(255))
Also, it allows you to easily use dicts as arguments.
from scyllapy.extra_types import Double, TinyInt
scylla.execute(statement, {"a": Double(5.0), "b": 100000, "c": TinyInt(255})
I should've provided the motivation for implementing it for execute. It comes from the need to implement it for scylla.batch:
tuple_with_thousand_tuples = ((1, "hello", 5.0), ...)
my_batch = Batch()
for _ in len(tuple_with_thousand_tuples):
batch.add_query(prepared)
await scylla.batch(batch, tuple_with_thousand_tuples, dtypes=(SmallInt, None, Double))
None means do nothing.
I think it makes a lot more sense to have this functionality here! It is just natural to also add it to execute. This is especially useful if you only know the types at runtime, meaning the prepared statement might change; it just makes the code overall more readable.
I don't like the idea of another parameter in execute. Since you can wrap your values in extra_types. I've made an implemetation that stores information about types of query inside the query. Now if you use prepared statement, then it is going to try to cast arguments to known types. For all other cases, please wrap your values in extra types. It makes it explicit and nice. Also, about batches. There's another type of batches, which can make your code more readable.
import scyllapy
from scyllapy import extra_types
batch = scyllapy.InlineBatch()
for i in range(100):
batch.add_query("INSERT INTO table (id, val) VALUES (?, ?)", (i, extra_types.Double(1.0)))
scylla.batch(batch)
Will this work?
import scyllapy
from scyllapy import extra_types
batch = scyllapy.InlineBatch()
prepared = await scylla.prepare("INSERT INTO table (id, val) VALUES (?, ?)")
for i in range(100):
# NOTE: I am expecting prepared to cast 1.0 to Double
batch.add_query(prepared, (i, 1.0))
scylla.batch(batch)
No. Because in batches we don't have access to queries. I wanted to implement it, but couldn't do it now.
No. Because in batches we don't have access to queries. I wanted to implement it, but couldn't do it now.
Maybe keep the issue open until you do that? @s3rius
Sure. But I'm not sure if it's possible. Anyway. Maybe we will be able to implement it later.
Errors Writing python
floatto scylladouble:marshaling error: Expected 8 bytes for a floating type, got 4Can write Python
floatto scyllaFloat, but can't read it:Cannot parse value of column <column> as Float.Types that are OK
* `int` -> `Integer` * uuid, but have to prepare it with `uuid.bytes` raw python `uuid` object does not work.I am open to do a quick zoom call to debug this together!
One of the issues is that we "can" push Float data into the Scylla Table, but as mentioned, we cannot retrieve it ;
Do we absolutely need to convert all floats to doubles to make this work?
Edit: the issue arises from a typo:
https://github.com/Intreecom/scyllapy/blob/d8d01c1866985121073a83e44d8fb305baf6d42d/src/utils.rs#L371C1-L374C56
the cast should be .as_float()
Best regards,