proletarian icon indicating copy to clipboard operation
proletarian copied to clipboard

Cannot use JSON or JSONB type for the payload field

Open raszi opened this issue 1 year ago • 6 comments

We need to write the proletarian.job table outside of Clojure, therefore I switched the serializer from Transit to JSON (Jsonista).

Everything worked to the point where I switched the payload field from text to jsonb because the proletarian.db namespace sets the type explicitly as a string.

https://github.com/msolli/proletarian/blob/2ee188f3ffb1d96cbd5776ae184dce79d5d73b26/src/proletarian/db.clj#L33

I guess a .setObject call would be necessary, but I am unfamiliar with the java.sql.Connection class, so this is a wide guess.

Currently, there is no way to override this behavior, unfortunately.

Are you planning to support such usage in the future?

Thank you for the great open-source library!

raszi avatar Jun 05 '24 17:06 raszi

This is related to this thread on the clojurians Slack:

https://clojurians.slack.com/archives/C01UG9GMVBJ/p1680284891783359

raszi avatar Jun 05 '24 17:06 raszi

I did a bit more investigation on the topic, it seems that something like this would be necessary:

(import '(org.postgresql.util PGobject))

(defn ->pgobject
  "Transforms Clojure data to a PGobject that contains the data as
  JSON. PGObject type defaults to `jsonb` but can be changed via
  metadata key `:pgtype`"
  [x]
  (let [pgtype (or (:pgtype (meta x)) "jsonb")]
    (doto (PGobject.)
      (.setType pgtype)
      (.setValue (->json x)))))

Then .setObject could be used, which means that extending the serializer to always return an object might work.

raszi avatar Jun 07 '24 08:06 raszi

Have a look at the fork from Deepico, they added it: https://github.com/deepico/proletarian

mkreis avatar Jun 14 '24 07:06 mkreis

The existing text type on the payload can store any value serialized to string. What is the use case for storing the payload as JSONB?

msolli avatar Oct 13 '24 20:10 msolli

We need to schedule the jobs outside of the Clojure application (from triggers, e.g.).

PostgreSQL has functions to create JSONB structures conveniently, but you can only construct EDN structures technically by string functions, and that is error-prone.

raszi avatar Oct 18 '24 10:10 raszi

Of course, we can build a JSONB structure and convert that to a string and then read back that as a string and treat it like a JSONB but that does not sound too safe either.

raszi avatar Oct 18 '24 10:10 raszi

But you don't need a JSONB Postgres field to write and read JSON. JSON is just text. Just encode the job data as JSON and write the serialized value (a string) to the payload field. You can do this in any programming language.

From the manual (emphasis mine):

JSON data types are for storing JSON (JavaScript Object Notation) data, as specified in RFC 7159. Such data can also be stored as text, but the JSON data types have the advantage of enforcing that each stored value is valid according to the JSON rules. There are also assorted JSON-specific functions and operators available for data stored in these data types; see Section 9.16.

Proletarian doesn't need access to any of the data that is encoded in the payload field, it just needs the value and how to deserialize it. There is no need to incur the cost (however small) of storing the value as specialized JSONB type, not to mention that this would not work with any other serialization format.

msolli avatar Oct 21 '24 07:10 msolli

My problem with the text field is that it can be corrupt and does not describe what is inside, therefore any third-party tools like Metabase will treat it like text and not JSONB data.

Of course, I could add a check constraint to ensure that the payload field can be decoded as a JSON object, but that only solves the corrupt part it does not solve the third-party tool problem.

We could store all the fields as text and we could have secondary information stored in the code on how to treat these text fields but there is a good reason why the fields have a type in the databases IMO.

raszi avatar Oct 23 '24 17:10 raszi

My problem with the text field is that it can be corrupt […] Of course, I could add a check constraint to ensure that the payload field can be decoded as a JSON object […]

It can only be corrupt if you write corrupt data into it. If you JSON encode the payload using your language's JSON encoding facility, you must trust that it is not corrupt. That would be a bug in the JSON encoder. If you have doubts you can check, as you said, but that seems overly defensive to me.

any third-party tools like Metabase will treat it like text and not JSONB data.

As they should - it is serialized data, a "black box", not to be queried or inspected in a normal circumstances.

I've asked before for a use case that can only be supported by changing the DB field type to JSONB. We can discuss approaches if there is a real use case for it.

there is a good reason why the fields have a type in the databases IMO.

It has the correct type: text. The data has been serialized to text (JSON, or EDN, or Base64, or whichever serialization function one chooses), and must be stored as text.

msolli avatar Oct 23 '24 19:10 msolli

It can only be corrupt if you write corrupt data into it. If you JSON encode the payload using your language's JSON encoding facility, you must trust that it is not corrupt. That would be a bug in the JSON encoder. If you have doubts you can check, as you said, but that seems overly defensive to me.

No, not really. Somebody else could also modify that, for example using an external tool, and if I don't have any constraints then I would only know that a payload is corrupt when the code will try to decode that. Read instead of rejecting an update or insert with incorrect data it would simply let it through.

As they should - it is serialized data, a "black box", not to be queried or inspected in a normal circumstances.

I guess our use-case also differs here. We need to check the queried jobs and we need to check for the payload as well to know if something is scheduled for a certain user e.g.

I've asked before for a use case that can only be supported by changing the DB field type to JSONB. We can discuss approaches if there is a real use case for it.

Yes, and I answered that. We need to create jobs outside of the Clojure application, and I wrote triggers as an example.

I started this ticket with a question:

Are you planning to support such usage in the future?

If you don't that is also fine. We can live with this, I made my workaround.

raszi avatar Oct 23 '24 19:10 raszi