bullmq icon indicating copy to clipboard operation
bullmq copied to clipboard

feat(src/classes/job.ts): add custom serializer/deserializer for job data

Open godinja opened this issue 1 year ago • 9 comments

ref #14

godinja avatar May 31 '24 15:05 godinja

Thank you for the PR. The issue with serializers is that if we allow a completely custom serializer, then all the dashboard tools will break, as they will not be able to show any of the job data. Therefore the functionality must be implemented in such a way that we offer a set of predefined serializers, for example json, protobuf, messagepack, etc, so that when getting a job we also can apply the correct deserializer.

manast avatar Jun 03 '24 08:06 manast

The issue with serializers is that if we allow a completely custom serializer, then all the dashboard tools will break, as they will not be able to show any of the job data.

The dashboards are a third party offering and aren't necessarily required to use BullMQ, right? Couldn't we just inform the user that using custom serializers and deserializers may break third party integrations? Considering that this is an opt-in feature, I feel like it's an appropriate trade off.

Therefore the functionality must be implemented in such a way that we offer a set of predefined serializers, for example json, protobuf, messagepack, etc, so that when getting a job we also can apply the correct deserializer.

What about restricting the type definitions a bit more:

type JsonObject = { [key: string]: JsonValue };

type JsonValue =
  | null
  | boolean
  | number
  | string
  | JsonValue[]
  | JsonObject;

type SerializerFn = (data: any) => JsonValue; // stringify return value with JSON.stringify in Job.asJSON()
type DeserializeFn = (data: string) => JsonValue;

That way the data is guaranteed to be parsable by JSON.parse. In our use case we plan on using superjson which serializes the data into a JSON-compatible structure.

godinja avatar Jun 03 '24 13:06 godinja

The dashboards are a third party offering and aren't necessarily required to use BullMQ, right? Couldn't we just inform the user that using custom serializers and deserializers may break third party integrations? Considering that this is an opt-in feature, I feel like it's an appropriate trade off.

Yes, they are third-party but a very important piece in BullMQ's ecosystem, as they are super helpful for debugging and managing production systems.

That way the data is guaranteed to be parsable by JSON.parse. In our use case, we plan on using superjson which serializes the data into a JSON-compatible structure.

I feel that would be still too specific as that would limit the possibility of having other possibly more efficient serialization methods in the future. I still think the best way is to define a set of fixed serialization/deserialization methods, but I haven't figured out yet what would be the best way to do this so that we do not need to add a bunch of extra dependencies to BullMQ, but instead optional add-ons that can be added only if required.

manast avatar Jun 03 '24 13:06 manast

That way the data is guaranteed to be parsable by JSON.parse. In our use case, we plan on using superjson which serializes the data into a JSON-compatible structure.

I feel that would be still too specific as that would limit the possibility of having other possibly more efficient serialization methods in the future. I still think the best way is to define a set of fixed serialization/deserialization methods, but I haven't figured out yet what would be the best way to do this so that we do not need to add a bunch of extra dependencies to BullMQ, but instead optional add-ons that can be added only if required.

Ah I see, the long term vision for this is to be extensible for other data formats, not just JSON.

Considering that https://github.com/taskforcesh/bullmq/issues/14 has been open for almost 5 years, how would you feel about releasing something like this PR to allow for custom JSON serializers?

godinja avatar Jun 03 '24 13:06 godinja

Considering that #14 has been open for almost 5 years, how would you feel about releasing something like this PR to allow for custom JSON serializers?

I created that issue because I think it is a useful feature, however, I do not think it has gained so much interest over the years, otherwise it would have been up-prioritized. But this PR is enabling something that potentially can break current UIs, so I am not so positive to merge it as is. If we can figure out a better way, then yes I am open to it.

manast avatar Jun 03 '24 13:06 manast

But this PR is enabling something that potentially can break current UIs, so I am not so positive to merge it as is. If we can figure out a better way, then yes I am open to it.

I've made some updates and included a pattern doc (with a note on how this may effect third-party integrations). With the new updates, I don't really see how this could break UIs.

godinja avatar Jun 03 '24 15:06 godinja

@godinja Hi, Thanks a lot for your contributes.

In my opinion, I think that do twice serializations of serializer(rawData) and JSON.stringify is not good idea for node.js event loop. Can we just design once serialization/deserialization for data?

maybe like

{
   data: serializer(rawData),
   ...
}

HsinHeng avatar Oct 12 '24 14:10 HsinHeng

I'd love to see something like this.

ryanolf avatar Jan 27 '25 11:01 ryanolf

@manast I think a key aspect here isn't even so much about (more or less) complex serialization/deserialization rules, but even some of the fundamental ones in the JS core and how this plays together with BullMQ's typing.

Consider a queue that takes data inputs with a date like { start: Date }. When I then add a job to the queue, doing something like queue.add({ start: new Date() }) is completely fine, because JSON.stringify does sensible serialization for Date objects. However, for deserialization that is not the case: JSON.parse('{ "start": "2025-04-08T09:15:00Z" }') doesn't give you a JS object with start as a Date, but rather just a plain string that you then have to convert into a proper Date again. So arguably BullMQ even puts a false sense into users, because the DataType that you add to the queue (a Date) can't even be the same that BullMQ then uses to invoke the job.

The only way to make this work reliably and with proper types (at least that I can think of) is to 1) generally type the input as Date | string (or a custom string-ish type to ensure the ISO format of the string) and then 2) do a runtime check (think: const start = typeof data.start === 'string' ? parseDateSomehow(data.start) : data.start) or use a library like Moment.js that transparently works with both. That is certainly doable, but also cumbersome.

If the main/only issue are dashboards, which presumably relies on Job.fromJSON, then why not do this: Allow a custom deserializer (as proposed by the author of the PR) that is used within Worker.createJob. This serializer should default to just no-op (i.e. use whatever it gets from Job.fromJSON), but allow for an extension so that use cases with types that don't deserialize properly can be handled correctly. Think somewhat like this:

protected createJob(
  data: JobJsonRaw,
  jobId: string,
): Job<DataType, ResultType, NameType> {
  const job = this.Job.fromJSON(this as MinimalQueue, data, jobId)

  if (this.Job.deserializer) {
    job.data = this.Job.deserializer(job.data)
  }
  
  return job as Job<
    DataType,
    ResultType,
    NameType
  >;
}

This could serve as a quite natural extension point without interfering with plain deserialization (for where it's needed).

clemens avatar Apr 08 '25 09:04 clemens