spring-batch
Add support for MongoDB as JobRepository
Ramin Zare opened BATCH-2727 and commented
Adding support for storing job instances in MongoDB instead of JDBC.
Issue Links:
- BATCH-2836 Add Support For MongoDB or Api To Be Implemented To Store JobRepository in Any Arbitary Store ("is duplicated by")
- BATCH-1596 Support for NoSQL database persistence
Hey, guys. Some friends and I would be interested in implementing this feature. If so, could anyone on the Spring Batch team give us some hints?
@rcardin Thank you for your interest to help! You would need to implement 4 DAO interfaces (JobInstanceDao, JobExecutionDao, StepExecutionDao, ExecutionContextDao) and a factory bean that extends AbstractJobRepositoryFactoryBean. The abstract factory bean already takes care of creating a transactional proxy (that would be based on a MongoTransactionManager) around a SimpleJobRepository with the 4 DAOs. I have created the initial stubs here while evaluating another issue (Point 5 of #3942). For the implementation logic of the DAOs, you can get inspiration from the JDBC-based ones.
You are welcome to contribute if you want. I would love to help if you need support on this!
Hey @benas. First, thanks for pointing us to your stubs; they're handy. While looking at the code, we immediately noticed that the main domain classes, such as JobExecution and JobInstance, extend a class called Entity, which forces every object to have an ID of type Long.
However, as you know, MongoDB does not generate IDs of type Long (its default _id is an ObjectId), and there is no simple way to generate an auto-incremented identifier. I suppose this is closely related to the fact that the business model and the persistence model are collapsed into the same classes.
Would you suggest a conservative approach, working around the unique-identifier problem with tricks like these, or a more disruptive approach that reviews the models?
I think you are hitting an issue similar to https://github.com/spring-projects/spring-batch/issues/1317. The implications of this choice touch not only the domain model (which is designed around sequences), but also other core parts of the framework, like:
- All APIs related to these entities are built around Long (see the input parameters and return types of methods like JobExplorer#getJobExecution, JobOperator#getRunningExecutions, etc.)
- The logic of getting the last job/step executions (used in restarts) is based on max(ID)
- All JDBC DAOs use a DataFieldMaxValueIncrementer from Spring Framework, which is designed to increment the data store field's maximum value. Incrementing a value implies using a sequence, even if values are of type String, for example. The point here is rather that ordering is required for the previous point to work
- JSR-352 APIs use the long type for IDs. While we are planning to deprecate the support for JSR-352 (in #3894), we need to make sure the version that introduces the required changes here is compatible with the JSR (or delay this feature to a version in which the JSR implementation is removed).
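To illustrate why the restart logic depends on ordered IDs, here is a minimal, self-contained sketch in plain Java (no Spring Batch dependency). ExecutionRow and lastExecution are hypothetical names used only for illustration, not framework APIs. Selecting the "last" execution by max(ID) is only correct if IDs come from a monotonically increasing sequence.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LastExecutionByMaxId {

    // Hypothetical, simplified stand-in for a job execution row; not Spring Batch's JobExecution.
    record ExecutionRow(long id, String status) {}

    // Picks the "last" execution as the one with the highest ID.
    // This assumes IDs were assigned by an increasing sequence.
    static Optional<ExecutionRow> lastExecution(List<ExecutionRow> executions) {
        return executions.stream().max(Comparator.comparingLong(ExecutionRow::id));
    }

    public static void main(String[] args) {
        List<ExecutionRow> executions = List.of(
                new ExecutionRow(1L, "FAILED"),
                new ExecutionRow(3L, "FAILED"),
                new ExecutionRow(2L, "COMPLETED"));
        // Prints the execution with ID 3, the one a restart would be based on.
        System.out.println(lastExecution(executions).orElseThrow());
    }
}
```

With random, unordered identifiers (such as ObjectId strings compared lexically or UUIDs), the same max() call would no longer reliably return the most recent execution.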
As you can see, the impact of changing the type of the Entity ID is quite substantial. While I'm not against reviewing the model, I would like to take time to evaluate such a change (or see it in a fork). That said, I think mimicking sequences with MongoDB counter documents might work here (using a single document for all three sequences won't work, but using three separate documents, or even separate collections, one for each sequence, might). Do you want to try the conservative approach with a quick prototype? If we find any blockers, we can consider the disruptive approach. What do you think?
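The counter-document idea can be sketched in memory as follows, assuming one counter per sequence. This is a hypothetical illustration: a ConcurrentHashMap stands in for the counters collection, and the sequence names are made up. Against a real MongoDB instance, each increment would instead be a single atomic findOneAndUpdate on the counter document, using $inc with upsert enabled and returning the updated document.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CounterDocuments {

    // In-memory stand-in for a "counters" collection: one entry (document) per sequence name.
    private final Map<String, Long> counters = new ConcurrentHashMap<>();

    // Atomically increments the named counter and returns the new value.
    // A missing counter is created with the value 1, similar to an upsert.
    public long nextValue(String sequenceName) {
        return counters.merge(sequenceName, 1L, Long::sum);
    }

    public static void main(String[] args) {
        CounterDocuments counters = new CounterDocuments();
        // Hypothetical sequence names: a separate counter for each entity type.
        long jobInstanceId = counters.nextValue("jobInstance");
        long jobExecutionId1 = counters.nextValue("jobExecution");
        long jobExecutionId2 = counters.nextValue("jobExecution");
        long stepExecutionId = counters.nextValue("stepExecution");
        // Prints "1 1 2 1": each sequence advances independently.
        System.out.println(jobInstanceId + " " + jobExecutionId1 + " " + jobExecutionId2 + " " + stepExecutionId);
    }
}
```

Keeping one counter per sequence means job instances, job executions, and step executions each get their own gap-free, increasing Long IDs, which is what the existing max(ID)-based logic expects.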
@benas, thanks for the detailed response. We completely agree with you. We will implement a prototype that mimics sequences using Mongo features.
@rcardin I managed to create a PoC based on counter documents as described in https://www.mongodb.com/blog/post/generating-globally-unique-identifiers-for-use-with-mongodb (section "Use a single counter document to generate unique identifiers one at a time") [*].
While this avoided the need to change the type of entity IDs to something other than Long, I noticed that the current domain model is not suitable for persistence in a non-relational data store (due to the lack of default constructors, the presence of circular dependencies like job execution <-> step execution, etc.). Therefore, I believe we need a persistence model suited to such a target store. The persistence model does not have to be the same as the domain model, and could be designed from the ground up for persistence:
- Provide default constructors + getters/setters suitable for persistence (Java records are a good option)
- No circular dependencies
- For non-relational databases, the model does not have to be normalized. In fact, the job parameters and the execution context could be embedded in enclosing documents ( => no unnecessary and expensive table joins!)
- Not all domain types should have an equivalent in the persistence model (like JobParameters)
The persistence model in the aforementioned PoC tries to cover all these points and was designed to be usable by other non-relational solutions (i.e., no MongoDB-specific annotations (like @Transient) or APIs (like ObjectId)).
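A persistence model following the points above might look roughly like this. It is a hypothetical sketch using Java records, not the PoC's actual classes: the step execution keeps only the ID of its job execution (no back-reference, so no cycle), job parameters and execution contexts are embedded as plain maps rather than stored in separate joined tables, and there is no separate JobParameters type.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;

public class PersistenceModelSketch {

    // Hypothetical persistence record for a step execution.
    // It references its parent by ID only, so there is no job execution <-> step execution cycle.
    record StepExecutionDocument(long id, long jobExecutionId, String stepName, String status,
                                 Map<String, Object> executionContext) {}

    // Hypothetical persistence record for a job execution.
    // Job parameters and the execution context are embedded instead of living in joined tables.
    record JobExecutionDocument(long id, long jobInstanceId, String status, LocalDateTime startTime,
                                Map<String, Object> jobParameters,
                                Map<String, Object> executionContext) {}

    public static void main(String[] args) {
        JobExecutionDocument jobExecution = new JobExecutionDocument(
                1L, 1L, "STARTED", LocalDateTime.now(),
                Map.of("input.file", "data.csv"), Map.of());
        List<StepExecutionDocument> steps = List.of(
                new StepExecutionDocument(1L, jobExecution.id(), "step1", "COMPLETED",
                        Map.of("read.count", 42)));
        System.out.println(jobExecution);
        System.out.println(steps);
    }
}
```

Nothing here depends on MongoDB types, which matches the goal of keeping the persistence model usable by other non-relational stores.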
With that in place, we now need a way to convert entities from the domain model to the persistence model and vice-versa, without impacting the framework's logic. This is also done in the PoC, see the converter package.
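The conversion step can be sketched as follows. The classes below are hypothetical, simplified stand-ins (not Spring Batch's domain classes and not the PoC's converter package): the domain side has a job execution <-> step execution cycle, and the converter breaks it by storing only the parent ID, then restores the back-reference when converting back.

```java
import java.util.ArrayList;
import java.util.List;

public class ConverterSketch {

    // Hypothetical domain class holding its step executions.
    static class JobExecution {
        final long id;
        final List<StepExecution> stepExecutions = new ArrayList<>();
        JobExecution(long id) { this.id = id; }
    }

    // Hypothetical domain class with a back-reference to its job execution (the cycle).
    static class StepExecution {
        final long id;
        final String stepName;
        final JobExecution jobExecution;
        StepExecution(long id, String stepName, JobExecution jobExecution) {
            this.id = id;
            this.stepName = stepName;
            this.jobExecution = jobExecution;
        }
    }

    // Hypothetical persistence record: the cycle is replaced by the parent's ID.
    record StepExecutionRecord(long id, String stepName, long jobExecutionId) {}

    // Domain -> persistence: replace the object reference with the parent's ID.
    static StepExecutionRecord toRecord(StepExecution step) {
        return new StepExecutionRecord(step.id, step.stepName, step.jobExecution.id);
    }

    // Persistence -> domain: the caller supplies the already loaded parent,
    // and the link is restored in both directions.
    static StepExecution toDomain(StepExecutionRecord stored, JobExecution parent) {
        if (stored.jobExecutionId() != parent.id) {
            throw new IllegalArgumentException("Step does not belong to job execution " + parent.id);
        }
        StepExecution step = new StepExecution(stored.id(), stored.stepName(), parent);
        parent.stepExecutions.add(step);
        return step;
    }

    public static void main(String[] args) {
        JobExecution jobExecution = new JobExecution(7L);
        StepExecution step = new StepExecution(1L, "step1", jobExecution);
        jobExecution.stepExecutions.add(step);

        StepExecutionRecord stored = toRecord(step);
        JobExecution reloaded = new JobExecution(7L);
        StepExecution restored = toDomain(stored, reloaded);
        System.out.println(stored + " -> steps on reloaded parent: " + restored.jobExecution.stepExecutions.size());
    }
}
```

Because the conversion happens at the DAO boundary, the framework's logic keeps working with the domain objects it already knows.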
That said, and while the PoC seems to work, I think the "disruptive approach" (i.e., changing the Entity ID type to something other than Long and updating the ordering logic of entities to be based on creation date, which removes the need for sequences) should work as well. I explained this here. But this is for another discussion, and definitely for a major Spring Batch version (if this option is retained).
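As a rough illustration of ordering by creation date instead of by sequence, here is a hypothetical sketch in which IDs are opaque Strings (UUIDs here) and the "last" execution is the most recently created one. The names are made up for illustration. The ID is used only as a deterministic tie-breaker when two executions share a timestamp; it carries no chronological meaning.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

public class LastExecutionByCreateTime {

    // Hypothetical execution row with a non-sequential String ID.
    record ExecutionRow(String id, Instant createTime, String status) {}

    // Most recently created execution wins; the ID only breaks ties deterministically.
    static Optional<ExecutionRow> lastExecution(List<ExecutionRow> executions) {
        return executions.stream().max(
                Comparator.comparing(ExecutionRow::createTime).thenComparing(ExecutionRow::id));
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        List<ExecutionRow> executions = List.of(
                new ExecutionRow(UUID.randomUUID().toString(), t0, "FAILED"),
                new ExecutionRow(UUID.randomUUID().toString(), t0.plusSeconds(60), "FAILED"),
                new ExecutionRow(UUID.randomUUID().toString(), t0.plusSeconds(30), "COMPLETED"));
        // Picks the execution created at t0 + 60s, regardless of how its random ID sorts.
        System.out.println(lastExecution(executions).orElseThrow());
    }
}
```

This only works if creation timestamps are precise enough to distinguish executions of the same job instance, which is one of the points such a change would need to evaluate.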
If you are interested, I would be grateful if you could give the experimental feature a try and share your feedback! Thank you in advance.
[*]: the potential contention drawback mentioned in the disclaimer does not really apply to Spring Batch, based on the frequency in which batch jobs are typically launched.
Hi everyone,
Are there any plans to bring the PoC by @fmbenhassine into the main Spring Batch project soon? We would be interested in using it in production.
Hey, we would also be interested in MongoDB support for the JobRepository.
@fmbenhassine are there any plans or a timeline for bringing it live?