orientdb-labs [OEP 13] Allow full NoSQL

Summary: Currently ODB supports some NoSQL attributes, like storing schemaless data. These attributes could be enhanced to get ODB to be a pure NoSQL database - but with relationships, which would make it one of the best NoSQL solutions on the market!!!

Goals:

Create a system where data-definition SQL is not necessary to create schema within ODB.

I know that sounds contradictory, but humor me on this. The whole idea is to avoid needing a data definition language (DDL) from the user's point of view.

Non-Goals: None currently.

Success metrics: None known currently.

Motivation: When programming, working with RDBMSes can be a PITA, especially when schema needs to be changed for new features. This means that schema and even databases must be versioned to match the current version of an application. For testing, schema must also be migrated first, before data can be stored and tests can run.

Databases like MongoDB have circumvented this issue by removing the need to create the schema before storing data. As they put it, "the code is the schema!"

This important paradigm shift is what makes true NoSQL databases really valuable from a programmer's perspective. They also allow the programmer to avoid the ORIM, Object Relational Impedance Mismatch.

Description: Currently, to load data, the schema must first be created. For instance, this query

INSERT INTO user SET name='bob'

ends with an error, because no schema was defined earlier for the user class.

MongoDB would happily take this (formed in MongoDB's own query language) and automatically create both the user collection and the name attribute as a string and save the value "bob" in it.

This same kind of "automatic schema creation" is what ODB needs too.

Alternatives: None known currently, because NO other NoSQL DBs do relationships (well). That means this paradigm addition would be a USP for ODB!

Risks and assumptions:

This pure NoSQL system would be a parallel feature to ODB's current schema creation methods. That means anyone who wants to use full-schema definitions still can.

This suggestion might mean any initial insert with non-existent schema could be slower, because schema definitions, mainly needed for ODB's internal workings, would need to be created (automatically through ODB in the background).

This paradigm change/addition for ODB could also mean some of the types supported by ODB would not be supported under the pure NoSQL system. That would be a trade-off to use it. Any type must be defined through its literal value. For instance:

@rid = #12:1234 // would have to be a valid rid and is created by ODB, as usual
name = 'bob' // would be a string
some-number = 1234 // would be an integer
another-number = 0.1234 // would be a floating-point number

etc.
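As a rough illustration of defining a type through its literal value, here is a minimal Python sketch. The type names (LINK, STRING, BOOLEAN, INTEGER, DOUBLE) and the `infer_type` function are assumptions made up for this example and are not ODB's actual API. The decimal literal is treated as a floating-point value in this sketch.

```python
import re

# Hypothetical type names, chosen for illustration only; they are not
# taken from ODB's actual type system.
RID_PATTERN = re.compile(r"^#\d+:\d+$")


def infer_type(literal: str) -> str:
    """Guess a property type from the textual form of a literal."""
    text = literal.strip()
    if RID_PATTERN.match(text):
        return "LINK"
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return "STRING"
    if text.lower() in ("true", "false"):
        return "BOOLEAN"
    try:
        int(text)
        return "INTEGER"
    except ValueError:
        pass
    try:
        float(text)
        return "DOUBLE"
    except ValueError:
        pass
    raise ValueError(f"cannot infer a type from literal: {literal!r}")


for literal in ("#12:1234", "'bob'", "1234", "0.1234"):
    print(literal, "->", infer_type(literal))
```

An unquoted word such as bob matches none of the rules and raises an error, which is one way to keep the inference predictable.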

There would be an issue with knowing whether or not to create Vertexes or Edges. This could be easily solved, as in this example for a Vertex.

INSERT INTO V.user SET name='bob'

And for an Edge.

INSERT INTO E.is_owner FROM #12:1234 TO #21:2456

The same dot notation could also be used for inheritance.

INSERT INTO V.animals.dogs SET name='wuffy'
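To make the dot notation a bit more concrete, here is a small Python sketch of how a dotted target could be split into classes and their parents. The `parse_target` function is hypothetical and only illustrates the idea. It assumes every target starts with V or E and that each segment inherits from the one before it.

```python
from typing import List, Tuple


def parse_target(target: str) -> List[Tuple[str, str]]:
    """Split a dotted INSERT target into (class_name, superclass) pairs.

    The first segment must be one of the base classes V or E.
    """
    parts = target.split(".")
    if parts[0] not in ("V", "E"):
        raise ValueError(f"target must start with V or E: {target!r}")
    if len(parts) < 2 or any(not p for p in parts):
        raise ValueError(f"malformed target: {target!r}")
    # Pair every class with the segment before it, which acts as its parent.
    return list(zip(parts[1:], parts[:-1]))


print(parse_target("V.user"))          # [('user', 'V')]
print(parse_target("V.animals.dogs"))  # [('animals', 'V'), ('dogs', 'animals')]
```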

Since ODB would actually be creating schema in the background, the normal indexing features can still be used.

Schema removal/alteration DDL would still be necessary for corrections, i.e. DELETE, DROP and ALTER.

Impact matrix

[ ] Storage engine
[x] SQL
[ ] Protocols
[ ] Indexes
[ ] Console
[ ] Java API
[ ] Geospatial
[ ] Lucene
[ ] Security
[ ] Hooks
[ ] EE

NOTE: These blog articles from MongoDB explain the paradigm differences a lot better.

MongoDB vs SQL: Day 1-2
MongoDB vs SQL: Day 3-5
MongoDB vs SQL: Day 14

And this is my last OEP. 😄

Sep 24 '16 10:09 smolinari

This article also explains in fair detail the issues that need to be avoided to make application developers' lives much easier.

http://martinfowler.com/articles/evodb.html

If ODB could "get out of the way" of developers as much as possible, that would make it their favorite. And again, this is a main goal of MongoDB's design and why it is so popular.

Scott

Sep 28 '16 15:09 smolinari

Just linking my other suggestion to this one, as they are related: [Suggestion] - Allow for indexing on properties not defined with schema #6445

Scott

Nov 07 '16 12:11 smolinari

Here is an article from a developer also arguing for "preferably having schema in the code".

https://medium.com/capital-one-developers/nosql-database-doesnt-mean-no-schema-a824d591034e#.h4ztacria

There is no doubt that a schema is needed for any application's data. However, it shouldn't be needed within ODB, if ODB wants to call itself a true NoSQL solution.

Scott

Feb 08 '17 15:02 smolinari

Hello ODB team!

Is this suggestion here being considered at all?

I'd really like to see ODB automate schema (the internal metadata) creation. That is the missing feature for ODB to blast past other relational databases and to link it properly to the NoSQL world. By automatic schema metadata creation I mean, if I insert a property in a class, where that property or class hasn't yet been created via schema (like CREATE CLASS or CREATE PROPERTY), then ODB just creates the schema metadata inside ODB in the background for me. The data types can be inferred.

The problem this solves is, now I can also create indexes on any new fields created by inserting them only. I won't ever need to keep a schema version and I avoid the chores of schema evolution tracking. My code becomes my schema. This is what makes Mongo so popular. As they put it, the database is pushed more into the background.

Please consider it. Thanks!

Scott

p.s. or allow indexing on data missing schema. I think the first suggestion is easier to do though. 😄

May 25 '17 07:05 smolinari

One last question.

How badly would it affect ODB's performance and how hard would it be to include:

A check of the schema metadata, to see if a class and/or property are already part of the schema when there is an INSERT of data?
If the above check is false, to create the schema through type inference automatically?

In other words, if I do this:

INSERT INTO V.user SET name='bob'

ODB would check the schema for the vertex class "user" and if it isn't there, create the class. Then create the property schema for "name" as a string and insert "bob" into a vertex.

If the vertex class "user" is already in the schema, only the check for the property "name" is needed. If the property isn't there, add it. Otherwise, just add the value to a new vertex.
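The check-then-create flow described above could look roughly like this in Python. This is only a sketch with in-memory dictionaries standing in for ODB's schema metadata and storage. The names `schema`, `records`, `type_name` and `insert` are all made up for illustration, and nothing here reflects how ODB works internally.

```python
from typing import Any, Dict, List

# In-memory stand-ins for ODB's schema metadata and storage (hypothetical).
schema: Dict[str, Dict[str, str]] = {}
records: Dict[str, List[Dict[str, Any]]] = {}


def type_name(value: Any) -> str:
    """Map a Python value to an illustrative property type name."""
    if isinstance(value, bool):  # bool first, since bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"unsupported value: {value!r}")


def insert(class_name: str, fields: Dict[str, Any]) -> List[str]:
    """Insert a record, creating missing schema first. Returns what was created."""
    created = []
    if class_name not in schema:
        schema[class_name] = {}
        records[class_name] = []
        created.append(f"class {class_name}")
    properties = schema[class_name]
    for name, value in fields.items():
        if name not in properties:
            properties[name] = type_name(value)
            created.append(f"property {class_name}.{name} {properties[name]}")
    records[class_name].append(dict(fields))
    return created


print(insert("user", {"name": "bob"}))
# ['class user', 'property user.name STRING']
print(insert("user", {"name": "alice", "age": 42}))
# ['property user.age INTEGER']
```

The sketch deliberately does nothing when a later value has a different type than the one recorded first. A real implementation would have to decide how to handle that case.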

OR.....

Could it be possible to allow for indexing of non-schema properties? In other words, loosen up the indexing rules. If a database doesn't have the schema or a record doesn't have the property being indexed, then just ignore the index command or give the user an error (but still index, should other records have the property).
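A loosened indexing rule of this kind might behave like the following Python sketch, where records lacking the property are simply skipped. The `build_index` function and the sample records are assumptions for illustration only and say nothing about ODB's real index implementation.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List


def build_index(records: List[Dict[str, Any]], prop: str) -> Dict[Hashable, List[int]]:
    """Index records by `prop`, skipping records that do not have it.

    Maps each distinct value to the positions of the records holding it.
    """
    index: Dict[Hashable, List[int]] = defaultdict(list)
    for position, record in enumerate(records):
        if prop in record:
            index[record[prop]].append(position)
    return dict(index)


people = [
    {"name": "bob"},
    {"nickname": "wuffy"},       # no "name" property, so it is left out
    {"name": "alice"},
    {"name": "bob", "age": 42},
]
print(build_index(people, "name"))   # {'bob': [0, 3], 'alice': [2]}
print(build_index(people, "email"))  # {} since no record has the property
```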

In the end, the goal is to allow the schema to be left in the code and not in the database, if the user wants that.

That is true NoSQL and until this happens, ODB cannot be considered a true NoSQL solution.

Scott

May 29 '17 06:05 smolinari

Ping to @luigidellaquila and @lvca for an answer to the above. What got me going on this again was this post, which you all linked to on Twitter.

ODB cannot seriously fulfill point 4 of that post, due to its lack of ability to index non-schema properties. That needs to change and this suggestion offers two ways.

Automated schema creation

or

Indexing on non-schema properties

I really hope this is making sense to you. If it isn't, you haven't understood the NoSQL paradigm, which is "the schema is in the code".

Scott

May 29 '17 06:05 smolinari

Hi Scott,

The first solution is not feasible for now, for a couple of reasons:

you cannot infer a property's type from a single value, as documents arriving later could have the same property name with values of a different type
right now you cannot manipulate the schema in a transaction
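The first objection can be illustrated with a tiny Python sketch: the type is inferred from the first value seen, and later values of another type then conflict with it. The `find_type_conflicts` function is hypothetical and uses Python's own types as a stand-in for property types.

```python
from typing import Any, List, Tuple


def find_type_conflicts(values: List[Any]) -> List[Tuple[int, Any]]:
    """Infer a type from the first value, then report later values that differ."""
    if not values:
        return []
    inferred = type(values[0])
    return [(i, v) for i, v in enumerate(values) if type(v) is not inferred]


# Values arriving over time for the same property name.
print(find_type_conflicts([1234, 5678, "n/a", 9.5]))  # [(2, 'n/a'), (3, 9.5)]
```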

The second solution could actually work. The high-level API does not support it, but the low-level one has no limitations in this sense. We could consider supporting it, but we have to check whether it causes problems with keys of different types.

Thanks

Luigi

May 30 '17 13:05 luigidellaquila

Hmm.. interesting point Luigi about the differing value types for the same property name. Obviously that would be stupid to do, even for a "schemaless" system, but I still wonder how MongoDB handles that scenario. I'll go looking, just to satisfy my own curiosity.

If you could make indexing possible without needing the schema metadata, that would be the solution. Then, only the class creation without a CREATE CLASS would be left to make an ODB NoSQL reality. Then the schema could truly only be in the code! 😄 Because, really, we aren't talking about schemaless data. We are talking about controlling schema outside of the database. Doing this basically removes the necessity for versioning database changes and migrations.

Do you see the value in that?

Scott

May 30 '17 13:05 smolinari

Ok. I've gotten my answer as to how MongoDB handles the situation, and it simply works. MongoDB allows for different types with the same field name in different documents and also allows for indexing on that single field name. The caveat is, if you insert a string where an integer should be and query for an integer, you'll never get a result for the string. You'd have to also query for strings.
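To picture that caveat, here is a minimal Python sketch of an index that keeps values of different types under separate keys, so a lookup for an integer never returns the string. This is not MongoDB's actual implementation. The `key_for` and `build_typed_index` functions and the "zip" field are made up for the example.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Key = Tuple[str, Any]


def key_for(value: Any) -> Key:
    """Tag a value with its type name so 1234 and '1234' get distinct keys."""
    return (type(value).__name__, value)


def build_typed_index(docs: List[Dict[str, Any]], field: str) -> Dict[Key, List[int]]:
    """Index documents by `field`, keeping each value type under its own keys."""
    index: Dict[Key, List[int]] = defaultdict(list)
    for position, doc in enumerate(docs):
        if field in doc:
            index[key_for(doc[field])].append(position)
    return dict(index)


docs = [{"zip": 1234}, {"zip": "1234"}, {"zip": 1234}]
index = build_typed_index(docs, "zip")

print(index.get(key_for(1234), []))    # [0, 2]
print(index.get(key_for("1234"), []))  # [1]
```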

So, again. MongoDB does its best to stay out of the way of the developer. Or put another way, it leaves schema control completely up to the developer.

Scott

May 31 '17 05:05 smolinari
