Rules of data conversion from document to relational databases

data: 9 czerwca, 2014
czas czytania: 10 min
autor: Wojciech Milczarek

This article is hypothetical divagation how to determine and use a set of rules to create relational schema based on MongoDB data. Using only step by step analysis without creating relational model from the very beginning, rules that emerge from that understanding should be simple.

1. Quick Introduction to Data Model in Mongo DB

1.1. Basic Concepts

Table presents the various SQL terminology and concepts as well as the corresponding MongoDB terminology and concepts.

[table id=7 /]

1.2. Data model in MongoDB

Data in MongoDB are schema-less. Document collections do not enforce document structure, and further documents, representing the same entity, can have substantial variation in their fields.
There are two main ways to define relationship between documents:

References Types:

  • Linking Reference
Picture 1.

Picture 1.

Field references store references which are ids to another document.

  • Embedded documents

Another way to associate documents is to embed them. Parent document has a child document defined as field.

Picture 2

Picture 2

2. Document References Translation

2.1. Translation of a One-to-One model

Simplest One-to-One reference between MongoDB documents.

Patron Collection
{
   _id: "joe",
   name: "Joe Bookreader";
}
Address Collection
{
   patron_id: "joe",
   street: "123 Fake Street",
   city: "Faketon",
   state: "MA",
   zip: "12345";
}

Code list no.1.

Both models physically represents the same data, only reference type which associates these data is different.

Patron Collection
{
   _id: "joe",
   name: "Joe Bookreader",
   address: {
              street: "123 Fake Street",
              city: "Faketon",
              state: "MA",
              zip: "12345";
            }
}

Code list no.2.

Code lists no. 1 & 2 present JSON written example of a document model. The first step in conversion should be to determine how many tables will be necessary to create these data in a relational model. To do so some rules have to be established.

Every single document type is a separate entity. This document type have to be defined as a document with a certain set of fields. Based on this the first rule can be formulated, let’s call it the „Unique Entity rule”

A number of tables in the relational model from a document or context will be the number of unique entities in this context. For example, in Code list no. 1. documents in the Patron collection represent one entity. Without any embedded documents their SQL form will be one table.

The next step is to identify relations with primary and foreign keys. Again, some simple rules:

Embedded document (not array) is in relation with his parent with the foreign key on parent’s entity side. (Embedded document rule)

Every entity with reference id filed of another entity is in relation with that entity with the foreign key on its side. (Linked document rule)

Analyzing Code list no. 1, linking reference can be seen between „_id” field in the Patron collection and „patron_id” field, in the Address collection. This reference will resolve in „patron_id” as a foreign key of Patron Entity in a relational model.

For the Code list no. 2, no „patron_id” field exists; a document is embedded instead. There is no explicit reference. Address is a field of Patron Entity so it can be assumed that it contains foreign key of Address Table.

This is a relational diagram representing translation from Code list no. 1

DB Diagram 1

DB Diagram 1

This is a relational diagram representing translation from Code list no. 2

DB Diagram 2

DB Diagram 2

2.2. Translation of a One-to-Many model

Book Collection

{
   title: "MongoDB: The Definitive Guide",
   author: [ "Kristina Chodorow", "Mike Dirolf" ],
   published_date: ISODate("2010-09-24"),
   pages: 216,
   language: "English",
   publisher: {
              name: "O'Reilly Media",
              founded: 1980,
              location: "CA";
            }
}
{
   title: "50 Tips and Tricks for MongoDB Developer",
   author: "Kristina Chodorow",
   published_date: ISODate("2011-05-06"),
   pages: 68,
   language: "English",
   publisher: {
              name: "O'Reilly Media",
              founded: 1980,
              location: "CA";
            }
}

Code list no. 3.

There are two distinct document entities in this Code list. Furthermore, as in the Code list no. 3 references between them are embedded. That means that parent entity should define foreign key of embedded document in the relational model. For now, I will skip the „author” field in my divagations.

DB Diagram 3

DB Diagram 3

In DB Diagram 3 the relation is rather straightforward.

Publisher Collection
{
   name: "O'Reilly Media",
   founded: 1980,
   location: "CA",
   books: [12346789, 234567890, ...]
}

Book Collection
{
    _id: 123456789,
    title: "MongoDB: The Definitive Guide",
    author: [ "Kristina Chodorow", "Mike Dirolf" ],
    published_date: ISODate("2010-09-24"),
    pages: 216,
    language: "English";
}

{
   _id: 234567890,
   title: "50 Tips and Tricks for MongoDB Developer",
   author: "Kristina Chodorow",
   published_date: ISODate("2011-05-06"),
   pages: 68,
   language: "English";
}

Code list no. 4.

This Code listing is another variation of physically the same data shown in the Code list no. 3. This model represents another one-to-many way to create relations in MongoDB,
This time type array has moved into action, in „books” field which contains a list of related documents ids. The Array is treated as an embedded document, but it forces many-to-one relation. In consequence:

Every child array is in relation to the parent document with FK on child’s side. If array contains reference values to another document, it is in relation with him with the FK on his side. (Embedded array rule)

DB Diagram 4

DB Diagram 4

Three Entities where created. The one that represents the Book Collection, the one that represents the Publisher Collection, and the embedded inside publisher „books” field entity. The „books” document uses both embedded (to its parent) and linking (to book entities) reference types. In this case it means that schema is getting an unwanted proxy table that should be normalized.

After converting to relational model it is important to check if the current form is normalized. (See DB Diagram 5)

Publisher Collection
{
   _id: "oreilly",
   name: "O'Reilly Media",
   founded: 1980,
   location: "CA";
}

Books Collection
{
   _id: 123456789,
   title: "MongoDB: The Definitive Guide",
   author: [ "Kristina Chodorow", "Mike Dirolf" ],
   published_date: ISODate("2010-09-24"),
   pages: 216,
   language: "English",
   publisher_id: "oreilly";
}

{
   _id: 234567890,
   title: "50 Tips and Tricks for MongoDB Developer",
   author: "Kristina Chodorow",
   published_date: ISODate("2011-05-06"),
   pages: 68,
   language: "English",
   publisher_id: "oreilly";
}

Code list no.5.

In this example, reference side has been switched. Now the Book Entity has the reference to the Publisher. Thanks to this the relational schema will be normalized from the very beginning.

DB Diagram 5

DB Diagram 5

2.3. Translation of a Many-To-Many model

Depending on data, many too many relations can be found in type array. If multiple entities in the collection contain the same field array and the values of this table are repeated in other fields of the array in the collection, it is a clear indicator of a many to many relations.

When it is determined that the table creates a document many to many relation, then the intersection table is created only when the array values are not a link references to other documents (Array many-to-many rule).

Looking back to the Code list no. 5 the „author” field has different type each time. One time it is string, and another time it has array type. As a result another rule is drawn up:

Each time a field is, at least once, a type array or embedded document, it should be always considered as of this type. That means it should be a separate entity. (Array field rule)

In this case there are many-to-many relation books-authors, that’s why conversion will require an intersection table. Because of the fact that arrays are fields they don’t have „_id”. In addition, intersection table doesn’t necessary need them. Diagram no. 6 will be normalized from the beginning.

DB Diagram 6

DB Diagram 6

Student Collection
{
   _id: ObjectId("507f191e810c19729de860e1"),
   firstName: "Susan",
   lastName: "Johnson",
   lecture: [
	ObjectId("507f191e810c19729de860e3")
	ObjectId("507f191e810c19729de860e4")]
}

{
   _id: ObjectId("507f191e810c19729de860e2"),
   firstName: "Philip",
   lastName: "Fry",
   lecture: [
	ObjectId("507f191e810c19729de860e3")
	ObjectId("507f191e810c19729de860e4")]
}

Lecture Collection
{
   _id: ObjectId("507f191e810c19729de860e3"),
   className: "Digital modeling",
   teacher: "John Smith",
   student: [
	ObjectId("507f191e810c19729de860e1")
	ObjectId("507f191e810c19729de860e2")]
}

{
   _id: ObjectId("507f191e810c19729de860e4"),
   className: "Software Engineering",
   teacher: "Adam Carter",
   student: [
	ObjectId("507f191e810c19729de860e1")
	ObjectId("507f191e810c19729de860e2")]
}

Code list no. 6.

In this case we can see different approaches of a many-to-many relation. Both, Lecture and Student collection have an array of references to each other. Here, the many-to-many array rule would apply. It is cause by the fact that a great number of the same array ObjectId values are in the same entities. When array is a part of many-to-many relation, the intersection table is created only when a stored value is not a link reference to another document.

Two extra arrays create two extra tables, which in reference will link them to the opposite collection creating in this process a denormalized schema.

DB Diagram 7

DB Diagram 7

DB Diagram 7 requires normalization. Below we can see the results.

DB Diagram 8

DB Diagram 8

3. MongoDB fields type conversion to relational (MySQL) Columns

3.1. Simple type conversion

Here, the issue of how MongoDB field types may translate to which column type based on a MySQL example.

[table id=8 /]

MongoDB does not limit the size of string, it only limits the size of the document to 16 MB. It would be wise, basing on domain or client specification, to put limits on converted SQL Columns.

3.2. Multi type fields in one Entity

Considering MongoDB is schema-less, one field in certain collections can have different stored types, but still be a part of one unique entity. That means it would be preferable that it stays this way on relational side. When there are multiple types of one field in a document, relational form should consider every type as a separate column (Multiply field rule). Except the situation when Array or embedded document occurs, then this rule is suppressed by Array field rule. Acceptable but potentially unwanted is to create a field of type that will facilitates conversion of various data into it.

4. List of rules

1. Unique Entity rule – Number of tables in the relational model from context will be the number of unique entities in this context.

2. Embedded document rule – Embedded document is in relation with his parent with the foreign key on side of parent Entity.

3. Embedded Array rule – Embedded array is in relation with his parent with the foreign key on its side.

4. Linked document rule – Every Entity with reference id filed of another entity is in relation with that entity with the foreign key on its side.

5. Array field rule – If a field is a type Array or Embedded document at least once, it should be always considered as of this type. That means it should be a separate entity.

6. Multiply field rule – When in a document there are multiple types of one field, relational form should consider every type as a separate field.

7. Normalization rule – After converting to relational model it is important to check if the current form is normalized.

8. Array many-to-many rule – If it is determined that array creates a many-to-many relation with the document, intersection table is created only when stored value is not a link reference to the another document.

This set of rules concludes, hypothetical conversion of document database to relational model.

Sources:
http://docs.mongodb.org/manual/
http://dev.mysql.com/doc/
http://www.tutorialspoint.com/mongodb/mongodb_datatype.htm

Newsletter IT leaks

Dzielimy się inspiracjami i nowinkami z branży IT. Szanujemy Twój czas - obiecujemy nie spamować i wysyłać wiadomości raz na dwa miesiące.

Subscribe to our newsletter

Administratorem Twoich danych osobowych jest Future Processing S.A. z siedzibą w Gliwicach. Twoje dane będziemy przetwarzać w celu przesyłania cyklicznego newslettera dot. branży IT. W każdej chwili możesz się wypisać lub edytować swoje dane. Więcej informacji znajdziesz w naszej polityce prywatności.

Subscribe to our newsletter

Administratorem Twoich danych osobowych jest Future Processing S.A. z siedzibą w Gliwicach. Twoje dane będziemy przetwarzać w celu przesyłania cyklicznego newslettera dot. branży IT. W każdej chwili możesz się wypisać lub edytować swoje dane. Więcej informacji znajdziesz w naszej polityce prywatności.