NoSQL with MongoDB

An Introduction for Developers

Hardy Erlinger
hardy.erlinger@netspectrum.de

www.netspectrum.de

NoSQL Databases Overview

NoSQL: “Not only SQL”
  • Non-relational
  • Schema-free
  • Distributed (easy replication)
  • Often eventually consistent (usually no full ACID support)
  • Built for Big Data
  • Run on commodity hardware

Key-Value Databases

Document Databases

Column-Family Stores

Graph Databases

And more ...

  • Multimodel Databases
  • Object Databases
  • XML Databases
  • Grid & Cloud Database Solutions
  • ...

MongoDB

  • From "huMONGOus"
  • Open source document database (9 million downloads)
  • Records are schema-less
  • High availability (replica sets)
  • Automatic scaling (sharding)
  • Supported platforms: Windows, Linux, Mac OS X & Solaris
  • Client drivers for many languages (C, C++, C#, Java, Node.js, Perl, PHP, Python, Ruby etc.)

Licensing

A Note About Security ...

MongoDB Data Model

Relational DB      MongoDB
-----------------  ----------------
Database server    MongoDB instance
Database           Database
Table              Collection
Row                Document

Document

  • JSON-style key-value pairs with dynamic schema
  • Basic unit of data in MongoDB
  • Maximum size: 16 MB

Source: MongoDB Documentation

BSON

  • MongoDB's serialization format
  • "Binary JSON", extends JSON with additional types, such as Double, 32-bit & 64-bit Integer, Binary data, ObjectId, Date, Timestamp etc.
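
To illustrate why the additional types matter, here is a minimal sketch in plain JavaScript (no MongoDB needed). It shows that a Date does not survive a round trip through JSON text as a Date, which is the kind of type loss BSON avoids. The variable names are purely illustrative.

```javascript
// A Date survives a JSON round trip only as a string.
const original = { createdAt: new Date(0), qty: 5 };
const roundTripped = JSON.parse(JSON.stringify(original));

// Before: a real Date object. After: just text.
const isDateBefore = original.createdAt instanceof Date;
const typeAfter = typeof roundTripped.createdAt;

console.log(isDateBefore, typeAfter);
```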

The role of "_id"

  • All documents must have a unique key named "_id"
  • Can be any type, as long as the value is unique
  • If the client doesn't provide _id for a document, the server adds it automatically (type: ObjectId)

{
    "_id" : ObjectId("54cce0151e98e91368b54a17"),
    "productDetails" : {
        "description" : "Lorem ipsum ...",
        "productType" : "A"
    },
    //[...]
}
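
An automatically generated ObjectId is not random noise: its first 4 bytes encode the creation time in seconds since the Unix epoch. The following sketch decodes that part from the hex string used in the sample document above. The helper name objectIdTimestamp is hypothetical and not part of any driver.

```javascript
// Extract the creation time from the 24-character hex form of an ObjectId.
// Only the leading 8 hex digits (4 bytes, seconds since the epoch) are used.
function objectIdTimestamp(hex) {
    if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
        throw new Error("expected 24 hex characters");
    }
    const seconds = parseInt(hex.slice(0, 8), 16);
    return new Date(seconds * 1000);
}

const created = objectIdTimestamp("54cce0151e98e91368b54a17");
console.log(created.toISOString());
```

In the mongo shell the same information is available via the getTimestamp() method on an ObjectId value.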
                

Collection

  • Groups documents with similar purpose
  • Created on insert of first document
  • Schema is not enforced, documents can have different fields
  • Indexes are applied to collections

Source: MongoDB Documentation

Database

  • Purpose: Container for collections
  • Created on first use

Demo: MongoDB Data Model and the mongo shell

CRUD in MongoDB

  • All operations target a single collection (no "JOIN" queries exist)
  • Updates on a single document are guaranteed to be atomic (includes sub-documents)
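
As a rough sketch of the atomicity guarantee, the update below changes two fields of one document, one of them inside a sub-document, in a single operation. The collection name "products", the field names, and the helper buildStockUpdate are assumptions made for illustration.

```javascript
// Build the arguments for a single-document update.
// Both modifications target the same document, so they are applied together.
function buildStockUpdate(productId, soldQty) {
    const filter = { _id: productId };
    const update = {
        $inc: { "stock.available": -soldQty },    // field inside a sub-document
        $set: { "stock.lastSoldAt": new Date() }
    };
    return { filter, update };
}

// Shell usage (hypothetical collection):
//   const u = buildStockUpdate("p-1", 2);
//   db.products.updateOne(u.filter, u.update);
```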

Inserting documents

Source: MongoDB Documentation
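
A minimal insert sketch (not taken from the documentation figure): the function accepts any object that exposes insertOne, such as db.products in the shell. The collection and the "tags" field are hypothetical; the productDetails fields follow the sample document shown earlier.

```javascript
// Insert one document. The collection is created on the first insert.
function insertProduct(collection) {
    const doc = {
        productDetails: { description: "Lorem ipsum ...", productType: "A" },
        tags: ["demo", "sample"]
    };
    // No _id is supplied, so an ObjectId is generated automatically.
    return collection.insertOne(doc);
}

// Shell usage: insertProduct(db.products)
```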

Queries

Source: MongoDB Documentation
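
A query is itself a document. The sketch below (an illustration, not the documentation figure) combines an equality match with a comparison operator, using the field names of the "zipcodes" sample collection from the aggregation slides. The helper name is hypothetical.

```javascript
// find(filter) returns a cursor over all documents matching the filter.
function findPopulousZipCodes(collection, state, minPop) {
    const filter = {
        state: state,            // equality match
        pop: { $gte: minPop }    // comparison operator: greater than or equal
    };
    return collection.find(filter);
}

// Shell usage: findPopulousZipCodes(db.zipcodes, "NY", 50000)
```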

Projections

Source: MongoDB Documentation
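
A projection limits which fields are returned. This sketch uses the mongo shell form find(filter, projection); language drivers pass the projection differently, so treat the call shape as an assumption tied to the shell. Field names again follow the "zipcodes" sample collection.

```javascript
// Return only the city name of each matching document.
function findCityNames(collection, state) {
    const filter = { state: state };
    const projection = { city: 1, _id: 0 };  // include city, suppress _id
    return collection.find(filter, projection);
}

// Shell usage: findCityNames(db.zipcodes, "NY")
```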

Demo: CRUD with the MongoDB C# Driver

Indexes

  • Indexes are assigned to collections
  • _id is indexed automatically
  • Secondary indexes can be created on arbitrary fields, including arrays
  • Command:
    db.mycoll.createIndex({fieldName:1})
  • Regularly inspect the output of
    db.mycoll.find({...}).explain()
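
A sketch of the index workflow: create a compound index, then ask for the query plan of a query that should be able to use it. It uses createIndex, the name current shell versions use for index creation. The field names come from the "zipcodes" sample collection; the helper name is hypothetical.

```javascript
// Create a compound index and return the query plan for a matching query.
function indexAndExplain(collection) {
    // 1 = ascending, -1 = descending
    collection.createIndex({ state: 1, pop: -1 });
    // explain() reports how the query would be executed,
    // including whether an index is used.
    return collection.find({ state: "NY" }).sort({ pop: -1 }).explain();
}

// Shell usage: indexAndExplain(db.zipcodes)
```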

Data Modeling Considerations

  • Application access patterns drive schema design
  • To embed, or to reference? That is the question ...
  • Maximum document size: 16 MB

Embedding Documents

Source: MongoDB Documentation

Referencing Documents

Source: MongoDB Documentation
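
The two options can be contrasted with plain JavaScript objects. In this sketch all names and values are made up for illustration. With embedding, one read returns everything; with referencing, the application resolves the link itself with a second lookup, since no JOIN exists.

```javascript
// Embedded: the address lives inside the customer document.
const customerEmbedded = {
    _id: "c1",
    name: "Ada",
    address: { city: "Munich", zip: "80331" }
};

// Referenced: the address is a separate document; the customer stores its _id.
const addresses = [{ _id: "a1", city: "Munich", zip: "80331" }];
const customerReferenced = { _id: "c1", name: "Ada", addressId: "a1" };

// The application follows the reference with a second lookup.
function resolveAddress(customer, addressDocs) {
    return addressDocs.find(a => a._id === customer.addressId) || null;
}

const resolved = resolveAddress(customerReferenced, addresses);
console.log(resolved.city === customerEmbedded.address.city);
```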

The final answer to data modeling

It depends ...

Data Modeling Resources

Aggregation

  • Purpose: process data records in a pipeline and return computed results
  • Input: collection, output: one or more documents
  • Dedicated pipeline operators for filtering, grouping, transforming, calculating and writing output

Aggregation: Sample Query

Get states with populations above 10 million


// sample doc from the "zipcodes" collection
{
    "_id": "10280",
    "city": "NEW YORK",
    "state": "NY",
    "pop": 5574,
    "loc": [
        -74.016323,
        40.710537
    ]
}
                

// Get states with populations above 10 million, sort by population (desc)
db.zipcodes.aggregate(
    { $group : { _id : "$state", totalPop : { $sum : "$pop" } } },
    { $match : { totalPop : { $gte : 10*1000*1000 } } },
    { $sort  : { totalPop: -1 } }
)
               

Query Result


{
    "result" : [
        {
            "_id" : "CA",
            "totalPop" : 29760021
        },
        {
            "_id" : "NY",
            "totalPop" : 17990455
        },
        {
            "_id" : "TX",
            "totalPop" : 16986510
        },
        {
            "_id" : "FL",
            "totalPop" : 12937926
        },
        {
            "_id" : "PA",
            "totalPop" : 11881643
        },
        {
            "_id" : "IL",
            "totalPop" : 11430602
        },
        {
            "_id" : "OH",
            "totalPop" : 10847115
        }
    ],
    "ok" : 1
}
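
To make the three stages concrete, here is an in-memory sketch in plain JavaScript that mimics what $group, $match and $sort do in the query above. It is not how the server implements aggregation, and the sample data is invented for illustration.

```javascript
// Invented sample data with the same shape as the "zipcodes" documents.
const sampleZips = [
    { _id: "1", state: "AA", pop: 6000000 },
    { _id: "2", state: "AA", pop: 5000000 },
    { _id: "3", state: "BB", pop: 9000000 },
    { _id: "4", state: "CC", pop: 12000000 }
];

function statesWithAtLeast(docs, minPop) {
    // $group: sum "pop" per "state"
    const totals = new Map();
    for (const d of docs) {
        totals.set(d.state, (totals.get(d.state) || 0) + d.pop);
    }
    return [...totals]
        .map(([state, totalPop]) => ({ _id: state, totalPop }))
        .filter(r => r.totalPop >= minPop)           // $match with $gte
        .sort((a, b) => b.totalPop - a.totalPop);    // $sort, descending
}

const topStates = statesWithAtLeast(sampleZips, 10 * 1000 * 1000);
console.log(topStates);
```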
                

Special Index & Collection Types

  • Capped Collection: fixed size, automatically removes the oldest documents
  • TTL Index: automatically expires documents after a certain time
  • Full Text Index: supports 15 languages, stemming & stop words
  • Geospatial Index: spherical & flat geospatial queries (2D)
  • GridFS: store large binary data
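
A sketch of how the first two items might be set up from the shell. The collection names "recentEvents" and "sessions", the field "createdAt", and the sizes are assumptions chosen for illustration.

```javascript
// `db` is the shell's database object (or anything with the same methods).
function setUpSpecialCollections(db) {
    // Capped collection: limited to 1 MiB; the oldest documents make room for new ones.
    db.createCollection("recentEvents", { capped: true, size: 1024 * 1024 });

    // TTL index: documents expire roughly one hour after their createdAt date.
    db.getCollection("sessions").createIndex(
        { createdAt: 1 },
        { expireAfterSeconds: 3600 }
    );
}

// Shell usage: setUpSpecialCollections(db)
```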

Replication

Source: MongoDB Documentation

Sharding

Source: MongoDB Documentation

When *not* to use MongoDB

  • Multi-document transactions
  • Complex nested relationships (social networks etc.)
  • No prior experience with schema-less datastores

Resources