linkset

Wednesday, May 15, 2013

Some basic facts about MongoDB


  • It is a document database.
  • It uses a non-relational data model.
  • You can save JSON documents to collections in a MongoDB database.
  • Data is stored as BSON (a binary encoding of JSON-like documents).
  • MongoDB provides tools to query, import, export, dump, and restore data.
  • It also provides language drivers for most widely used programming languages.
  • MongoDB provides high availability and reliability through replica sets (two or more MongoDB instances in which one acts as the primary and the others as secondaries).
  • Sharding is MongoDB's solution for horizontal scalability. You can distribute any collection in a sharding-enabled database across the shards of a MongoDB cluster.
  • When sharding a collection, you need to provide an indexed field as the shard key.
  • MongoDB supports indexing.
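
To make the shard key idea more concrete, here is a minimal sketch in plain Python. It only illustrates the general idea of routing documents by a hash of the shard key; it is not how MongoDB is implemented, and the names `shard_for`, `find_by_user_id`, and the `user_id` field are hypothetical.

```python
import hashlib

def shard_for(shard_key_value, num_shards):
    """Pick a shard by hashing the shard key value (hypothetical helper)."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# JSON-like documents; fields may differ from one document to the next.
documents = [
    {"user_id": 1, "name": "Alice", "tags": ["admin"]},
    {"user_id": 2, "name": "Bob"},
    {"user_id": 3, "name": "Carol", "address": {"city": "Colombo"}},
    {"user_id": 4, "name": "Dave"},
]

NUM_SHARDS = 3  # assumed cluster size, for illustration only
shards = {i: [] for i in range(NUM_SHARDS)}

# Distribute the collection across the shards using "user_id" as the shard key.
for doc in documents:
    shards[shard_for(doc["user_id"], NUM_SHARDS)].append(doc)

def find_by_user_id(user_id):
    """A query on the shard key only needs to look at the owning shard."""
    shard = shards[shard_for(user_id, NUM_SHARDS)]
    return next((d for d in shard if d["user_id"] == user_id), None)
```

Because the shard key decides where a document lives, a lookup by that key touches a single shard, while a query on any other field would have to ask every shard.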

Tuesday, May 7, 2013

Horizontal vs Vertical scaling

Vertical Scaling :- adding more resources (CPU, RAM, disk) to the machine that runs your database.

Horizontal Scaling :- adding more processing units (machines) to your database.

What is MapReduce?

MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers.

Mapping :- Apply the logic (do the calculation) on subsets of the target data stored on one or more data nodes in a cluster.

Reducing :- Generate the final output by combining the outputs of the mapping step.

The simple idea behind this concept is to process the data where it is located, so that the processing power of every node in the cluster is put to work for our databases.
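
The two steps can be sketched with a word count in plain Python. This is a single-process illustration under simple assumptions, not Google's implementation: each string in `chunks` stands in for the data held on one node, and the function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical.

```python
from collections import defaultdict

def map_phase(chunk):
    """Mapping: run the logic on one subset of the data, emitting (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_outputs):
    """Group the values emitted by all mappers by key."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reducing: combine the mapped values for each key into the final output."""
    return {key: sum(values) for key, values in grouped.items()}

# Each chunk stands in for the data stored on one node of the cluster.
chunks = ["big data big clusters", "data nodes process data"]

mapped = [map_phase(chunk) for chunk in chunks]  # would run in parallel, one per node
word_counts = reduce_phase(shuffle(mapped))
```

In a real cluster the map calls run on the nodes that hold the data, and only the much smaller intermediate pairs travel over the network.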

Some more facts about NoSQL databases

Pros
  • Mostly open source.
  • Horizontal scalability. 
  • There’s no need for complex joins, and data can be easily sharded and processed in parallel.
  • Support for Map/Reduce, a simple paradigm that allows computation to scale across a cluster of computing nodes.
  • No need to develop a fine-grained data model up front, which saves development time.
  • Easy to use.
  • Very fast for adding new data and for simple operations/queries.
  • No need to make significant changes in code when data structure is modified.
  • Ability to store complex data types (for document based solutions) in a single item of storage.
Cons
  • Immaturity. Still lots of rough edges.
  • Possible database administration issues. NoSQL often sacrifices features that are present in SQL solutions “by default” for the sake of performance.
  • Limited indexing support (some solutions, like MongoDB, have indexing, but it’s not as powerful as in SQL solutions).
  • Poor reporting performance.
  • Absence of standardization. There are no standard APIs or query languages, so migrating to a solution from a different vendor is more costly. There are also no standard tools (e.g. for reporting).

NoSQL Vs Big Data



Data is becoming easier to capture and access through third parties such as Facebook, D&B, and others. Personal user information, geolocation data, social graphs, user-generated content, machine logging data, and sensor-generated data are just a few examples of the ever-expanding array of data being captured. It’s not surprising that developers want to enrich existing applications and create new ones made possible by it. The use of this data is rapidly changing the nature of communication, shopping, advertising, entertainment, and relationship management. Apps that don’t leverage it will quickly fall behind.

How do NoSQL databases address Big Data issues?
  • They scale to your traffic at an acceptable cost.
  • They have no fixed schemas, allow schema migration without downtime, and support dynamic schemas and unstructured data.
  • They are easy to use in conventional load-balanced clusters, which provide high availability and great reliability.
  • They can address the OS and hardware limitations of RDBMS solutions through horizontal scalability.



Types of NoSQL Databases

Since "NoSQL" just means non-relational and not SQL, there are many different ways to implement NoSQL technology. Generally, NoSQL databases include the following families:

  • Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name, or key, together with its value. Examples: Oracle Berkeley DB (BDB), Amazon SimpleDB, Riak.
  • Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, key-array pairs, or even nested documents. Examples: MongoDB, CouchDB.
  • Wide-column stores are optimized for queries over large datasets, and store columns of data together instead of rows. Examples: Cassandra, HBase.
  • Graph stores are used to store information about networks, such as social connections. Examples: Neo4J, HyperGraphDB.
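
As a rough illustration of how the four families differ, the sketch below models the same kind of data in each shape using plain Python structures. These are simplified stand-ins, not the storage formats of any of the products named above, and every key and field name is made up for the example.

```python
# Key-value store: each item is just a key and an opaque value.
key_value_store = {
    "user:1:name": "Alice",
    "user:1:city": "Colombo",
}

# Document store: each key maps to a nested document.
document_store = {
    "user:1": {
        "name": "Alice",
        "address": {"city": "Colombo"},  # nested document
        "tags": ["admin", "dev"],        # key-array pair
    },
}

# Wide-column store (simplified): values of one column are kept together.
wide_column_store = {
    "name": {"user:1": "Alice", "user:2": "Bob"},
    "city": {"user:1": "Colombo"},  # a row may simply lack a column
}

# Graph store: nodes plus labelled edges between them.
graph_store = {
    "nodes": {"alice": {"type": "person"}, "bob": {"type": "person"}},
    "edges": [("alice", "FRIEND_OF", "bob")],
}

def friends_of(graph, person):
    """Follow FRIEND_OF edges that start at the given node."""
    return [dst for src, rel, dst in graph["edges"]
            if src == person and rel == "FRIEND_OF"]
```

For example, `friends_of(graph_store, "alice")` walks the edge list directly, which is the kind of relationship query a graph store is built around.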

What is a NoSQL database?

  • A NoSQL database provides a mechanism for storing and retrieving data that uses looser consistency models than traditional relational databases in order to achieve horizontal scaling and higher availability.
  • NoSQL = No SQL = No Structured Query Language (although some platforms in the NoSQL space, such as Hadoop, now provide tools like Hive that can run SQL-like queries).

Monday, May 6, 2013

When to choose NoSQL?


I’ve listed below some use cases where a NoSQL database is a good fit.

  • Your relational database will not scale to your traffic at an acceptable cost
  • Your data is supplied in small updates spread over time, so the number of tables required to maintain a normal form has grown disproportionately to the data being held. Informally, if you can no longer print your ERD on an A3 piece of paper, you may have hit this problem, or you are storing too much in a single database.
  • Your business model generates a lot of temporary data that does not really belong in the main data store. Common examples include shopping carts, retained searches, site personalisation and incomplete user questionnaires.
  • Your relational database has already been denormalised for reasons of performance or for convenience in manipulating the data in your application.
  • Your dataset consists of large quantities of text or images and the column definition is simply a Large Object (CLOB or BLOB).
  • You need to run queries against your data that do not involve simple hierarchical relations; common examples are recommendations or business intelligence questions that involve an absence of data. 
  • You have local data transactions that do not have to be very durable. For example, "liking" items on websites: creating transactions for this kind of interaction is overkill because if the action fails the us

In general, NoSQL databases:

  • Are easy to use in conventional load-balanced clusters
  • Persist data (not just caches)
  • Scale to available memory
  • Have no fixed schemas and allow schema migration without downtime
  • Have individual query systems rather than using a standard query language
  • Are ACID within a node of the cluster and eventually consistent across the cluster
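
The last point, eventual consistency across the cluster, can be sketched with a toy simulation in Python. This is only an illustration under simple assumptions (one primary, lazy copying to replicas, no failures); the `Node` and `Cluster` classes and the `likes:post42` key are hypothetical and do not model any particular database.

```python
class Node:
    """One machine in the cluster, holding its own copy of the data."""
    def __init__(self):
        self.data = {}

class Cluster:
    """Toy cluster: writes land on the primary first and reach replicas later."""
    def __init__(self, num_replicas):
        self.primary = Node()
        self.replicas = [Node() for _ in range(num_replicas)]
        self.pending = []  # writes not yet copied to the replicas

    def write(self, key, value):
        # The write is applied immediately on the primary node only.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Later, pending writes are copied to every replica in order.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()

cluster = Cluster(num_replicas=2)
cluster.write("likes:post42", 1)

stale = cluster.replicas[0].data.get("likes:post42")  # replica has not caught up yet
cluster.replicate()
fresh = cluster.replicas[0].data.get("likes:post42")  # replica now matches the primary
```

Between the write and the replication step, a read served by a replica can return old data; once replication completes, every node agrees again.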