Wednesday, 6 August 2014

Mongo DB for Java Developers

I have just enrolled the Mongo DB for Java Developers course from 10gen, which has started yesterday.

It's a 7 week course as the course goes on, I will keep a note of the important points during the lecture for each week.

WEEK1

MongoDB -

  • Non-Relational JSON Document Store
  • NO SQL
  • Dynamic Schema [Schema less] - different documents have different schema
  • It's a Document oriented DB
  • Doesn't support Joins[as they are not particularly horizontally scalable], Transactions across multiple collections



mongod - is the primary database process that runs on an individual server.
mongo binary provides an administrative shell, which connects to MongoDB through TCP.
Mongo Java - is the driver which communicates MongoDB through TCP

Basic MongoDB commands, which needs to executed in Mongo Shell:

> show collections
> db.createCollection("hello", {size:2147483648})

{ "ok" : 1 }

To insert an document
> db.hello.insert({name : "Prashanth"})
WriteResult({ "nInserted" : 1 })

To find the record
> db.hello.find()
{ "_id" : ObjectId("53e35a06581079908231543e"), "name" : "Prashanth" }

Here's the MongoDB 2.6 Manual for detailed commands.

As part of the course, I have started using Java 8.
Here's the code to integrate MongoDB, Freemarker, and Spark all together.

package com.tengen;

import com.mongodb.*;
import freemarker.template.Configuration;
import freemarker.template.Template;
import spark.Spark;

import java.io.StringWriter;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

import static spark.Spark.halt;

/**
 * Created by pupsPrashuVishnu on 08/08/14.
 */
public class HelloWorldMongoDBSparkFreeMarkerStyle {

    public static void main(String[] args) throws UnknownHostException {
        final Configuration configuration = new Configuration();
        configuration.setClassForTemplateLoading(HelloWorldSparkFreeMarkerStyle.class, "/");

        MongoClient client = new MongoClient(new ServerAddress("localhost",27017));
        //client - a logical connection to Mongo cluster

        DB database = client.getDB("course");
        DBCollection dbCollection = database.getCollection("hello");

        Spark.get("/", (request, response) -> {
        StringWriter writer = new StringWriter();
        try {
            Template template = configuration.getTemplate("hello.ftl");
            DBObject document = dbCollection.findOne();
            template.process(document, writer);
            } catch (Exception e) {
                halt(500);
                e.printStackTrace();
            }
            return writer;
        });
    }
}

JSON Structure


Examples:
{"fruit" : ["apple","pear","peach"]}  - JSON for a simple document containing a single key "fruit" that has as its value an array containing three strings: "apple", "pear", and "peach"

{"address":{"street_address" : "23 Elm Drive", "city":"Palo Alto", "state":"California", "zipcode":"94305"} }
 JSON document with a single key, "address" that has as it value another document with the keys “street_address”, “city”, “state”, “zipcode”

We are into Week2 now, which mainly concentrated on CURD operations, MongoDB shell, manipulate MongoDB documents from Java.

 Ex to find one document where the key username is "dwight", and retrieve only the key named email.
db.users.findOne ( { "username" : "dwight" } , { "_id" : false , "email" : true } );

find all documents with type: essay and score: 50 and only retrieve the student field
db.scores.find({ type:"essay", score:50}, {student:true,_id:false});

find all documents with a score between 50 and 60, inclusive
db.scores.find({ score : { $gte : 50 , $lte : 60 } } )

find all users with name between "F" and "Q"
db.users.find( { name : { $gte : "F" , $lte : "Q" } } );

retrieves documents from a users collection where the name has a "q" in it, and the document has an email field.
db.users.find( { name : {$regex:"q"}, email : {$exists: true} } )

find all documents in the scores collection where the score is less than 50 or greater than 90
db.scores.find( { $or : [ { score : { $lt : 50 } }, { score : { $gt : 90 } } ] } ) ;

Find all documents with score less than 60
db.scores.find( { score : { $gt : 50 }, score : { $lt : 60 } } );
--Javascript replaces the first occurance with the secondone.
In this case we have to use $and
db.scores.find( { $and : [ { score : { $gt : 50 } }, { score : { $lt : 60 } } ] } ) ;

$all - should contain all of the values and take precedence of the order  in which they appear
$in - should contain atleast one of the values
db.users.find( { friends : { $all : [ "Joe" , "Bob" ] }, favorites : { $in : [ "running" , "pickles" ] } } )

Dot Notation
Here's the sample product catalog
{ product : "Super Duper-o-phonic", 
  price : 100000000000,
  reviews : [ { user : "fred", comment : "Great!" , rating : 5 },
              { user : "tom" , comment : "I agree with Fred, somewhat!" , rating : 4 } ],
  ... }
Finds all the products that cost more than 10,000 and that have a rating of 5 or better
db.catalog.find( { "price": {$gt : 10000}, "reviews.rating" : {$gte : 5} } )

Querying Cursors
{
 "_id" : ObjectId("50844162cb4cf4564b4694f8"),
 "student" : 0,
 "type" : "exam",
 "score" : 75
}
Query that retrieves exam documents, sorted by score in descending order, skipping the first 50 and showing only the next 20.
db.scores.find({type:"exam"}).sort({score : -1}).skip(50).limit(20)

Count the documents in the scores collection where the type was "essay" and the score was greater than 90
db.scores.count({type:"essay", score: {$gt : 90}})

db.users.update({"_id":"myrnarackham"},{"$set": {"country" : "RU"}})
set myrnarackham's country code to "RU" but leave the rest of the document (and the rest of the collection) unchanged.
Array operations
Suppose we have a document which contains an array as follows:
{"_id" : 0, "a" : [1,2,3,4]}
  1. db.arrays.update({_id : 0}, { $set : {"a.2" : 5} } )  - Updates the element 2 with 5
  2. -push - Add element to the right-hand-side of an array                                        db.arrays.update({_id : 0},{ $push : { a : 6 }})
  3. -pop - To remove the element from the right-hand-side of an array                             db.arrays.update({_id : 0},{ $pop : { a : 1 }})
  4. -To remove the element from the left-hand-side of an array                                            db.arrays.update({_id : 0},{ $pop : { a : -1 }})
  5. -pushAll - To add multiple elements to an array                                                        db.arrays.update({_id : 0},{ $pushAll : { a : [7,8,9]}})
  6. -pull - remove an element from any position                                                      db.arrays.update({_id : 0},{ $pull : { a : 8 }})
  7. -pullAll - removes anyoccurance/anyvalue that appear in an array                         db.arrays.update({_id : 0},{ $pullAll : { a : [8,9]}})
  8. - addToSet - acts like push if the element doesn't exist otherwise it does nothing                  db.arrays.update({_id : 0},{ $addToSet : { a : 10}})

- upsert - update the record if exists or otherwise insert an new record
db.foo.update( { username : 'bar' }, { '$set' : { 'interests': [ 'cat' , 'dog' ] } } , { upsert : true } );
-multi-update [update every document with a score less than 70 an extra 20 points]
db.scores.update({"score":{$lt :70}}, {$inc:{score:20}}, {multi:true})
Remove - Each collection has a method remove
db.people.remove({}) - removes all the documents for the collection
db.people.drop()

Java Driver - Representing Documents


Here's sample code to demonstrate Insert and Find collections
 MongoClient client = new MongoClient();
        DB courseDB = client.getDB("course");

       //insertTest
        DBCollection insertCollection = courseDB.getCollection("insertTest");
        insertCollection.drop();
        DBObject doc = new BasicDBObject().append("x", 0);
        DBObject doc1 = new BasicDBObject("_id", new ObjectId()).append("x", 1);
        DBObject doc2 = new BasicDBObject().append("x",2);
        insertCollection.insert(Arrays.asList(doc,doc1,doc2));

        //Find test
        DBCollection findCollection = courseDB.getCollection("findTest");
        findCollection.drop();
  for(int i =0; i<10 data-blogger-escaped-all:="" data-blogger-escaped-basicdbobject="" data-blogger-escaped-cur="" data-blogger-escaped-cursor.close="" data-blogger-escaped-cursor.hasnext="" data-blogger-escaped-cursor="findCollection.find();" data-blogger-escaped-dbcursor="" data-blogger-escaped-dbobject="" data-blogger-escaped-finally="" data-blogger-escaped-findcollection.count="" data-blogger-escaped-findcollection.findone="" data-blogger-escaped-findcollection.insert="" data-blogger-escaped-i="" data-blogger-escaped-ind="" data-blogger-escaped-ncount:="" data-blogger-escaped-new="" data-blogger-escaped-nextint="" data-blogger-escaped-nfind="" data-blogger-escaped-one:="" data-blogger-escaped-pre="" data-blogger-escaped-random="" data-blogger-escaped-system.out.println="" data-blogger-escaped-try="" data-blogger-escaped-while="" data-blogger-escaped-x="">

Here's the code to provide criteria for the query :
 
        findCriteriaCollection.drop();
        //insert 10 documents with 2 random integers
        for (int i = 0; i<10 data-blogger-escaped-.and="" data-blogger-escaped-.append="" data-blogger-escaped-0="" data-blogger-escaped-10="" data-blogger-escaped-90="" data-blogger-escaped-:="" data-blogger-escaped-all:="" data-blogger-escaped-and="" data-blogger-escaped-append="" data-blogger-escaped-basicdbobject="" data-blogger-escaped-between="" data-blogger-escaped-builder.get="" data-blogger-escaped-builder="QueryBuilder.start(" data-blogger-escaped-can="" data-blogger-escaped-cur="" data-blogger-escaped-cursor.close="" data-blogger-escaped-cursor.hasnext="" data-blogger-escaped-cursor0="findCriteriaCollection.find(query);" data-blogger-escaped-cursor="findCriteriaCollection.find(builder.get());" data-blogger-escaped-dbcursor="" data-blogger-escaped-dbobject="" data-blogger-escaped-finally="" data-blogger-escaped-findcriteriacollection.count="" data-blogger-escaped-findcriteriacollection.insert="" data-blogger-escaped-greaterthan="" data-blogger-escaped-gt="" data-blogger-escaped-i="" data-blogger-escaped-is="" data-blogger-escaped-lessthan="" data-blogger-escaped-lt="" data-blogger-escaped-new="" data-blogger-escaped-nextint="" data-blogger-escaped-nfind="" data-blogger-escaped-or="" data-blogger-escaped-ount="" data-blogger-escaped-pre="" data-blogger-escaped-query="" data-blogger-escaped-querybuilder="" data-blogger-escaped-random="" data-blogger-escaped-system.out.println="" data-blogger-escaped-try="" data-blogger-escaped-u="" data-blogger-escaped-use="" data-blogger-escaped-where="" data-blogger-escaped-while="" data-blogger-escaped-x="" data-blogger-escaped-y="">
To exclude x,z, and _id in the output use :
DBCursor cursor0 = findCriteriaCollection.find(query,
                       new BasicDBObject("y",true).append("_id", false));

Homework 2.2

This was the good example to demonstrate the knowledge on CURD operations:
        MongoClient client = new MongoClient();
        DB db = client.getDB("students");
       DBCollection gradesCollection = db.getCollection("grades");

        DBCursor cursor = gradesCollection.find(new BasicDBObject("type", "homework")).sort(new BasicDBObject("student_id", 1).append("score", 1));
        int curStuId = -1;
         int previousStuId = -1;
        try {
        while (cursor.hasNext()) {
            DBObject doc = cursor.next();
            curStuId = (Integer)doc.get("student_id");
            if (curStuId != previousStuId){
                previousStuId = curStuId;
                //remove cur rec
                gradesCollection.remove(doc);
            }
        }
        } finally {
            cursor.close();
        }
Ok, now we are in to Week3, which mainly discuss on Schema Design


Tuesday, 5 August 2014

Java8 - Part2

I have covered most of the important topics on Lambda Expressions in my earlier post.

Here, I will be covering on Processing Data with Streams.


Processing Data with Streams

Streams lets you manipulate collections of data in a declarative way and can be processed in parallel without having to write multi-threaded code.

BEFORE (JAVA 7):
List lowCaloricDishes = new ArrayList<>();
for(Dish d: menu){
    if(d.getCalories() < 400){
       lowCaloricDishes.add(d);
     }
} //first we filter the elements, using an accumulator
List lowCaloricDishesName = new ArrayList<>();
Collections.sort(lowCaloricDishes, new Comparator() {
                                   public int compare(Dish d1, Dish d2){
                                      return Integer.compare(d1.getCalories(), d2.getCalories());
                                   }}); //then we sort the dishes, with an anonymous class
for(Dish d: lowCaloricDishes){
   lowCaloricDishesName.add(d.getName());
} //finally we process the sorted list to select the names of Dishes
AFTER (JAVA 8):
List lowCaloricDishesName =
               dishes.parallelStream()
                     .filter(d -> d.getCalories() < 400) //first we select dishes that are below 400 calories
                     .sorted(comparing(Dish::getCalories)) //then we sort them by calories
                     .map(Dish::getName) //we extract the names of these dishes
                     .collect(toList()); //we store all the names into a List

chaining Stream operations forming a Stream pipeline
Stream operations that can be connected are called intermediate operations, and operations that close a stream are called terminal operations.







To summarize, the Streams API in Java 8 lets you write code that is:
  • declarative: more concise and readable 
  • composable: more flexibility 
  • parallelizable: more performance
Stream - “a sequence of elements from a source that supports aggregate operations“

Difference between Collection and Streams
has to do with when things are computed.
A Collection is an in-memory data structure, which holds all the values that the data structure currently has—every element in the Collection has to be computed before it can be added to the Collection.
By contrast a Stream is a conceptually fixed data structure, in which elements are computed on demand or pulled from an existing source.

A Stream is like a lazily constructed Collection: values are computed when they are solicited by a consumer.
A Collection is eagerly constructed

TRAVERSABLE ONLY ONCE
Like Iterator, the Streams can only be traversed once. After that a Stream is said to be “consumed.” You can get a new Stream from the initial data source to traverse it again just like for an Iterator.
List title = Arrays.asList("Java8", "Lambdas", "In", "Action");
Stream s = title.stream();
s.forEach(System.out::println); //  prints each word in the title
s.forEach(System.out::println); // java.lang.IllegalStateException: stream has already been operated upon or closed

COLLECTIONS: EXTERNAL ITERATION
List names = new ArrayList<>();
Iterator iterator = menu.iterator();
while(iterator.hasNext()) { // iterating explicitly
     Dish d = iterator.next();
     names.add(d.getName());
}

STREAMS: INTERNAL ITERATION
List names = menu.stream()
                       .map(Dish::getName) // we parameterize map with the getName method to extract the name of a dish
                       .collect(toList()); // start executing the pipeline of operations, no iteration!

To summarize, working with Streams in general involves three steps:

  1. A data source (such as a Collection) to perform a query on.
  2. A chain of intermediate operations, which form a stream pipeline.
  3. One terminal operation, which executes the stream pipeline and produces a result.
Filtering and Slicing
1. Filtering with a predicate
List vegetarianMenu = menu.stream()
                           .filter(Dish::isVegetarian) // a method reference to check if a dish is vegetarian-friendly!
                           .collect(toList());

2. Filtering unique elements
List numbers = Arrays.asList(1, 2, 1, 3, 3, 2, 4);
    numbers.stream()
           .filter(i -> i % 2 == 0)
           .distinct()
           .forEach(System.out::println);

3. Truncating a Stream - first elements are returned up to a maximum of n.
 List dishes = menu.stream()
                         .filter(d -> d.getCalories() > 300)
                         .limit(3)
                         .collect(toList());

4. Skipping elements - discards the first n elements
 List dishes = menu.stream()
                         .filter(d -> d.getCalories() > 300)
                         .skip(2)
                         .collect(toList());
Mapping
The Stream API provides map and flatmap which is similar to a "select" statement for selecting a particular column from a table.
Streams support the method map, which takes a function as argument to transform the elements of a Stream into another form.
For example, in the code below we pass a method reference Dish::getName to the map method to extract the names of the dishes in the stream.
List dishNames = menu.stream()
                             .map(Dish::getName)
                             .collect(toList());
To find out the length of the name of each dish, we can do this by chaining another map as follows:
List dishNames = menu.stream()
                             .map(Dish::getName)
                             .map(String::length)
                             .collect(toList());
Falttening Streams

Finding and Matching

anyMatch - “is there an element in the stream matching the given predicate”
if(menu.stream().anyMatch(Dish::isVegatarian)){
      System.out.println(“The menu is (somewhat) vegetarian friendly!!”);
}
allMatch - all the elements of the stream match the given predicate.
boolean isHealthy = menu.stream()
                        .allMatch(d -> d.getCalories() < 1000);
noneMatch - ensures that no elements in the stream match the given predicate.
boolean isHealthy = menu.stream()
                        .noneMatch(d -> d.getCalories() >= 1000);
findAny - returns an arbitrary element of the current stream
The below example finds a dish which is Vegetarian
Optional dish = menu.stream()
                          .filter(Dish::isVegetarian)
                          .findAny();
The Optional class (java.util.Optional) is a container class to represent the existence or absence of a value. The below example would explicitly check the presence of a dish in the Optional object to access its name:
                menu.stream()
                    .filter(Dish::isVegetarian)
                    .findAny(); // returns an Optional
                    .map(Dish::getName) // returns an Optional, NB. This is map from Optional not map from Stream
                    .ifPresent(System.out::println); // if a value is contained it is printed otherwise nothing happens

findFirst - returns the first element of the current stream
For example, the code below, given a list of numbers, finds the first square that is divisible by 3:
 
List someNumbers = Arrays.asList(1, 2, 3, 4, 5);
Optional firstSquareDivisibleByThree = someNumbers.stream()
                                                           .map(x -> x * x)
                                                           .filter(x % 3 == 0)
                                                           .findFirst(); // 9

Here's some examples of various ways we can use this in Java 8
GIVEN : A Menu which contains a list of Dishes
        List menu = Arrays.asList(
                new Dish("pork", false, 800, Dish.Type.MEAT),
                new Dish("beef", false, 700, Dish.Type.MEAT),
                new Dish("chicken", false, 400, Dish.Type.MEAT),
                new Dish("french fries", true, 530, Dish.Type.OTHER),
                new Dish("rice", true, 350, Dish.Type.OTHER),
                new Dish("season fruit", true, 120, Dish.Type.OTHER),
                new Dish("pizza", true, 550, Dish.Type.OTHER),
                new Dish("prawns", false, 300, Dish.Type.FISH),
                new Dish("salmon", false, 450, Dish.Type.FISH));

Dish(String name, boolean vegetarian, int calories, Type type)

Count the number of Dishes:
 
menu.stream().count()
menu.stream().collect(Collectors.counting())

Finding maximum and minimum in a Stream of values
 
 Optional highCalorieDish = menu.stream()
                .collect(Collectors.maxBy(Comparator.comparingInt(Dish::getCalories)));
Summarization - the below methods are avaialble in Collectors for Summarization Collectors.summingInt Collectors.summingLong Collectors.summingDouble
 We also have equivalent for average Collectors.averagingInt/averagingLong/averagingDouble
 
int totalCalories =  menu.stream().collect(Collectors.summingInt(Dish::getCalories));
double avgCalories = menu.stream().collect(averagingInt(Dish::getCalories));

This Collector gathers all that information in a class called IntSummaryStatistics
 
IntSummaryStatistics menuStatistics = menu.stream().collect(summarizingInt(Dish::getCalories));

IntSummaryStatistics{count=9, sum=4300, min=120,average=477.777778, max=800}

Joining Strings -Collector which reduces a Stream to a single result doesn’t operate on
numeric values but allows a concatenation of Strings
 
String shortMenu = menu.stream().map(Dish::getName).collect(joining(", "));

Generalized summarization with reduction - Collectors.reducing factory method is a generalization of all of the above specialized cases.
 
int totalCalories = menu.stream().collect(reducing(0, Dish::getCalories, (Integer i, Integer j) -> i + j));

Optional mostCalorificDish =menu.stream().collect(reducing((d1, d2) -> d1.getCalories() > d2.getCalories() ? d1 : d2));

Various ways to find the total number of calories in the menu:
Using Data Processing way :
System.out.println("Using Data Processing way ...with reduce " + menu.stream().map(Dish::getCalories).reduce(0, Integer::sum));
System.out.println("Using Data Processing way ..."               + menu.stream().mapToInt(Dish::getCalories).sum());

Using Data Collecting way
 System.out.println("Using Data Collecting way ..."               + menu.stream().collect(summingInt(Dish::getCalories)));
 System.out.println("Using Data Collecting way with reduction..." + menu.stream().collect(reducing(0, Dish::getCalories, Integer::sum)));