Having Fun with Google Custom Search API and Java

Let's talk about one of the most impressive sites in the world, Google. Google started as a search site and now supports over 100 billion searches a month. How does it do it? Here are some facts:

  1. There are more than 60 trillion individual web pages.
  2. Google navigates these pages by crawling.
  3. Once it crawls the pages, it adds them to a massive index, where each page's URL is stored against its content so that the content acts as the lookup key.
  4. When you search on Google, it uses programs based on complicated algorithms to generate the search results.
  5. While ranking the URLs, Google takes more than 200 factors into consideration, among them freshness, keyword density, site quality, relevance, and synonyms.
  6. According to some estimates, about 60% of internet traffic is non-human, and about 50% of that non-human traffic is malicious. There are also many malicious sites intent on harming users. Google constantly works on blacklisting these pages and preventing spam.
  7. To do all this, Google uses approximately 1 million servers.
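
The idea behind points 2 to 4 can be illustrated with a toy inverted index, which maps each word to the pages that contain it. This is only a sketch of the general technique and says nothing about how Google actually implements its index; the class name, the simple word splitting, and the example.com URLs are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each word to the set of URLs whose text contains it.
// Hypothetical illustration only; real search indexes are far more elaborate.
final class ToyInvertedIndex {

    private final Map<String, Set<String>> wordToUrls = new HashMap<>();

    // "Crawling" a page here just means handing its URL and text to the index.
    void addPage(String url, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                wordToUrls.computeIfAbsent(word, w -> new TreeSet<>()).add(url);
            }
        }
    }

    // Looks up a single word; returns an empty set when the word was never seen.
    Set<String> search(String word) {
        return wordToUrls.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addPage("http://example.com/a", "Java search tutorial");
        index.addPage("http://example.com/b", "Custom search engine");
        System.out.println(index.search("search"));
        System.out.println(index.search("java"));
    }
}
```

A real engine would add ranking on top of the lookup, which is where the 200-plus factors from point 5 come in.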

These are impressive stats. Now suppose you want a Google-like search feature on your site, but instead of searching the entire internet you want results only from a selected group of sites.

Let's say I want to create a search feature that indexes and searches only Java blogs. There are many tools and APIs available in different languages that let us crawl, index, and serve the desired search results. In other words, you can create your own search engine. But would your search engine be as available and robust as Google's? The answer is perhaps, but it will take a lot of resources and time to get there.

There is another approach, and Google itself provides it.

Google Custom Search API

Google Custom Search enables you to create a search engine for your website, your blog, or a collection of websites. You can configure your search engine to search both web pages and images. You can fine-tune the ranking, customize the look and feel of the search results, and invite your friends or trusted users to help you build your custom search engine. (Taken from the Google API tutorial documentation.)

In this post we will walk through how to create a custom Google search engine for a website and have some fun in turn. Without any further ado let us jump to the to-do steps.

As a first step you need to actually create a search engine. There are two varieties:

  1. Custom Google Search – This is free and can be accessed via https://www.google.com/cse/all
  2. Google Site Search – This is a more powerful, paid version of Google search. The details can be found here: http://www.google.com/enterprise/search/products/gss.html

For this tutorial we will use the free one, and then use the API to build our own custom search on top of it. The steps are below.

Step 1 – Create a search engine.

  1. Visit the following link: https://www.google.com/cse/manage/all
  2. Click on the Add button. This opens the page for creating a Custom Google Search.
  3. First enter all the URLs you wish to index and search against. You can choose as many as you want. I mostly chose Java blogs such as dzone, ibm, and theserverside.
  4. Now enter the name of your search engine and save it.
  5. Your search engine is now created and you can go there and play with your searches. Here is mine for your reference: Weblog4j Search. Search for Java and software related terms and you will get awesome results. You can customize the search page to some extent.

Step 2 – Get the search engine id.

  1. Go to https://www.google.com/cse/manage/all.
  2. Click on the search engine you created.
  3. Go to the Basic tab and find Details. There is a button labelled “Search Engine Id”. Click on it and the search engine id is shown in a pop-up. Save the id in a text file for later reference.

Step 3 – Get the API key.

  1. Playing with the JSON/Atom Custom Search API requires an API key.
  2. Go to the Google Cloud Console.
  3. Create a new project and activate it. The project will now be listed on the console page. Click on the project you created.
  4. On the project page, look at the left-hand bar. Go to APIs and Auth -> API. This page lists the Google Cloud APIs. Find “Custom Search API” in the list and toggle it to ON.
  5. Now click on Credentials. You will find a panel named “Public API access”. Click on the “Create New Key” button. In the resulting pop-up click on Server Key.
  6. A new pop-up opens. It asks for the IP addresses that should be permitted to use the key. You can leave it empty and create the key.
  7. Copy the key and save it in a secret location.
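
One possible way to keep the key out of your source code is to read it from the environment when the program starts. The sketch below is only an illustration: the class name and the variable names GOOGLE_API_KEY and GOOGLE_CSE_ID are assumptions made for this example, not something Google prescribes.

```java
import java.util.Map;

// Reads the API key and search engine id from the environment instead of
// hard-coding them. Class and variable names are assumptions for this sketch.
final class SearchCredentials {

    final String apiKey;
    final String searchEngineId;

    private SearchCredentials(String apiKey, String searchEngineId) {
        this.apiKey = apiKey;
        this.searchEngineId = searchEngineId;
    }

    // Takes the environment as a map so that the lookup is easy to test.
    static SearchCredentials fromEnvironment(Map<String, String> env) {
        return new SearchCredentials(
                require(env, "GOOGLE_API_KEY"),
                require(env, "GOOGLE_CSE_ID"));
    }

    // Fails fast with a clear message when a value is missing or blank.
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        SearchCredentials credentials = fromEnvironment(System.getenv());
        // Print only the engine id; avoid logging the key itself.
        System.out.println("Using search engine " + credentials.searchEngineId);
    }
}
```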

Google Custom Search API

Now we have the search engine to play with, the search engine id, and the API key, and we are ready to get our hands dirty with some code. First, an overview of the API.

  1. It is a REST API with a single method called list.
  2. The HTTP method is GET.
  3. The response data is returned as JSON or Atom.
  4. The response consists of (a) the actual search results, (b) metadata for the search, such as the number of results and alternative search queries, and (c) custom search engine metadata.
  5. The data model is based on the OpenSearch 1.1 specification.

API URL – The REST URL to invoke Google Custom Search is

https://www.googleapis.com/customsearch/v1?parameters

Parameters

  1. key – the API key you saved in step 3 above
  2. cx – the custom search engine id you got in step 2. In case of a linked custom search engine use cref instead of cx
  3. q – the search query

For a complete reference of the query parameters visit the following page: https://developers.google.com/custom-search/json-api/v1/reference/cse/list.
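
Putting the URL and the three parameters together, a request can be made with nothing but the JDK. The sketch below is an illustration under assumptions: the class and method names are made up for this post, and the key and engine id in main are placeholders. The parameter values are URL-encoded because search terms often contain spaces and engine ids can contain characters such as a colon.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Scanner;

// Hypothetical helper that builds the request URL from key, cx and q,
// and can fetch the raw JSON response with a plain HTTP GET.
final class CustomSearchUrl {

    private static final String BASE_URL = "https://www.googleapis.com/customsearch/v1";

    static String build(String apiKey, String searchEngineId, String query) {
        return BASE_URL
                + "?key=" + encode(apiKey)
                + "&cx=" + encode(searchEngineId)
                + "&q=" + encode(query);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    // Sends the GET request and returns the response body as a string.
    // Needs a real key and engine id, so main below does not call it.
    static String fetchJson(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("GET");
        try (InputStream in = connection.getInputStream();
             Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) {
        // Placeholder credentials: replace them before calling fetchJson.
        System.out.println(build("YOUR_API_KEY", "YOUR_ENGINE_ID", "java custom search"));
    }
}
```

With the placeholders above, main prints https://www.googleapis.com/customsearch/v1?key=YOUR_API_KEY&cx=YOUR_ENGINE_ID&q=java+custom+search.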

Libraries and Dependencies

Since we are dealing with a third-party REST API, all we strictly need is an HTTP call (for example an Ajax call from our front end) that returns the resulting JSON response. But there are libraries available in multiple languages that make working with the Google search API a breeze. For Java we have the following:

google-api-java-client

You can download the search API library from here. Or you can use the following maven dependencies.

    <dependency>
      <groupId>com.google.apis</groupId>
      <artifactId>google-api-services-customsearch</artifactId>
      <version>v1-rev40-1.18.0-rc</version>
    </dependency>

    <dependency>
      <groupId>com.google.http-client</groupId>
      <artifactId>google-http-client-jackson</artifactId>
      <version>1.15.0-rc</version>
    </dependency>

Code for searching

We could call the plain REST API directly, as there is nothing special about invoking the Google Custom Search service URL. But for this tutorial we will use the Google client API to run our search. Let's check out the code.

package com.aranin.spring.googleapi.search;

import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson.JacksonFactory;
import com.google.api.services.customsearch.Customsearch;
import com.google.api.services.customsearch.model.Result;
import com.google.api.services.customsearch.model.Search;

import java.io.IOException;
import java.util.Collections;
import java.util.List;

/**
 * Command-line client that queries a Google Custom Search engine
 * through the google-api-java-client library.
 *
 * User: Niraj Singh
 * Date: 6/3/14
 * Time: 12:42 PM
 */
public class GoogleSearchClient {

    // Raw REST endpoint, kept for reference only; the client library builds the request itself
    private static final String GOOGLE_SEARCH_URL = "https://www.googleapis.com/customsearch/v1?";

    // API key
    private static final String API_KEY = "your api key from step 3";
    // custom search engine ID
    private static final String SEARCH_ENGINE_ID = "your search engine id from step 2";

    private static final String FINAL_URL = GOOGLE_SEARCH_URL + "key=" + API_KEY + "&cx=" + SEARCH_ENGINE_ID;

    public static void main(String[] args) {
        GoogleSearchClient gsc = new GoogleSearchClient();
        String searchKeyWord = "weblog4j";
        List<Result> resultList = gsc.getSearchResult(searchKeyWord);
        for (Result result : resultList) {
            System.out.println(result.getHtmlTitle());
            System.out.println(result.getFormattedUrl());
            //System.out.println(result.getHtmlSnippet());
            System.out.println("----------------------------------------");
        }
    }

    /**
     * Runs the search and returns the matching results.
     * Returns an empty list when nothing matches or the call fails.
     */
    public List<Result> getSearchResult(String keyword) {
        // Set up the HTTP transport and JSON factory
        HttpTransport httpTransport = new NetHttpTransport();
        JsonFactory jsonFactory = new JacksonFactory();
        // No request initializer is needed because the API key is set on each request
        Customsearch customsearch = new Customsearch(httpTransport, jsonFactory, null);

        List<Result> resultList = Collections.emptyList();
        try {
            Customsearch.Cse.List list = customsearch.cse().list(keyword);
            list.setKey(API_KEY);
            list.setCx(SEARCH_ENGINE_ID);
            //num results per page
            //list.setNum(2L);

            //for pagination: start is 1-based, so 10L skips the first nine results;
            //comment this line out (or use 1L) to begin at the first result
            list.setStart(10L);
            Search results = list.execute();
            // getItems() returns null when the search has no results
            if (results.getItems() != null) {
                resultList = results.getItems();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        return resultList;
    }
}

The code is very simple. We create a Customsearch object and, through its cse() method, obtain a Cse.List request for our search term. On that request we set our search engine properties, the API key and the search engine id, and then the search criteria such as the number of results and the starting point (for pagination). Then we execute it.

Finally we get a List of Result objects. This contains all the search data and can be used to build our own search page.
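
Pagination deserves a short illustration, because the start parameter is 1-based and, according to the API reference, a single call returns at most 10 items while a query never yields more than 100 results. The helper below is a hypothetical sketch (the class and method names are not part of the Google client library) that computes the value to pass to setStart() for a given page. The exact boundary check against the 100-result cap is an assumption.

```java
// Hypothetical helper (not part of the Google client library) that turns a
// 1-based page number into the 1-based "start" index expected by the API.
final class SearchPaging {

    static final int MAX_RESULTS_PER_CALL = 10;   // upper bound for "num"
    static final int MAX_RESULTS_PER_QUERY = 100; // cap on reachable results

    static long startIndexFor(int page, int resultsPerPage) {
        if (page < 1) {
            throw new IllegalArgumentException("page must be 1 or greater");
        }
        if (resultsPerPage < 1 || resultsPerPage > MAX_RESULTS_PER_CALL) {
            throw new IllegalArgumentException("resultsPerPage must be between 1 and 10");
        }
        long start = (long) (page - 1) * resultsPerPage + 1;
        // Assumed boundary: the last item of the page must not pass result 100.
        if (start + resultsPerPage - 1 > MAX_RESULTS_PER_QUERY) {
            throw new IllegalArgumentException("page lies beyond the 100-result cap");
        }
        return start;
    }

    public static void main(String[] args) {
        for (int page = 1; page <= 3; page++) {
            System.out.println("page " + page + " -> start " + startIndexFor(page, 10));
        }
    }
}
```

With 10 results per page this gives start values 1, 11, and 21 for the first three pages. In getSearchResult() the computed value would be passed to list.setStart(), with the page size going to list.setNum().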

That is all, folks. I hope you find this tutorial useful. Feel free to drop in a couple of comments if you like or dislike the post.

 

References

  1. https://www.google.co.in/insidesearch/howsearchworks/thestory/
  2. http://atkinsbookshelf.wordpress.com/tag/how-many-servers-does-google-have/
  3. http://www.google.co.in/about/datacenters/
  4. https://developers.google.com/custom-search/json-api/v1/overview
  5. https://developers.google.com/custom-search/docs/tutorial/creatingcse
  6. https://www.google.com/cse/manage/all
  7. https://cloud.google.com/console
  8. https://developers.google.com/custom-search/json-api/v1/introduction

 


About Niraj Singh

I am CEO and co-founder of a startup, Aranin Software Private Limited, Bangalore. I graduated in 2002 as an Aerospace Engineer from IIT Kharagpur. I love working on new ideas and projects and recently released my first open source project, JaiomServer (http://jaiomserver.org). I have 9 years of experience in the IT industry, most of which I have spent developing community applications for various clients using Java. Some of the sites I have been actively involved with are hgtv.com, food.com, foodnetwork.com, pickle.com, diynetwork.com etc.

10 Responses to Having Fun with Google Custom Search API and Java

  1. Felipe says:

    Hello,

    Excellent tutorial! How do I return more than 10 results?

    Thank you!

    • Niraj Singh says:

      Hi Felipe,

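      Thanks for the kind comment. The number of results per call is set with the CSE List.setNum() method, but the API accepts at most 10 there. To get more than 10 results you need to make several calls and move the starting point with List.setStart() each time. Please have a look at the getSearchResult() method. For example, to fetch the second page of 10 results use

      //num results per page
      list.setNum(10L);
      //for pagination
      list.setStart(11L);

      Please Note – Google puts a restriction of maximum 100 results per query. Hope this helps.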

      Regards
      Niraj

  2. swathi says:

    I wanted to know how to select the first link in the set of links through a program.
    I am making an Android app and I want to select the first link in the search result and go to that page directly. I don’t want a large set of links.

    • Niraj Singh says:

      Hi Swathi,

      You can set the number of links returned to 1.

      list.setNum(1L);

      Comment out the next line which is used for paginating i.e

      //list.setStart(10L);

      Now you have a list of one record which is the first record of the search. Just return resultList.get(0) instead of the list.

      Hope this helps

      Regards
      Niraj

  3. shruthi says:

    Hi,
    I’m new to all this and new to eclipse as well.
    I have a few questions and it will be much appreciated if you can help me out with them.
    Firstly I have done everything till the Libraries and Dependencies. I have no clue as to what to do next.
    Where do I add the dependencies ?
    I copy pasted the code and it gave me errors for the imports.
    I realize it is because those dependencies aren’t there but I have no clue how to fix them.
    I have installed m2 for eclipse.
    Pardon my ignorance and please help me.
    Thank you.

  4. shruthi says:

    I even tried – “You can download the search API library from here” , but no use.
    There is no option to run as web application in the eclipse I have and localhost is giving me error.

    • Niraj Singh says:

      Hi Shruti,

      No problem. You need to create a standalone Java Maven project in Eclipse. This will generate a pom.xml for you. You can add the dependencies from this post to the pom. You will see where to add them once you have a look.

      Please follow the steps below.
      1. Try to create a simple Java project using the steps in http://weblog4j.com/2014/06/23/create-a-simple-maven-java-project-in-eclipse-using-m2eclispe/
      2. Open pom.xml and add the dependencies provided in this post.
      3. Add the Java code under src/main/java.
      4. Try running “GoogleSearchClient” as a simple Java application.

      Happy Learning
      Niraj

      • shruthi says:

        Hi,
        I did everything mentioned above, but I am still getting import errors.
        Can you please tell me how I can rectify it?
        Thank you

        • Niraj Singh says:

          Hi,

          It is difficult to say without looking at your settings. Were you able to create the Maven project successfully? Can you do the following?

          1. Send your pom.xml.
          2. Expand your project in Eclipse and send across a screenshot.
          3. Right click on your project in Project Explorer -> Maven -> Enable Dependency Management.

          You can send these on my email ‘singh.niraj@aranin.com’

          Regards
          Niraj