Data Mining on the Internet with Google June 15, 2006

Posted by cbeech in News - Google.

Google has quickly become one of the best-known words in the world and is used by millions of people daily, including me. In an advanced database class back in university, we spent a couple of weeks studying the inner workings of search engines, and one topic that came up was data mining using Google. Much to my surprise, out of a class of 80 fourth-year computer engineering students, only four or five knew how to use Google to perform any sort of advanced query.

Google (like many other search engines) can search not only on keywords but also with a more “database-ish” query language that really narrows down your results. Below is a summary of a few of the most useful lesser-known features. Note: in the examples, replace cwire.org with your own domain.

Basic Usage:

  • Use quotation marks “ ” to search for an exact phrase.
    For example, “bill gates conference” will only return results containing that exact string.
  • Mark essential words with a +
    If a result must contain certain words or phrases, mark them with a + symbol. For example, +“bill gates” conference will return all results containing “bill gates”, but not necessarily ones pertaining to a conference.
  • Negate unwanted words with a -
    You may want to search for the term bass as it pertains to the fish, yet get back a list of music links as well. To narrow your search, try: bass -music. This will return results that contain “bass” but NOT “music”.
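If you find yourself building these queries from a script, the three basic operators can be sketched in a few lines of Python. This is a minimal sketch, not anything Google provides: `build_query`, `quote_if_needed` and `search_url` are hypothetical helper names, and it assumes Google accepts the query string in the `q` parameter of https://www.google.com/search.

```python
from urllib.parse import urlencode


def quote_if_needed(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are treated as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(words=(), phrases=(), required=(), excluded=()):
    """Hypothetical helper: combine +required terms, "exact phrases",
    plain words and -excluded terms into one query string."""
    parts = ["+" + quote_if_needed(t) for t in required]
    parts += [f'"{p}"' for p in phrases]
    parts += list(words)
    parts += ["-" + quote_if_needed(t) for t in excluded]
    return " ".join(parts)


def search_url(query: str) -> str:
    """Encode the query into a search URL (assumes the q= parameter)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(build_query(phrases=["bill gates conference"]))  # "bill gates conference"
    print(build_query(required=["bill gates"], words=["conference"]))  # +"bill gates" conference
    print(search_url(build_query(words=["bass"], excluded=["music"])))  # https://www.google.com/search?q=bass+-music
```

Required terms are emitted first so the output mirrors the +“bill gates” conference example; Google does not care about the order of these operators, so this is purely cosmetic.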

General Tips: (I use many of these almost on a daily basis)

  • site:www.cwire.org
    This will search only pages which reside on this domain.
  • related:www.cwire.org
    This will display pages that Google considers related to your URL.
  • link:www.cwire.org
    This will display a list of pages that Google has found linking to your site. This is useful for seeing how popular your site is.
  • spell:word
    Runs a spell check on your word.
  • define:word
    Returns the definition of the word.
  • stocks: [symbol, symbol, etc.]
    Returns stock information, e.g. stocks: msft
  • maps:
    A shortcut to Google Maps.
  • phone: name_here
    Attempts to look up the phone number for a given name.
  • cache:
    Shows Google’s cached copy of a web page. If you include other words in the query, Google will highlight those words within the cached document. For instance, cache:www.cwire.org web will show the cached content with the word “web” highlighted.
  • info:
    The query [info:] will present some information that Google has about that web page. For instance, info:www.cwire.org will show information about the CyberWyre homepage. Note that there can be no space between “info:” and the web page URL.
  • weather:
    Used to find the weather in a particular city, e.g. weather: new york
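Most of these operators share one shape: a name, a colon, then a value. For info: (and for inurl: and intitle: below) the post notes that there can be no space after the colon. Here is a small sketch that formats operators that way. `operator` and `with_operators` are hypothetical names. This sketch always omits the space, which matches how the post writes site:, related:, link:, cache: and info:.

```python
def operator(name: str, value: str) -> str:
    """Hypothetical helper: format name:value with no space after the colon."""
    value = value.strip()
    if not value:
        raise ValueError(f"{name}: needs a value")
    return f"{name}:{value}"


def with_operators(terms: str = "", **ops: str) -> str:
    """Place each operator first, then any plain search terms.

    Keyword arguments keep their call order (Python 3.6+), so operators
    appear in the query in the order you pass them.
    """
    parts = [operator(name, value) for name, value in ops.items()]
    if terms:
        parts.append(terms)
    return " ".join(parts)


if __name__ == "__main__":
    print(operator("site", "www.cwire.org"))             # site:www.cwire.org
    print(operator("info", " www.cwire.org"))            # info:www.cwire.org
    print(with_operators("web", cache="www.cwire.org"))  # cache:www.cwire.org web
```

The last line reproduces the cache:www.cwire.org web example from the list above.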

Advanced Tips:

  • filetype:
    Searches for a specific file type or, if you put a minus sign (-) in front of it, excludes results of that file type. Try it with mp3, mpg or avi if you like (e.g. filetype:mp3).
  • daterange:
    Accepts dates in Julian day format only. For example, 2452384 is the Julian day number for April 19, 2002.
  • allinurl:
    If you start a query with [allinurl:], Google will restrict the results to those with all of the query words in the URL. For instance, [allinurl: google search] will return only documents that have both “google” and “search” in the URL.
  • inurl:
    If you include [inurl:] in your query, Google will restrict the results to documents containing that word in the URL. For instance, [inurl:google search] will return documents that mention the word “google” in their URL and mention the word “search” anywhere in the document (in the URL or not). Note that there can be no space between “inurl:” and the following word.
  • allintitle:
    If you start a query with [allintitle:], Google will restrict the results to those with all of the query words in the title. For instance, [allintitle: google search] will return only documents that have both “google” and “search” in the title.
  • intitle:
    If you include [intitle:] in your query, Google will restrict the results to documents containing that word in the title. For instance, [intitle:google search] will return documents that mention the word “google” in their title and mention the word “search” anywhere in the document (in the title or not). Note that there can be no space between “intitle:” and the following word.
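  • allinanchor:
    Restricts results to pages where all of the query words appear in the text of links pointing to the page, rather than in the page’s own text or title.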
  • allintext:
    Searches only within text of pages, but not in the links or page title.
  • bphonebook:
    If you start your query with bphonebook:, Google shows U.S. business white page listings for the query terms you specify. For example, [ bphonebook: google mountain view ] will show the phonebook listing for Google in Mountain View.
  • phonebook:
    If you start your query with phonebook:, Google shows all U.S. white page listings for the query terms you specify. For example, [ phonebook: Krispy Kreme Mountain View ] will show the phonebook listing of Krispy Kreme donut shops in Mountain View.
  • rphonebook:
    If you start your query with rphonebook:, Google shows U.S. residential white page listings for the query terms you specify. For example, [ rphonebook: John Doe New York ] will show the phonebook listings for John Doe in New York (city or state). Abbreviations like [ rphonebook: John Doe NY ] generally also work.
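Because daterange: only takes Julian day numbers, you will usually need to convert a calendar date first. Python’s `date.toordinal()` counts days from 0001-01-01. The Julian day number of the same date is that ordinal plus 1721425, so the conversion is one addition. The post gives only a single number. The `daterange:start-end` form below, with two Julian day numbers joined by a hyphen, is my assumption about the full syntax. `to_julian_day`, `from_julian_day` and `daterange` are hypothetical helper names.

```python
from datetime import date

# Julian day number (JDN) minus Python's proleptic Gregorian ordinal,
# e.g. 2000-01-01 has ordinal 730120 and JDN 2451545.
JDN_OFFSET = 1721425


def to_julian_day(d: date) -> int:
    """Convert a calendar date to its Julian day number."""
    return d.toordinal() + JDN_OFFSET


def from_julian_day(jdn: int) -> date:
    """Convert a Julian day number back to a calendar date."""
    return date.fromordinal(jdn - JDN_OFFSET)


def daterange(start: date, end: date) -> str:
    """Hypothetical helper: build a daterange: operator (start-end syntax assumed)."""
    if end < start:
        raise ValueError("end date is before start date")
    return f"daterange:{to_julian_day(start)}-{to_julian_day(end)}"


if __name__ == "__main__":
    print(from_julian_day(2452384))                         # 2002-04-19
    print(daterange(date(2002, 4, 19), date(2002, 4, 20)))  # daterange:2452384-2452385
```

Running the post’s sample value 2452384 back through `from_julian_day` confirms that it corresponds to April 19, 2002.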

Putting it all Together: 

It’s time to get creative with our search terms and really narrow down our results. Now that we have the basics, let’s start combining them into a single query.

Example #1: Search for some MP3s
Let’s say you’re a Beatles fan and want to see if you can find some of their songs on the Internet without using Kazaa, etc. Try this query:

“index of” +“mp3” +“beatles” -html -htm -php
or you could try this query:
* “index of/mp3” -playlist -html -lyrics beatles

Among the first few results returned by Google, you will often find directory listings from which you can download MP3s right away.

Example #2: Mixing some techniques together

Here’s a simple exercise. We’ll mix a few terms together to get more accurate results. Let’s say we want to research sleep recommendations. One assumption could be that research papers on this topic are most likely to be on an educational website, perhaps one with a .edu domain. We could try this query:

sleep recommendations site:edu

Maybe, like me, you’re thinking of applying to grad school. Let’s see if we can find the Graduate Studies Admissions Requirements at the University of Toronto. We could try this query:

grad school admission requirements site:utoronto.ca
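Both queries follow the same pattern: plain keywords plus a site: restriction, optionally mixed with the other operators from the lists above. Here is a minimal sketch of that mixing step. `combine` and `google_url` are hypothetical helpers, and the filetype:pdf variation in the last line is my own illustration rather than one of the post’s examples.

```python
from urllib.parse import urlencode


def combine(keywords, site=None, filetype=None, exclude=()):
    """Hypothetical helper: mix plain keywords with site:, filetype: and -terms."""
    parts = [keywords]
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    parts += [f"-{term}" for term in exclude]
    return " ".join(parts)


def google_url(query):
    """Encode the query into a search URL (assumes Google's q= parameter)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(combine("sleep recommendations", site="edu"))
    # sleep recommendations site:edu
    print(combine("grad school admission requirements", site="utoronto.ca"))
    # grad school admission requirements site:utoronto.ca
    print(google_url(combine("sleep recommendations", site="edu", filetype="pdf")))
    # https://www.google.com/search?q=sleep+recommendations+site%3Aedu+filetype%3Apdf
```

Each operator is an optional argument, so leaving one out simply drops it from the query.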

Summary:

After reading this article, you might be thinking, “Well, I could probably find those results without remembering these advanced search terms.” The truth is that you probably could. The reason to start using these advanced search tips is that they help you find what you’re looking for faster. They greatly narrow down the results, and more often than not, the information you were looking for will be in the first two or three results.

http://www.cwire.org/data-mining-using-google/

Comments:

1. the police tour - June 14, 2007

Hello
Cool site!

2. Pam Sears - April 18, 2009

Thank you, Chris Beech. This is quite useful. I had stumbled across some of those search parameters when data mining. . . but knowing is more efficient than stumbling.

3. baljit - March 14, 2012

nice post, google search o keywords

webexpertsonline

