
Data Mining on the Internet with Google June 15, 2006

Posted by cbeech in News - Google.

Google has quickly become one of the best-known words in the world and is used by millions daily, including me. In an advanced database class back in university, we spent a couple of weeks studying the inner workings of search engines, and one topic that came up was data mining using Google. Much to my surprise, out of a class of 80 fourth-year computer engineers, maybe four or five knew how to use Google to perform any sort of advanced query.

Google (and many other search engines) can search not only on keywords, but also with a more “database-ish” query language that really narrows down your search results. Below is a summary of a few of the most useful lesser-known features. Note: in the examples, replace cwire.org with your own domain.

Basic Usage:

  • Use quotation marks “ ” to locate an exact phrase.
    eg. “bill gates conference” will only return results containing that exact string.
  • Mark essential words with a +
    If the results must contain certain words or phrases, mark each one with a + symbol. eg: +“bill gates” conference will return all results containing “bill gates”, but not necessarily those pertaining to a conference.
  • Negate unwanted words with a -
    You may search for the term bass, meaning the fish, and get a list of music links as well. To narrow down your search, try: bass -music. This will return all results with “bass” and NOT “music”.
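
If you put queries like these together in a script rather than typing them by hand, the three rules above can be sketched in a few lines of Python. This is only an illustration of the syntax described in the list: build_query and its parameter names are made up for this example and are not part of any Google tool.

```python
def build_query(phrase=None, required=(), terms=(), excluded=()):
    """Compose a search string (hypothetical helper, for illustration only)."""
    parts = []
    if phrase:
        # Quotation marks ask for the exact phrase.
        parts.append(f'"{phrase}"')
    # A leading + marks a word as essential.
    parts.extend(f"+{word}" for word in required)
    # Plain keywords are passed through unchanged.
    parts.extend(terms)
    # A leading - removes results containing that word.
    parts.extend(f"-{word}" for word in excluded)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(terms=["bass"], excluded=["music"]))
    print(build_query(phrase="bill gates conference"))
```

The first call reproduces the bass -music example; the second wraps the whole phrase in quotation marks.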

General Tips: (I use many of these almost on a daily basis)

  • site:www.cwire.org
    This will search only pages which reside on this domain.
  • related:www.cwire.org
    This will display all pages which Google finds to be related to your URL
  • link:www.cwire.org
    This will display a list of all pages which Google has found to be linking to your site. Useful to see how popular your site is
  • spell:word
    Runs a spell check on your word
  • define:word
    Returns the definition of the word
  • stocks: [symbol, symbol, etc]
    Returns stock information. eg. stocks: msft
  • maps:
    A shortcut to Google Maps
  • phone: name_here
    Attempts to look up the phone number for a given name
  • cache:
    Shows the copy of the page that Google has stored in its cache. If you include other words in the query, Google will highlight those words within the cached document. For instance, cache:www.cwire.org web will show the cached content with the word “web” highlighted.
  • info:
    The query [info:] will present some information that Google has about that web page. For instance, info:www.cwire.org will show information about the CyberWyre homepage. Note there can be no space between the “info:” and the web page url.
  • weather:
    Used to find the weather in a particular city. eg. weather: new york

Advanced Tips:

  • filetype:
    Restricts the search to a specific file type. If you put a minus sign (-) in front of it, results with that file type are excluded instead. Try it with .mp3, .mpg or .avi if you like.
  • daterange:
    Restricts results to a range of dates. Dates are supported in Julian date format only. 2452384 is an example of a Julian date (it corresponds to April 19, 2002).
  • allinurl:
    If you start a query with [allinurl:], Google will restrict the results to those with all of the query words in the url. For instance, [allinurl: google search] will return only documents that have both “google” and “search” in the url.
  • inurl:
    If you include [inurl:] in your query, Google will restrict the results to documents containing that word in the url. For instance, [inurl:google search] will return documents that mention the word “google” in their url, and mention the word “search” anywhere in the document (url or no). Note there can be no space between the “inurl:” and the following word.
  • allintitle:
    If you start a query with [allintitle:], Google will restrict the results to those with all of the query words in the title. For instance, [allintitle: google search] will return only documents that have both “google” and “search” in the title.
  • intitle:
    If you include [intitle:] in your query, Google will restrict the results to documents containing that word in the title. For instance, [intitle:google search] will return documents that mention the word “google” in their title, and mention the word “search” anywhere in the document (title or no). Note there can be no space between the “intitle:” and the following word.
  • allinlinks:
    Searches only within links, not text or title.
  • allintext:
    Searches only within text of pages, but not in the links or page title.
  • bphonebook:
    If you start your query with bphonebook:, Google shows U.S. business white page listings for the query terms you specify. For example, [ bphonebook: google mountain view ] will show the phonebook listing for Google in Mountain View.
  • phonebook:
    If you start your query with phonebook:, Google shows all U.S. white page listings for the query terms you specify. For example, [ phonebook: Krispy Kreme Mountain View ] will show the phonebook listing of Krispy Kreme donut shops in Mountain View.
  • rphonebook:
    If you start your query with rphonebook:, Google shows U.S. residential white page listings for the query terms you specify. For example, [ rphonebook: John Doe New York ] will show the phonebook listings for John Doe in New York (city or state). Abbreviations like [ rphonebook: John Doe NY ] generally also work.
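
The daterange: operator in the list above is the awkward one, since nobody knows Julian dates by heart. Here is a minimal Python sketch for converting ordinary calendar dates. The helper names are hypothetical, and the start-end form of the operator is an assumption about its syntax rather than something stated in this list.

```python
from datetime import date

# Difference between Python's date.toordinal() (day 1 = 0001-01-01)
# and the Julian day number of the same calendar day.
JDN_OFFSET = 1721425


def to_julian_day(d):
    """Return the Julian day number for a calendar date."""
    return d.toordinal() + JDN_OFFSET


def daterange_operator(start, end):
    """Build a daterange: term (assumed syntax: daterange:START-END)."""
    return f"daterange:{to_julian_day(start)}-{to_julian_day(end)}"


if __name__ == "__main__":
    print(to_julian_day(date(2002, 4, 19)))
    print(daterange_operator(date(2002, 4, 19), date(2002, 4, 25)))
```

The first call gives 2452384, the example Julian date quoted for daterange: above.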

Putting it all Together: 

Now that we have the basics, it’s time to get creative and combine them into a single search to really narrow down our results.

Example #1: Search for some MP3s
Let’s say you’re a Beatles fan and want to see if you can find some of their songs on the Internet without using Kazaa, etc. Try this query:

“index of” + “mp3” + “beatles” -html -htm -php
or you could try this query:
* “index of/mp3” -playlist -html -lyrics beatles

Often the first few results returned by Google are open directory listings from which you can download MP3s.

Example #2: Mixing some techniques together

Here’s a simple exercise. We’ll mix around a few terms to get more accurate results. Let’s say we want to research sleep recommendations. One assumption could be that research papers on this topic would most likely be on an educational website — perhaps with a .edu domain. We could try this query:

sleep recommendations site:edu

Maybe you’re in my situation and are thinking of applying to grad school. Let’s see if we can find the Graduate Studies Admissions Requirements at the University of Toronto. We could try this query:

grad school admission requirements site:utoronto.ca
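
Queries like the two above can also be turned into links you can bookmark or share. The sketch below is one way to do it in Python. The base URL https://www.google.com/search and its q parameter are assumptions here (they are simply what the browser's address bar shows after a search), and search_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed search endpoint; not taken from the article.
SEARCH_BASE = "https://www.google.com/search"


def search_url(terms, site=None):
    """Return a search link for the given keywords, optionally limited to one site."""
    # Append the site: operator only when a domain (or TLD such as "edu") is given.
    query = terms if site is None else f"{terms} site:{site}"
    # urlencode escapes spaces, colons and quotes so the query survives in a URL.
    return f"{SEARCH_BASE}?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(search_url("sleep recommendations", site="edu"))
    print(search_url("grad school admission requirements", site="utoronto.ca"))
```

Keeping the operator handling separate from the URL encoding means the same function works for any of the operators listed earlier, as long as they are passed in as part of the terms.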

Summary:

After reading this article, you might be thinking “well, I could probably find those results without remembering these advanced search terms”. The truth is that you probably could. The reason to start using these advanced search tips is that they help you find what you’re looking for faster. They greatly narrow down the results, and more often than not the information you were looking for will be in the first two or three results.

http://www.cwire.org/data-mining-using-google/


GBuy Launch June 28 June 15, 2006

Posted by cbeech in News - Google.

Google is expected to unveil its long-anticipated online payments system later this month, on June 28, according to an analyst’s report.

The service, dubbed “GBuy”, will process payments between shoppers and merchants. Eventually, it might also be expanded to include consumer-to-consumer payments.

“GBuy has the potential to be as important to Google as Google Maps, or Google News, and there is very little that competitors can do to thwart its success,” says Jordan Rohan, RBC Analyst.

The product will launch in “beta” phase, and during this period Google will not charge merchants service fees. Further down the line, Google will most likely institute a 1.5% to 2% fee for transactions, which is similar to or slightly less than what rival PayPal charges.

On search result pages, Google will designate each merchant that accepts GBuy as a “Trusted GBuy Merchant”.

Google will benefit from the service not only monetarily, but also by their access to new information. The payments system captures all transaction data flow, allowing Google to see which categories and keywords produce the most hits and sales.

Analysts predict that GBuy could be “revolutionary”, driving more precise targeting in future searching. And although GBuy looks to be in direct competition with PayPal, Rohan believes that in the short term GBuy is more negative for eBay than it is positive for Google. Longer-term, it could be a game-changer.

http://tech.v7n.com/2006/06/09/gbuy-launch-june-28/

Welcome to Google Checkout, that will be $3.14 June 15, 2006

Posted by cbeech in News - Google.

The first time I looked up the domain “GDrive.com” it appeared that someone other than Google had it registered.  A trip down memory lane takes us to my very first article that describes how I determined GDrive.com is in fact owned by Google, despite what it looks like on the surface.

Well, by the same logic I have found that a brand new set of domains appearing to be registered to someone else were actually registered by Google on May 25th.

The domains googlecheckout.net/org/info (.com is owned by someone else at the moment) have all been registered to a company called DNStination, Inc.  Don’t be fooled, the registrar is MarkMonitor — a company that prides itself on the protection of your corporate identity.  There is no way they would let just anybody register a domain with “Google” in it — especially since Google is one of their clients.

Who is this DNStination, Inc. then?  Googling the address of this “company” tells us exactly who it is.  The address maps directly to none other than MarkMonitor itself.

Since we know Google is behind its registration, what is Google Checkout going to be?  I think it will be a shopping cart system to help websites accept payment for their items online.  The money site owners make will be deposited into a holding account at Google, just like AdSense works.

Isn’t this starting to sound a lot like PayPal?  Who knows, they could even offer a Google branded Mastercard “debit card” like PayPal’s ATM/Debit Card — after all, the domain googlemastercard.com is registered to Google too.

If this is indeed what they are planning, it would make sense for Google Checkout to tie into Google Analytics so website owners can easily track with certainty how their AdWords campaign is directly affecting sales — right through the checkout process. 

Maybe one day Google will even provide an inventory management solution with an API so websites can have their inventory in Google Base and on their own website without double entry.

Google and Dell in software deal June 15, 2006

Posted by cbeech in News - Google.

Computer giant Dell and internet search engine Google have reached a deal to install Google software on Dell’s PCs before they leave the factory.

The Dell computers will contain Google software including several personal computer applications, a Google toolbar and a co-branded homepage.

Both firms will receive revenue from the deal, but details remain unknown.

Google chief executive Eric Schmidt said that it was the first of several such deals.

Turning point

Speaking to a group of investors during a Goldman Sachs internet conference, Mr Schmidt said: “There is probably more to come.”

Dell made no comment.

The agreement between the world’s largest personal computer company and Google comes after the two firms announced in February that they were in talks about installing Google’s software on Dell computers.

The talks came about after Yahoo pulled out of negotiations.

The deal could mark a major turning point for Google and pose a serious threat to rival Microsoft.

Microsoft and Google have adopted different business models.

Instead of selling software to make a profit, Google makes money by selling advertising to firms that want access to those who use its free products.

Microsoft has identified this sort of software as a key threat to its business, which relies on the healthy margins it earns from Windows and its Office productivity suite.

It is now evolving its own business more towards pay-per-use, seeking to integrate its offerings more with online applications.

Google shares rose by $1.74, closing at $382.99 on the Nasdaq, before slipping 99 cents in extended trading.

Meanwhile, Dell shares climbed 12 cents, closing at $24.30 on the Nasdaq, before rising 9 cents in extended trading.

http://news.bbc.co.uk/1/hi/business/5018372.stm

Google’s Goal: A Worldwide Web of Books May 30, 2006

Posted by cbeech in News - Google, News - Tech.

It’s odd to hear Vinton Cerf, regarded as one of the founding fathers of the Internet, gush over ink-on-paper books.

The electronic pioneer and computer scientist, who now works as Google’s chief Internet evangelist, is also a bibliophile who has a collection of about 10,000 hard-copy volumes lining shelves at his home in McLean.

Google has vowed to create a full-text index of seven million books in the University of Michigan library, along with millions more in the university libraries at Harvard, Stanford and Oxford, as well as the New York Public Library. The idea is similar to Amazon.com’s “search inside the book” feature, eventually allowing anyone using Google’s free book search ( http://books.google.com/ ) not only to see sample pages from books but also to search their contents and find excerpts matching search terms.

Google is not alone in trying to digitize library books. Yahoo, Microsoft and other Internet players have joined a collaborative effort called the Open Content Alliance, which is planning to digitize not only library books but other types of multimedia, as well, making them all accessible on the Web.

http://www.washingtonpost.com/wp-dyn/content/article/2006/05/17/AR2006051702016.html?nav=rss_technology

Google notebook May 29, 2006

Posted by cbeech in News - Google.

Wow! Google Notebook allows you to select text from anywhere on an HTML page and add it to your online notebook. You can then categorize the entries in any way you want. You are able to see your notebook via the plug-in for Firefox, or in your browser. You are also able to make your notebook public.

http://www.google.com/notebook/

Yahoo!’s Most Popular Search Term? May 19, 2006

Posted by cbeech in News - Google.

Why, it’s the same as MSN’s… "Google" of course.

I can’t imagine how frustrating it must be for Yahoo!’s engineers to endure search logs that are filled with "Google", "Google search", "search Google" and the occasional "myspace" or "Jessica Simpson"… I can see how that might get old very fast.

http://www.seomoz.org/blogdetail.php?ID=1073

The worse Google gets, the more money it makes? May 19, 2006

Posted by cbeech in News - Google.

It's hard to imagine now, but there was a time when the mainstream press was barely acquainted with the genius and foresight of today's technology leaders.

Fifteen years ago Bill Gates appeared on the BBC's Wogan show – which the Beeb thought of as a nightly Johnny Carson, but which was really like watching Regis Philbin on cough syrup – to show off his WinPad PC. The wooden Gates made a joke about making his money disappear, with only a couple of clicks, using only a stylus. As Gates blinked, a nation which had never heard of Microsoft, and couldn't quite figure out why the guy in glasses wasn't singing or dancing, looked on in sympathetic embarrassment.

http://www.theregister.co.uk/2006/05/10/google_microsoft_redux/
