Deep Web: Visualizing the Deep Web

The Deep Web Library Guide discusses why the Deep Web exists and what it contains. The Guide also provides tools for searching the Deep Web, along with resources for further information.

How Big Is the Deep Web?

Estimating the size of the Deep Web is difficult, given the sheer number of web pages involved and the hidden nature of those pages. However, BrightPlanet, a "deep web intelligence company," notes the following:

"...it is important to tap into the rich resources existing in the Deep Web. The last time an extensive study was completed estimating the size of the Deep Web was in 2001 — a time when the internet consisted of only approximately three million different domains. The 2001 study revealed that at that time the Deep Web was approximately 400-500 times the size of the Surface Web.

Today’s internet is significantly bigger with an estimated 555 million domains, each containing thousands or millions of unique web pages. As the web continues to grow, so too will the Deep Web and the value attained from Deep Web content."

Source: Pederson, Steve. "Understanding the Deep Web in 10 Minutes." BrightPlanet, Mar. 2013. Web. 18 Sept. 2013.

 

The Invisible Web

[Diagram: Components of the Invisible Web, showing the Opaque, Private, Proprietary, and Truly Invisible Webs]

Source: Devine, Jane, and Francine Egger-Sider. Going Beyond Google: The Invisible Web in Learning and Teaching. New York: Neal-Schuman, 2009. Print. Page 135.

The above diagram illustrates many of the concepts discussed in this guide.

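Robots.txt is a plain-text file placed at the root of a website. It contains directives that tell crawlers which parts of the site they may crawl and which parts they should not crawl at all. Compliance is voluntary: well-behaved crawlers follow the file, but it does not technically block access.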
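To make this concrete, here is a minimal sketch of how a crawler might consult a robots.txt file, using the robotparser module from Python's standard library. The file contents, the crawler name "ExampleBot", and the example.edu addresses are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site: every crawler ("*")
# is asked to stay out of anything under /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


robots = build_parser(SAMPLE_ROBOTS_TXT)

# "ExampleBot" and both URLs are assumptions, not real services.
public_ok = robots.can_fetch("ExampleBot", "https://www.example.edu/catalog/index.html")
private_ok = robots.can_fetch("ExampleBot", "https://www.example.edu/private/records.html")

print("May crawl catalog page:", public_ok)
print("May crawl private page:", private_ok)
```

Pages under the disallowed path are never visited by a compliant crawler, so they do not appear in its search results; that is one way content ends up in the Invisible Web.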

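The Noindex Meta Tag is a line of HTML in a web page's header that asks search engine crawlers not to include that page in their index. A crawler can still visit the page, but a compliant search engine will leave it out of its results.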
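As an illustration, the sketch below shows what such a tag looks like and how a program could detect it with Python's standard html.parser module. The sample page and the helper names (RobotsMetaParser, has_noindex) are hypothetical; real search engines apply their own, more elaborate rules.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for item in attr_map.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def has_noindex(html_text: str) -> bool:
    """Return True if the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" in parser.directives


# Made-up page used only for demonstration.
SAMPLE_PAGE = """<html><head>
<title>Staff schedule</title>
<meta name="robots" content="noindex, nofollow">
</head><body><p>Internal use only.</p></body></html>"""

print("Page requests noindex:", has_noindex(SAMPLE_PAGE))
```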

Relational databases are the kinds of databases already discussed in this guide: when searched, they return results in dynamically generated pages. Note that Google's web crawler now searches inside many databases and indexes the results. These results are thus part of the Surface Web.
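The idea of a dynamically generated page can be sketched in a few lines of Python using the built-in sqlite3 module. The table, the two book titles, and the function names here are invented for the example; a real library catalog would be far larger and would sit behind a web server.

```python
import sqlite3
from html import escape


def make_catalog() -> sqlite3.Connection:
    """Build a tiny in-memory catalog with made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
    conn.executemany(
        "INSERT INTO books VALUES (?, ?)",
        [
            ("Database Basics", "Example Author"),
            ("Intro to Web Search", "Sample Writer"),
        ],
    )
    return conn


def results_page(conn: sqlite3.Connection, search_term: str) -> str:
    """Generate an HTML results page for one search, on demand."""
    rows = conn.execute(
        "SELECT title, author FROM books WHERE title LIKE ? ORDER BY title",
        (f"%{search_term}%",),
    ).fetchall()
    items = "".join(f"<li>{escape(t)} ({escape(a)})</li>" for t, a in rows)
    heading = f"<h1>Results for {escape(search_term)}</h1>"
    return f"<html><body>{heading}<ul>{items}</ul></body></html>"


catalog = make_catalog()
print(results_page(catalog, "database"))
```

The results page does not exist as a file anywhere; it is assembled only when someone submits a search. A crawler that simply follows links from page to page never submits that search, so it never sees the page.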

St. Louis Community College Libraries

Florissant Valley Campus Library
3400 Pershall Rd.
Ferguson, MO 63135-1408
Phone: 314-513-4514

Forest Park Campus Library
5600 Oakland
St. Louis, MO 63110-1316
Phone: 314-644-9210

Meramec Campus Library
11333 Big Bend Road
St. Louis, MO 63122-5720
Phone: 314-984-7797

Wildwood Campus Library
2645 Generations Drive
Wildwood, MO 63040-1168
Phone: 636-422-2000