
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Piktochart Infographics &#187; Data Collection &amp; Research</title>
	<atom:link href="http://piktochart.com/category/data/feed/" rel="self" type="application/rss+xml" />
	<link>http://piktochart.com</link>
	<description>Best Infographic Design</description>
	<lastBuildDate>Tue, 21 May 2013 15:24:21 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Common Concepts to Look at When Comparing Multiple Data Visualization Tools</title>
		<link>http://piktochart.com/2012/02/common-concepts-to-look-at-when-comparing-multiple-data-visualization-tools-common-concepts-to-look-at-when-comparing-multiple-data-visualization-tools/</link>
		<comments>http://piktochart.com/2012/02/common-concepts-to-look-at-when-comparing-multiple-data-visualization-tools-common-concepts-to-look-at-when-comparing-multiple-data-visualization-tools/#comments</comments>
		<pubDate>Tue, 28 Feb 2012 02:28:11 +0000</pubDate>
		<dc:creator>piktochart</dc:creator>
				<category><![CDATA[Data Collection & Research]]></category>

		<guid isPermaLink="false">http://piktochart.com/?p=1743</guid>
		<description><![CDATA[<p>&#160; Ideally, data visualization is a balanced presentation of function and aesthetics with the goal of intuitively relaying a complex idea. It is this balance that often increases effectiveness, because the scheme presented is neither too bland, in a bid to increase functionality, nor too flashy, in a bid to appeal to the user. How well we achieve this balance is based on [...]</p><p>The post <a href="http://piktochart.com/2012/02/common-concepts-to-look-at-when-comparing-multiple-data-visualization-tools-common-concepts-to-look-at-when-comparing-multiple-data-visualization-tools/">Common Concepts to Look at When Comparing Multiple Data Visualization Tools</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></description>
				<content:encoded><![CDATA[<p>&nbsp;</p>
<p>Ideally, data visualization is a balanced presentation of function and aesthetics with the goal of intuitively relaying a complex idea. It is this balance that often increases effectiveness, because the scheme presented is neither too bland, in a bid to increase functionality, nor too flashy, in a bid to appeal to the user. How well we achieve this balance depends on a number of things, including our understanding of how the concept works, an appreciation for the almost limitless scope of data visualization, and the tool used for the actual visualization of the raw data we have.</p>
<p>&nbsp;</p>
<p>There are a number of data visualization tools available, and luckily not all of them will burn a hole in our pockets. The key is to find one that does not sacrifice quality for the price of &#8220;free,&#8221; so the best thing to do is compare them. This can be difficult, since a single internet search can turn up more than 20 tools. The first thing to do is decide what level we are at, and what features we need at that level.</p>
<p>&nbsp;</p>
<p>Common things to consider:</p>
<p>&nbsp;</p>
<p>How interactive, customizable and user-friendly is the tool? For example, with its lack of basic instructions, Impure may not be the best tool for someone who is just learning, since there may be some fumbling around even for those who fall into its target audience. Google Fusion Tables is great for its variety of presentation methods, and is likely to appeal to those who rely on aesthetics, but it lacks the level of capacity, customization and functionality that some may need.</p>
<p>&nbsp;</p>
<p>It is also important to consider whether the tool can handle raw, unpolished data. This may not be necessary for more advanced users who can sift through and clean up their own data before inputting it, but those who need help in this area have to be sure that the chosen tool can do this. Google Refine and DataWrangler are two examples of tools that can sort or clean up raw data.</p>
<p>Other things to consider include command-line/text-only interfaces versus visual tools, web-based versus downloadable, any necessary additional features (such as email) and the overall limitations of the free version. The last factor is important since some tools require a paid version in order to use features that may be needed; The R Project for Statistical Computing comes to mind if large data sets are being used.</p>
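<p>As a rough illustration of what &#8220;cleaning up raw data&#8221; can involve, here is a minimal Python sketch. It is not how Google Refine or DataWrangler work internally; the function name <code>clean_rows</code> and the sample values are made up for this example.</p>

```python
def clean_rows(rows):
    """Trim labels and values, skip incomplete or non-numeric rows,
    merge duplicate labels, and return the result sorted by label."""
    cleaned = {}
    for label, value in rows:
        label = label.strip().title()           # normalise spacing and capitalisation
        value = value.strip().replace(",", "")  # drop thousands separators
        if not label or not value:
            continue  # skip rows with a missing field
        try:
            cleaned[label] = float(value)  # a later duplicate overwrites an earlier one
        except ValueError:
            continue  # skip values that are not numbers
    return sorted(cleaned.items())


# Made-up sample rows, purely for illustration.
raw_rows = [(" apples ", "1,200"), ("Pears", "340"), ("apples", "1200"), ("Plums", "n/a"), ("", "12")]
print(clean_rows(raw_rows))
```

<p>A dedicated tool does far more than this, but the sketch shows the kind of work a beginner may want the tool to handle.</p>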
<p>&nbsp;</p>
<p>The post <a href="http://piktochart.com/2012/02/common-concepts-to-look-at-when-comparing-multiple-data-visualization-tools-common-concepts-to-look-at-when-comparing-multiple-data-visualization-tools/">Common Concepts to Look at When Comparing Multiple Data Visualization Tools</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://piktochart.com/2012/02/common-concepts-to-look-at-when-comparing-multiple-data-visualization-tools-common-concepts-to-look-at-when-comparing-multiple-data-visualization-tools/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The future of data visualization- what do you think will be at the forefront?</title>
		<link>http://piktochart.com/2012/02/the-future-of-data-visualization-what-do-you-think-will-be-at-the-forefront/</link>
		<comments>http://piktochart.com/2012/02/the-future-of-data-visualization-what-do-you-think-will-be-at-the-forefront/#comments</comments>
		<pubDate>Tue, 28 Feb 2012 02:27:15 +0000</pubDate>
		<dc:creator>piktochart</dc:creator>
				<category><![CDATA[Data Collection & Research]]></category>

		<guid isPermaLink="false">http://piktochart.com/?p=1741</guid>
		<description><![CDATA[<p>Data visualization to simplify or data visualization to overload? This picture has been taken from Smashing Magazine&#8217;s website. &#160; While data visualization has existed in some form since antiquity, the world of computer science and social networking has created a greater need than ever for infographics. That being said, people may lack the software that is necessary [...]</p><p>The post <a href="http://piktochart.com/2012/02/the-future-of-data-visualization-what-do-you-think-will-be-at-the-forefront/">The future of data visualization- what do you think will be at the forefront?</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></description>
				<content:encoded><![CDATA[<p>Data visualization to simplify or data visualization to overload?</p>
<p><a href="http://piktochart.wpengine.com/wp-content/uploads/2012/02/infosthetics02.jpg"><img class="aligncenter size-full wp-image-1753" title="infosthetics02 smashing magazine" src="http://piktochart.wpengine.com/wp-content/uploads/2012/02/infosthetics02.jpg" alt="infosthetics02 smashing magazine" width="470" height="365" /></a><br />
This picture has been taken from <a title="Smashing Magazine" href="http://www.smashingmagazine.com/2007/08/02/data-visualization-modern-approaches/" target="_blank">Smashing Magazine&#8217;s website.</a></p>
<p>&nbsp;</p>
<p>While data visualization has existed in some form since antiquity, the world of computer science and social networking has created a greater need than ever for infographics. That being said, people may lack the software that is necessary to make their own charts or tables. Office productivity software is sometimes able to make the graphics in question, but this usually requires users to familiarize themselves with a particular method of entering data. Likewise, it is easy to make a chart that is misleading. Misleading graphs could be mistaken for an attempt at dishonesty, and are therefore best avoided at all costs. Using a powerful set of tools does not need to be overly difficult, and one might find that it is also easy to convert one's own digital data to infographics in-house.</p>
<p>&nbsp;</p>
<p>One of the current trends seems to be towards providing collections of graphs and charts for readers to peruse at their leisure. While most people might think of an infographic as an island in a sea of text, infographics have become such an easy way to organize information that they are now being appreciated in their own right. This means that they also need to be made more quickly than in the past to keep up with demand. In the future, data visualization will probably occur at a faster rate than most people can even imagine at the moment.</p>
<p>&nbsp;</p>
<p>Using dedicated do-it-yourself solutions is an excellent way to convert and manipulate infographics on a personal computer. Many people who read blogs expect to see images on their favorite websites. However, it often seems absurd to simply attach photographs without any reason. Perhaps with the trend towards collections of graphs that illustrate various news issues, more blogs will start setting up their own collections. This is where DIY tools really get their chance to shine.</p>
<p>The post <a href="http://piktochart.com/2012/02/the-future-of-data-visualization-what-do-you-think-will-be-at-the-forefront/">The future of data visualization- what do you think will be at the forefront?</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://piktochart.com/2012/02/the-future-of-data-visualization-what-do-you-think-will-be-at-the-forefront/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Make The World a Better Place with Proper Presentations</title>
		<link>http://piktochart.com/2011/12/make-the-world-a-better-place-with-proper-presentations/</link>
		<comments>http://piktochart.com/2011/12/make-the-world-a-better-place-with-proper-presentations/#comments</comments>
		<pubDate>Wed, 07 Dec 2011 09:05:12 +0000</pubDate>
		<dc:creator>piktochart</dc:creator>
				<category><![CDATA[Data Collection & Research]]></category>

		<guid isPermaLink="false">http://piktochart.com/?p=1332</guid>
		<description><![CDATA[<p>One of the questions we often get at Piktochart: What if your users create really dull infographics or, worse, overload the entire infographic with colourful graphics and a terrible mismatch of colours/designs and, lo &#38; behold, They Do Not Even Have Any Quality Data. That to us, ladies and gentlemen, is an example of a [...]</p><p>The post <a href="http://piktochart.com/2011/12/make-the-world-a-better-place-with-proper-presentations/">Make The World a Better Place with Proper Presentations</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></description>
				<content:encoded><![CDATA[<p>One of the questions we often get at Piktochart: What if your users create really dull infographics or, worse, overload the entire infographic with colourful graphics and a terrible mismatch of colours/designs and, lo &amp; behold, They Do Not Even Have Any Quality Data.</p>
<p>That to us, ladies and gentlemen, is an example of a product failure. Starting from a simple prototype, we have iterated internally many times, stalling development again and again. Each time we thought we had reached our Eureka moment, yet another user proved us wrong by not knowing how to use a particular button or where to drag something.</p>
<p>Preliminary look into our most current &#8220;prototype&#8221; layouts:</p>
<p><a href="http://piktochart.wpengine.com/wp-content/uploads/2011/12/AppUI02a.jpg"><img class="alignleft size-full wp-image-1333" title="A New layout for Piktochart" src="http://piktochart.wpengine.com/wp-content/uploads/2011/12/AppUI02a.jpg" alt="A New layout for Piktochart" width="717" height="455" /></a></p>
<p>We arrived at this point after 5, no&#8230;. 10 iterations (roughly). We have almost lost count of the number of napkin sketches we have done. There is plenty more to come&#8230; But the point is:</p>
<p>&nbsp;</p>
<h2><del>You  </del>We Can Make The World A Better Place With Proper Data Stories</h2>
<p>Do you know what the impact of well-thought-out data is?</p>
<p>It allows users to:</p>
<ul>
<li>Grasp the essence of the data immediately</li>
<li>Make their own judgement of what to think of that particular set of data: it starts conversations and gets the brain running</li>
<li>Feel compelled to &#8220;react&#8221; to the data</li>
</ul>
<div>Conversely, this is what happens when the story is not well thought through:</div>
<div>
<ul>
<li>Distracts them from the main point of the presentation</li>
<li>Little or no take-away/value to the user (to the point that there is no point in reading it)</li>
<li>Some &#8220;Whats&#8221; might be answered, but very few &#8220;Hows&#8221; or &#8220;Whys&#8221;</li>
</ul>
<div>We are working to minimize errors and to restrict control over colour, font type and font size changes so that users do not get carried away by the amount of customization available. In other words, yes, you will be able to customize colours etc., but we will not put much effort into allowing customization of each and every icon. Apart from that, we have noted that the ability to build a story out of the presentation is also mighty important, so we are thinking of ways to guide users in doing that. (I will leave that to the next post, which covers the What, How, Why approach.)</div>
<div>Do you agree or disagree with our approach? We would love to hear your thoughts below!</div>
</div>
<p>The post <a href="http://piktochart.com/2011/12/make-the-world-a-better-place-with-proper-presentations/">Make The World a Better Place with Proper Presentations</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://piktochart.com/2011/12/make-the-world-a-better-place-with-proper-presentations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Colours in Telling a story</title>
		<link>http://piktochart.com/2011/10/colours-in-telling-a-story/</link>
		<comments>http://piktochart.com/2011/10/colours-in-telling-a-story/#comments</comments>
		<pubDate>Sun, 02 Oct 2011 14:20:16 +0000</pubDate>
		<dc:creator>piktochart</dc:creator>
				<category><![CDATA[Data Collection & Research]]></category>

		<guid isPermaLink="false">http://piktochart.com/?p=884</guid>
		<description><![CDATA[<p>3 Ground Rules to Start with: &#160; These rules may apply less to a pictographic piece of information. No more than 6 colours Use cultural conventions e.g. red = danger Beware of bad interactions (red/blue etc.) There are 3 safe ways to pick your colour scheme in creating a data visualization. Similar colours (a.k.a. analogous [...]</p><p>The post <a href="http://piktochart.com/2011/10/colours-in-telling-a-story/">Colours in Telling a story</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></description>
				<content:encoded><![CDATA[<p><span style="text-decoration: underline;"><strong>3 Ground Rules to Start with:</strong></span></p>
<p>&nbsp;</p>
<p><em><span style="font-size: small;"><span class="Apple-style-span" style="line-height: 24px;">These rules may apply less to a pictographic piece of information.<br />
</span></span></em></p>
<ul>
<li>No more than 6 colours</li>
<li>Use cultural conventions e.g. red = danger</li>
<li>Beware of bad interactions (red/blue etc.)</li>
</ul>
<p><a href="http://piktochart.wpengine.com/wp-content/uploads/2011/10/rgb-color-wheel-lg.jpg"><img class="alignright size-large wp-image-885" title="rgb color harmony" src="http://piktochart.wpengine.com/wp-content/uploads/2011/10/rgb-color-wheel-lg-1022x1024.jpg" alt="rgb color harmony" width="720" height="721" /></a></p>
<p>There are 3 safe ways to pick your colour scheme in creating a data visualization.</p>
<p><strong>Similar colours (a.k.a. analogous colours)</strong></p>
<p>Analogous colours are any three colours which sit side by side on the colour wheel, e.g. cyan, ocean and blue. For example, blue can be used for the bars of a bar chart, cyan to highlight something important and ocean for everything else in between.</p>
<p><strong>Opposite colours (a.k.a. complementary colours)</strong></p>
<p>These are colours at two opposite ends of the colour wheel, e.g. magenta and green (a pairing which can also be found in real life in flowers and leaves). These opposing colours create maximum contrast and maximum stability.</p>
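<p>Both schemes can be sketched in code as hue rotations. The minimal Python example below uses the standard <code>colorsys</code> module; the function names and the 30-degree step (one position on a 12-step wheel) are assumptions made for this illustration, not something taken from a particular design tool.</p>

```python
import colorsys


def rotate_hue(rgb, degrees):
    """Return the 0-255 RGB colour obtained by rotating the hue of `rgb` by `degrees`."""
    r, g, b = (channel / 255 for channel in rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    h = (h + degrees / 360) % 1.0  # hue is a fraction of a full turn
    return tuple(round(channel * 255) for channel in colorsys.hls_to_rgb(h, l, s))


def complementary(rgb):
    """Opposite colour: half a turn around the wheel."""
    return rotate_hue(rgb, 180)


def analogous(rgb, step=30):
    """Three neighbouring colours: one step back, the colour itself, one step forward."""
    return [rotate_hue(rgb, -step), rgb, rotate_hue(rgb, step)]


magenta = (255, 0, 255)
print(complementary(magenta))  # the green opposite magenta on the wheel
print(analogous((0, 0, 255)))  # blue with its two neighbours
```

<p>Keeping lightness and saturation fixed while only the hue moves is a design choice of this sketch; real palettes often adjust lightness as well.</p>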
<p>Always stick with social conventions, and never use too much of a colour that carries a strong meaning (e.g. red for danger) without very good reason. To show smaller differences in scale, varying the hue slightly together with its lightness is generally recommended over changing colours entirely.</p>
<p>Happy colouring!</p>
<p>The post <a href="http://piktochart.com/2011/10/colours-in-telling-a-story/">Colours in Telling a story</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://piktochart.com/2011/10/colours-in-telling-a-story/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Variables in Visual Encoding</title>
		<link>http://piktochart.com/2011/10/variables-in-visual-encoding/</link>
		<comments>http://piktochart.com/2011/10/variables-in-visual-encoding/#comments</comments>
		<pubDate>Sun, 02 Oct 2011 13:49:31 +0000</pubDate>
		<dc:creator>piktochart</dc:creator>
				<category><![CDATA[Data Collection & Research]]></category>

		<guid isPermaLink="false">http://piktochart.com/?p=869</guid>
		<description><![CDATA[<p>Jock Mackinlay was born in Nuremberg, Germany and received his BA in Mathematics and Computer Science from UC Berkeley in 1975 and his PhD in computer science from Stanford University in 1986, where he pioneered the automatic design of graphical presentations of relational information. Wait, what has he got to do with visual encoding and how we see information? Mackinlay invented several methods of [...]</p><p>The post <a href="http://piktochart.com/2011/10/variables-in-visual-encoding/">Variables in Visual Encoding</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></description>
				<content:encoded><![CDATA[<p>Jock Mackinlay was born in <a title="Nuremberg" href="http://en.wikipedia.org/wiki/Nuremberg">Nuremberg</a>, <a title="Germany" href="http://en.wikipedia.org/wiki/Germany">Germany</a> and received his BA in Mathematics and Computer Science from <a title="UC Berkeley" href="http://en.wikipedia.org/wiki/UC_Berkeley">UC Berkeley</a> in 1975 and his PhD in <a title="Computer science" href="http://en.wikipedia.org/wiki/Computer_science">computer science</a> from <a title="Stanford University" href="http://en.wikipedia.org/wiki/Stanford_University">Stanford University</a> in 1986, where he pioneered the automatic design of graphical presentations of relational information.</p>
<p>Wait, what has he got to do with visual encoding and how we see information?</p>
<p>Mackinlay invented several methods of visualization and information graphics (otherwise known as infographics) and laid the foundation for Design Criteria, which he breaks into two:</p>
<p><strong>(1) Expressiveness</strong><br />
A set of facts is expressible in a visual language if the sentences<br />
(i.e. the visualizations) in the language express all the facts in<br />
the set of data, and only the facts in the data.<br />
<strong>(2) Effectiveness</strong><br />
A visualization is more effective than another visualization if the<br />
information conveyed by one visualization is more readily<br />
perceived than the information in the other visualization.</p>
<p>Several practices help a visualization express only the data. For example, making the title, labels, legend and captions clear, rather than leaving them open to interpretation, is a good way of keeping the data set expressive.</p>
<p>Separately, it is advisable to cut anything which does not help the visualization, e.g.:</p>
<ul>
<li>Avoid unexpressive marks (lines, bars, gradients)</li>
<li>Do not distract with faint gridlines, pastel highlights or other fills which do not explain the data</li>
<li>Describe the most important part of the data and keep everything else to a minimum.</li>
</ul>
<p>The Stanford University presentation slides show over 20 ways of visualizing the same data: the effectiveness of multiple strains of antibiotics. A few of them are reproduced below.</p>
<p>&nbsp;</p>
<p><a href="http://piktochart.wpengine.com/wp-content/uploads/2011/10/Screen-shot-2011-10-02-at-9.38.08-PM.png"><img class="alignright size-full wp-image-871" title="data visualization" src="http://piktochart.wpengine.com/wp-content/uploads/2011/10/Screen-shot-2011-10-02-at-9.38.08-PM.png" alt="data visualization" width="696" height="518" /></a><br />
<a href="http://piktochart.wpengine.com/wp-content/uploads/2011/10/Screen-shot-2011-10-02-at-9.38.01-PM1.png"><img class="alignright size-full wp-image-874" title="data 2" src="http://piktochart.wpengine.com/wp-content/uploads/2011/10/Screen-shot-2011-10-02-at-9.38.01-PM1.png" alt="data visualization method 2" width="697" height="241" /></a><br />
<a href="http://piktochart.wpengine.com/wp-content/uploads/2011/10/Screen-shot-2011-10-02-at-9.37.52-PM1.png"><img class="alignright size-full wp-image-875" title="data visualization bar charts" src="http://piktochart.wpengine.com/wp-content/uploads/2011/10/Screen-shot-2011-10-02-at-9.37.52-PM1.png" alt="data visualization bar charts" width="692" height="519" /></a><a href="http://piktochart.wpengine.com/wp-content/uploads/2011/10/Screen-shot-2011-10-02-at-9.37.43-PM.png"><img class="alignright size-full wp-image-876" title="bar chart " src="http://piktochart.wpengine.com/wp-content/uploads/2011/10/Screen-shot-2011-10-02-at-9.37.43-PM.png" alt="bar chart " width="695" height="516" /></a><br />
<a href="http://piktochart.wpengine.com/wp-content/uploads/2011/10/Screen-shot-2011-10-02-at-9.37.14-PM.png"><img class="alignright size-full wp-image-878" title="line charts" src="http://piktochart.wpengine.com/wp-content/uploads/2011/10/Screen-shot-2011-10-02-at-9.37.14-PM.png" alt="dot charts" width="696" height="241" /></a><a href="http://piktochart.wpengine.com/wp-content/uploads/2011/10/Screen-shot-2011-10-02-at-9.37.05-PM.png"><img class="alignright size-full wp-image-879" title="dot charts" src="http://piktochart.wpengine.com/wp-content/uploads/2011/10/Screen-shot-2011-10-02-at-9.37.05-PM.png" alt="dot and line charts" width="699" height="517" /></a></p>
<p>Is there any method that is clearer and appears more salient to you?</p>
<p>The post <a href="http://piktochart.com/2011/10/variables-in-visual-encoding/">Variables in Visual Encoding</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://piktochart.com/2011/10/variables-in-visual-encoding/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Uni, Bi or Tri-Variate</title>
		<link>http://piktochart.com/2011/10/uni-bi-or-tri-variate/</link>
		<comments>http://piktochart.com/2011/10/uni-bi-or-tri-variate/#comments</comments>
		<pubDate>Sun, 02 Oct 2011 13:21:58 +0000</pubDate>
		<dc:creator>piktochart</dc:creator>
				<category><![CDATA[Data Collection & Research]]></category>

		<guid isPermaLink="false">http://piktochart.com/?p=866</guid>
		<description><![CDATA[<p>On Piktochart, the Chart Wizard helps you to select the method of data visualization which you could use for your data set. Here is a little insight into how the Wizard does that: Data is commonly shown in the 3 formats below: Univariate &#8211; x only (contains only one axis of information) Bivariate &#8211; x [...]</p><p>The post <a href="http://piktochart.com/2011/10/uni-bi-or-tri-variate/">Uni, Bi or Tri-Variate</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></description>
				<content:encoded><![CDATA[<p>On Piktochart, the Chart Wizard helps you to select the method of data visualization which you could use for your data set.</p>
<p>Here is a little insight into how the Wizard does that:</p>
<p>Data is commonly shown in the 3 formats below:</p>
<ul>
<li>Univariate &#8211; x only (contains only one axis of information)</li>
<li>Bivariate &#8211; x and y (contains two axes of information)</li>
<li>Trivariate &#8211; x, y, z (contains three axes of information)</li>
</ul>
<p>Wait a minute, still don&#8217;t get it? Attached below is an example that shows the difference between uni, bi and trivariate data formats.</p>
<p><a href="http://piktochart.wpengine.com/wp-content/uploads/2011/10/data.png"><img class="alignright size-large wp-image-867" title="univariate bivariate trivariate data" src="http://piktochart.wpengine.com/wp-content/uploads/2011/10/data-1024x321.png" alt="univariate bivariate trivariate data" width="720" height="225" /></a></p>
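<p>The same distinction can be sketched as plain data structures. In the Python example below the values and the helper <code>count_variables</code> are made up purely for illustration; they are not taken from the image above or from Piktochart itself.</p>

```python
# Made-up sample values, purely for illustration.
univariate = [12, 15, 9, 20]                      # x only
bivariate = [(2009, 12), (2010, 15), (2011, 9)]   # (x, y) pairs
trivariate = [(2009, 12, "north"), (2010, 15, "south"), (2011, 9, "north")]  # (x, y, z)


def count_variables(records):
    """Return how many variables each record carries (1, 2 or 3 for the samples above)."""
    first = records[0]
    return len(first) if isinstance(first, tuple) else 1


for name, data in [("univariate", univariate), ("bivariate", bivariate), ("trivariate", trivariate)]:
    print(name, count_variables(data))
```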
<p>Some chart types are suitable for univariate, bivariate and trivariate data. Among them:</p>
<p><span style="color: #000000;"><strong>Univariate</strong></span></p>
<ul>
<li>Line charts</li>
<li>Bar charts</li>
<li>Stacked bar charts</li>
<li>Scatter plots</li>
<li>Percentages</li>
<li>Candle chart</li>
</ul>
<p><strong>Bivariate</strong></p>
<ul>
<li>All of the above</li>
<li>Area chart</li>
<li>Histogram</li>
</ul>
<p><strong>Trivariate</strong></p>
<ul>
<li>All of the above (in univariate)</li>
<li>Bubble chart</li>
<li>Geo Map combined with one of the above</li>
<li>Tree Map</li>
</ul>
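<p>The lists above amount to a lookup from the number of variables to candidate chart types. The Python sketch below encodes exactly those lists; it is only an illustration and not the actual Chart Wizard implementation, and the names <code>CHARTS_BY_VARIABLES</code> and <code>suggest_charts</code> are hypothetical.</p>

```python
UNIVARIATE_CHARTS = [
    "Line chart", "Bar chart", "Stacked bar chart",
    "Scatter plot", "Percentages", "Candle chart",
]

# Bivariate and trivariate both build on the univariate list, as in the post.
CHARTS_BY_VARIABLES = {
    1: UNIVARIATE_CHARTS,
    2: UNIVARIATE_CHARTS + ["Area chart", "Histogram"],
    3: UNIVARIATE_CHARTS + ["Bubble chart", "Geo map combined with one of the above", "Tree map"],
}


def suggest_charts(variable_count):
    """Return the candidate chart types for data with 1, 2 or 3 variables."""
    if variable_count not in CHARTS_BY_VARIABLES:
        raise ValueError(f"expected 1, 2 or 3 variables, got {variable_count}")
    return CHARTS_BY_VARIABLES[variable_count]


print(suggest_charts(3))
```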
<p>The lists above leave out several of the 20 visualization methods that we are planning to develop in total. Are any of the above your favourites? I generally find more success with uni or bi, while trivariate requires quite a lot of &#8220;digestion&#8221; from the users. Which form of data do you usually display?</p>
<p>The post <a href="http://piktochart.com/2011/10/uni-bi-or-tri-variate/">Uni, Bi or Tri-Variate</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://piktochart.com/2011/10/uni-bi-or-tri-variate/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why Visualize Data?</title>
		<link>http://piktochart.com/2011/10/why-visualize-data/</link>
		<comments>http://piktochart.com/2011/10/why-visualize-data/#comments</comments>
		<pubDate>Sun, 02 Oct 2011 10:52:39 +0000</pubDate>
		<dc:creator>piktochart</dc:creator>
				<category><![CDATA[Data Collection & Research]]></category>

		<guid isPermaLink="false">http://piktochart.com/?p=862</guid>
		<description><![CDATA[<p>We are living in an age where data is huge. We can now monitor everything and establish connections (or just correlations) between events. With so much going on, we have to find the best way to communicate all this information. Data visualization does the following: Conveys: Persuades people about something Collaborates: Presents an interesting/thought provoking argument [...]</p><p>The post <a href="http://piktochart.com/2011/10/why-visualize-data/">Why Visualize Data?</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></description>
				<content:encoded><![CDATA[<p>We are living in an age where data is huge. We can now monitor everything and establish connections (or just correlations) between events. With so much going on, we have to find the best way to communicate all this information.</p>
<p>Data visualization does the following:</p>
<ul>
<li><strong>Conveys</strong>: Persuades people about something</li>
<li><strong>Collaborates</strong>: Presents an interesting/thought provoking argument</li>
<li><strong>Reasons</strong>: Helps make decisions</li>
</ul>
<p><em>&#8220;Data can change things if stories are told correctly.&#8221;  A quote by <a title="Jeff Veen" href="http://veen.com/jeff/archives/001000.html" target="_blank">Jeffrey Veen</a></em></p>
<p>Some examples where we can exercise this ability particularly well, with data visualization done right:</p>
<p><object width="425" height="344" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://swf.tubechop.com/tubechop.swf?vurl=NmiUsdn7qRk&amp;start=482.26&amp;end=579.77&amp;cid=211662" /><param name="allowfullscreen" value="true" /><embed width="425" height="344" type="application/x-shockwave-flash" src="http://swf.tubechop.com/tubechop.swf?vurl=NmiUsdn7qRk&amp;start=482.26&amp;end=579.77&amp;cid=211662" allowfullscreen="true" /></object></p>
<p>The above is a short presentation by Jeffrey Veen, who talks about how big data can be visualized to save the world (in a way). He takes us back to the 1800s to look at a cholera outbreak. While most visualizations mashed up the map with the idea of an airborne disease, John Snow took a closer look at the data and realized that the problem could have started from the water system, very likely from a faulty water pump.</p>
<p>The interpretation of data must always be left to your users. The more important thing is to ensure that your users are left with a tool which can help them interpret data.</p>
<p><em>&#8220;Let go of the control. Allow your readers to find their own story.&#8221;</em></p>
<p><em>This post was inspired by the slides at <a title="Stanford Edu" href="http://hci.stanford.edu/courses/cs448b/w09/lectures/20090107-ValueOfVisualization.pdf" target="_blank">Stanford Edu</a> and <a title="Veen Jeffrey" href="http://veen.com/jeff/archives/001000.html" target="_blank">Jeffrey Veen&#8217;s presentation</a>.</em></p>
<p>The post <a href="http://piktochart.com/2011/10/why-visualize-data/">Why Visualize Data?</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://piktochart.com/2011/10/why-visualize-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>6 Useful Databases to Dig for Data (and 100 more)</title>
		<link>http://piktochart.com/2011/08/6-useful-databases-to-dig-for-data/</link>
		<comments>http://piktochart.com/2011/08/6-useful-databases-to-dig-for-data/#comments</comments>
		<pubDate>Thu, 04 Aug 2011 08:00:10 +0000</pubDate>
		<dc:creator>piktochart</dc:creator>
				<category><![CDATA[Data Collection & Research]]></category>

		<guid isPermaLink="false">http://piktochart.com/?p=149</guid>
		<description><![CDATA[<p>In your presentation, quoting from Wikipedia might not always be the best (unless they are linked to reputable papers and sources). Here are 6 major databases (some are crowd-sourced so they are by organisations and individuals) and we have about 100+ links which can be very useful. 1. Freebase Freebase is an open platform for [...]</p><p>The post <a href="http://piktochart.com/2011/08/6-useful-databases-to-dig-for-data/">6 Useful Databases to Dig for Data (and 100 more)</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></description>
				<content:encoded><![CDATA[<p>In your presentation, quoting from Wikipedia might not always be the best (unless they are linked to reputable papers and sources). Here are 6 major databases (some are crowd-sourced so they are by organisations and individuals) and we have about 100+ links which can be very useful.</p>
<p>1. <a title="FreeBase Data " href="http://www.freebase.com" target="_blank">Freebase</a></p>
<p>Freebase is an open platform for sharing data, and it can automatically plot data sets on a timeline or a map. It is a useful resource, with topics ranging from fictional characters to Modest Mouse.</p>
<p>2. <a title="UN Data " href="http://data.un.org" target="_blank">UN Data</a></p>
<p>Large data sets covering virtually all the public data the UN collects. You have to sign up to get access to the API, but it takes only about a minute for the access details to be emailed to you.</p>
<p>3. <a title="World Bank Data" href="http://data.worldbank.org/" target="_blank">World Bank</a></p>
<p>Where else would you go for data on the financial standing of any country&#8217;s economy, but the World Bank? Some of the topics covered, by country:</p>
<table>
<tbody>
<tr>
<td>
<div>
<div><a href="http://data.worldbank.org/topic/agriculture-and-rural-development">Agriculture &amp; Rural Development</a></div>
</div>
</td>
<td>
<div>
<div><a href="http://data.worldbank.org/topic/infrastructure">Infrastructure</a></div>
</div>
</td>
</tr>
<tr>
<td>
<div>
<div><a href="http://data.worldbank.org/topic/aid-effectiveness">Aid Effectiveness</a></div>
</div>
</td>
<td>
<div>
<div><a href="http://data.worldbank.org/topic/labor-and-social-protection">Labor &amp; Social Protection</a></div>
</div>
</td>
</tr>
<tr>
<td>
<div>
<div><a href="http://data.worldbank.org/topic/economic-policy-and-external-debt">Economic Policy and External Debt</a></div>
</div>
</td>
<td>
<div>
<div><a href="http://data.worldbank.org/topic/poverty">Poverty</a></div>
</div>
</td>
</tr>
<tr>
<td>
<div>
<div><a href="http://data.worldbank.org/topic/education">Education</a></div>
</div>
</td>
<td>
<div>
<div><a href="http://data.worldbank.org/topic/private-sector">Private Sector</a></div>
</div>
</td>
</tr>
<tr>
<td>
<div>
<div><a href="http://data.worldbank.org/topic/energy-and-mining">Energy &amp; Mining</a></div>
</div>
</td>
<td>
<div>
<div><a href="http://data.worldbank.org/topic/public-sector">Public Sector</a></div>
</div>
</td>
</tr>
<tr>
<td>
<div>
<div><a href="http://data.worldbank.org/topic/environment">Environment</a></div>
</div>
</td>
<td>
<div>
<div><a href="http://data.worldbank.org/topic/science-and-technology">Science &amp; Technology</a></div>
</div>
</td>
</tr>
<tr>
<td>
<div>
<div><a href="http://data.worldbank.org/topic/financial-sector">Financial Sector</a></div>
</div>
</td>
<td>
<div>
<div><a href="http://data.worldbank.org/topic/social-development">Social Development</a></div>
</div>
</td>
</tr>
<tr>
<td>
<div>
<div><a href="http://data.worldbank.org/topic/health">Health</a></div>
</div>
</td>
<td>
<div>
<div><a href="http://data.worldbank.org/topic/urban-development">Urban Development</a></div>
</div>
</td>
</tr>
</tbody>
</table>
<p>4. <a title="Data.gov" href="http://data.gov" target="_blank">Data.gov</a></p>
<p>Data.gov is leading the way in democratizing public sector data and driving innovation. This movement has spread throughout cities, states, and countries. Here are 5 of the 50+ categories:</p>
<ul>
<li><a href="http://explore.data.gov/catalog/raw/?category=Agriculture">Agriculture</a></li>
<li><a href="http://explore.data.gov/catalog/raw/?category=Arts%2C+Recreation%2C+and+Travel">Arts, Recreation, and Travel</a></li>
<li><a href="http://explore.data.gov/catalog/raw/?category=Banking%2C+Finance%2C+and+Insurance">Banking, Finance, and Insurance</a></li>
<li><a href="http://explore.data.gov/catalog/raw/?category=Births%2C+Deaths%2C+Marriages%2C+and+Divorces">Births, Deaths, Marriages, and Divorces</a></li>
<li><a href="http://explore.data.gov/catalog/raw/?category=Business">Business</a></li>
</ul>
<p>5. <a title="Infochimps" href="http://www.infochimps.com" target="_blank">Infochimps</a></p>
<div>
<p>Contains paid and free data sets on just about anything. What is also cool is that it is not just about downloading data sets in CSV and similar formats; it also has an API that you can play with to extract data. Try Twitter as your search term and you will see what I mean.</p>
<p>6. <a title="Needlebase" href="http://www.needlebase.com" target="_blank">NeedleBase</a><br />
Also acclaimed as the Wikipedia for data, this database lives up to expectations. Ready-made data visualisations are available to the public for a limited number of data sets, and the information is of good quality.</p>
<p><strong>A random collection of data sets</strong></p>
<div>
<ul>
<li><a title="PirateBay Torrents" href="http://www.csg.uzh.ch/publications/data/piratebay.html" target="_blank">Torrent downloads and uploads on Pirate Bay</a></li>
<li><a title="Stanford SNAP" href="http://snap.stanford.edu/data/index.html" target="_blank">Nodes and graphs: a collection by Stanford University</a></li>
<li><a title="Political party time" href="http://politicalpartytime.org/data/all/" target="_blank">PartyTime for Political Party</a> tracking: contains records of invitations for fundraisers and other events feting lawmakers and congressional candidates</li>
</ul>
</div>
<ul>
<li><a title="We Feel Fine" href="http://www.wefeelfine.org/api.html" target="_blank">Human Emotions by We Feel Fine</a>: an API meant to let other artists more easily make pieces that explore these human emotions</li>
<li><a title="LittleSis lists" href="http://littlesis.org/lists" target="_blank">LittleSis profiles</a> who&#8217;s who in the biggest organisations in the world</li>
<li><a title="NY Times Bestseller API" href="http://developer.nytimes.com/docs/best_sellers_api" target="_blank">NY Times bestseller </a></li>
<li><a title="Trending Topics on Wikipedia" href="http://www.trendingtopics.org/" target="_blank">Trending Topics</a>: Serves hot Wikipedia topics daily, giving you the top hits on Wikipedia by search query. For example, on the day of writing, reactive oxygen species was at the top.</li>
<li><a title="Google Flu Trends" href="http://www.google.org/flutrends/" target="_blank">Google Flu Trends</a></li>
<li>NY <a title="Times People" href="http://developer.nytimes.com/docs/timespeople_api/" target="_blank">Times People</a>: User data for <a href="http://nytimes.com/">nytimes.com</a>, including the user profiles, activities, news feeds, and networks.</li>
<li><a title="Crunchbase API" href="http://www.crunchbase.com/help/api" target="_blank">CrunchBase</a>: Plenty of information about startups and large tech companies</li>
</ul>
<p><a href="http://piktochart.wpengine.com/wp-content/uploads/2011/08/directory22.jpg"><img class="alignnone size-full wp-image-153" title="Directory of Data sets" src="http://piktochart.wpengine.com/wp-content/uploads/2011/08/directory22.jpg" alt="Directory of Data sets" width="849" height="565" /></a></p>
<p><strong>Directory of crazy amounts of data sets and related stuff</strong></p>
<p>Datamob: <a href="http://datamob.org/datasets" target="_blank">http://datamob.org/datasets</a></p>
<p><strong>The really big list here is taken from <a title="DataWrangling" href="http://datawrangling.com" target="_blank">DataWrangling</a></strong></p>
<p>The list was last updated in 2009, so it is a good place to start looking for data, although some entries may no longer be current.</p>
<ul>
<li><a href="http://open.blogs.nytimes.com/2009/02/04/announcing-the-article-search-api/" rel="nofollow">Announcing the Article Search API &#8211; Open Blog &#8211; NYTimes.com</a><br />
<strong>tags</strong>: article, api, nytimes, text, corpus, newspaper</li>
<li><a href="http://apiwiki.twitter.com/REST-API-Documentation#SocialGraphMethods" rel="nofollow">Twitter API Wiki / REST API Documentation: Social Graph Methods</a><br />
<strong>tags</strong>: graph, network, api, social, twitter</li>
<li><a href="http://www.isi.edu/info-agents/RISE/repository.html" rel="nofollow">Information Extraction: The RISE Repository of Information Sources</a><br />
<strong>tags</strong>: information, textmining, extraction, reviews, jobs</li>
<li><a href="http://blog.build.kiva.org/2009/02/03/introducing-the-kiva-api/" rel="nofollow">build.kiva: Blog &#8211; Introducing the Kiva API</a><br />
<strong>tags</strong>: finance, api, social, kiva, microlending, lending</li>
<li><a href="http://users.on.net/~henry/home/wikipedia.htm" rel="nofollow">Using the Wikipedia link dataset &#8212; Henry Haselgrove</a><br />
<strong>tags</strong>: graph, network, link, wikipedia, pagerank</li>
<li><a href="http://developer.lookery.com/" rel="nofollow">Lookery Developer Network &#8211; Lookery Developer Resources</a><br />
<strong>tags</strong>: web, analytics, api, traffic, advertising, demographics, lookery</li>
<li><a href="http://projects.flowingdata.com/target/" rel="nofollow">Visualizing the Growth of Target, 1962-2008 | FlowingData</a><br />
<strong>tags</strong>: visualization, retail, finance, gis, map, location, store, via:magnetbox, target</li>
<li><a href="http://www.techcrunch.com/2009/01/30/the-economy-according-to-mint/" rel="nofollow">The Economy According To Mint</a><br />
<strong>tags</strong>: finance, commercial, consumer, mint, spending</li>
<li><a href="http://www.diggingintodata.org/Repositories/tabid/167/Default.aspx" rel="nofollow">Repositories</a><br />
<strong>tags</strong>: links, textmining, books, rdf, ocr, documents</li>
<li><a href="http://subsidyscope.com/projects/bailout/tarp/" rel="nofollow">Subsidyscope.com</a><br />
<strong>tags</strong>: government, banking, csv, tarp, bailout</li>
<li><a href="http://remix.bestbuy.com/" rel="nofollow">Best Buy Remix &#8211; Welcome to the Best Buy Remix Developer Network</a><br />
<strong>tags</strong>: retail, data, api, product, bestbuy</li>
<li><a href="http://www.twibs.com/" rel="nofollow">twibs : find the businesses on twitter</a><br />
<strong>tags</strong>: directory, businesses, twitter, companies</li>
<li><a href="http://www.unearthedoutdoors.net/global_data/true_marble/download" rel="nofollow">True Marble Imagery &#8211; Free Download</a><br />
<strong>tags</strong>: gis, geo, map, mapping, images, satellite</li>
<li><a href="http://blog.infochimps.org/2008/12/29/massive-scrape-of-twitters-friend-graph/" rel="nofollow">Massive Scrape of Twitter’s Friend Graph « blog.infochimps.org &#8211; Organizing Huge Information Sources</a><br />
<strong>tags</strong>: textmining, twitter, network, socialnetwork, pagerank, graph, queryminer</li>
<li><a href="http://groups.google.com/group/get-theinfo/browse_thread/thread/605a00d5ddc62d72" rel="nofollow">Twitter Scrape (rough draft) &#8211; get.theinfo | Google Groups</a><br />
<strong>tags</strong>: twitter, socialnetwork, graph</li>
<li><a href="http://www.backtype.com/developers" rel="nofollow">API Documentation — BackType</a><br />
<strong>tags</strong>: api, blog, comments, textmining, stream, trends, backtype, queryminer</li>
<li><a href="http://www.generatedata.com/#about" rel="nofollow">generatedata.com</a><br />
<strong>tags</strong>: random, generator, database, sql</li>
<li><a href="http://www.pymvpa.org/examples.html#exampledata" rel="nofollow">Full Examples — PyMVPA Home</a><br />
<strong>tags</strong>: fmri, neuroscience, python, neuralnetwork</li>
<li><a href="http://wiki.dbpedia.org/Downloads32" rel="nofollow">wiki.dbpedia.org : Downloads 32</a><br />
<strong>tags</strong>: wikipedia, named_entity, rdf, ontology</li>
<li><a href="http://www.physionet.org/physiobank/database/apnea-ecg/" rel="nofollow">CinC Challenge 2000 data sets</a><br />
<strong>tags</strong>: timeseries, machinelearning, ecg, health, medical, sleep, apnea</li>
<li><a href="http://www.daveyp.com/blog/archives/528" rel="nofollow">Free book usage data from the University of Huddersfield » &#8220;Self-plagiarism is style&#8221;</a><br />
<strong>tags</strong>: books, library, borrowing, recommender, isbn, recommendation, collaborative, filtering, opendata</li>
<li><a href="http://www.lib.berkeley.edu/PUBL/stats.html" rel="nofollow">UC Berkeley. Sheldon Margen Public Health Library. Statistical/Data Resources</a><br />
<strong>tags</strong>: health, links, resources, publichealth, berkeley</li>
<li><a href="http://www.icwsm.org/2009/data/" rel="nofollow">ICWSM 2009 &#8211; International AAAI Conference on Weblogs and Social Media</a><br />
<strong>tags</strong>: blog, crawl, corpus, network, web, link</li>
<li><a href="http://www.bart.gov/schedules/developers/index.aspx" rel="nofollow">BART &#8211; For Developers</a><br />
<strong>tags</strong>: urban, transportation, feeds, public, sanfrancisco, bart, api</li>
<li><a href="http://www.cise.ufl.edu/research/sparse/matrices/" rel="nofollow">Tim Davis: UF Sparse Matrix Collection : sparse matrices from a wide range of applications</a><br />
<strong>tags</strong>: sparse, matrix</li>
<li><a href="http://www.othersonline.com/" rel="nofollow">Others Online &#8211; Behavioral Targeting, Analytics and Advertising Service for Publishers, Ad Networks, Widgets, WiFi Networks</a><br />
<strong>tags</strong>: analytics, audience, segmentation, toolbar, commercial, sem, search, advertising</li>
<li><a href="http://www.bioid.com/downloads/facedb/index.php" rel="nofollow">HumanScan : BioID : Downloads : BioID Face Database</a><br />
<strong>tags</strong>: face, detection, image</li>
<li><a href="http://vision.ai.uiuc.edu/mhyang/face-detection-survey.html#face-database" rel="nofollow">Face Detection</a><br />
<strong>tags</strong>: facerecognition, opencv, face, links</li>
<li><a href="http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html" rel="nofollow">Building a (fast) Wikipedia offline reader</a><br />
<strong>tags</strong>: django, wikipedia, compressed, textmining, howto</li>
<li><a href="http://change.gov/page/content/discusshealthcare" rel="nofollow">Change.gov: The Obama-Biden Transition Team | Join the Discussion: Healthcare</a><br />
<strong>tags</strong>: textmining, opinion, comment, topic, government, queryminer</li>
<li><a href="http://www9.georgetown.edu/faculty/ev42/UNVoting.htm" rel="nofollow">UN General Assembly Voting Data</a><br />
<strong>tags</strong>: un, voting, statistics, government</li>
<li><a href="http://www.cs.nyu.edu/~ylclab/data/norb-v1.0-small/" rel="nofollow">NORB Object Recognition Dataset, Fu Jie Huang, Yann LeCun, New York University</a><br />
<strong>tags</strong>: image, 3d</li>
<li><a href="http://blog.programmableweb.com/2008/11/25/reddits-secret-api/" rel="nofollow">Reddit’s Secret API</a><br />
<strong>tags</strong>: reddit, api, json</li>
<li><a href="http://www.idealware.org/blog/2008/10/mapping-blues-where-is-data.html" rel="nofollow">Idealware: Mapping Blues: Where is the Data?</a><br />
<strong>tags</strong>: resources, links</li>
<li><a href="http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html" rel="nofollow">Opinion Extraction, Opinion Mining, Sentiment Analysis, Summarization of Customer Reviews</a><br />
<strong>tags</strong>: sentiment, mining, classification, machinelearning, reviews, recommender, textmining, links</li>
<li><a href="http://www.datawrangling.com/amazon-web-services-public-datasets" rel="nofollow">Amazon Web Services Public Datasets » Data Wrangling Blog</a><br />
<strong>tags</strong>: amazon, ebs, ec2, s3, publicdata, hadoop</li>
<li><a href="http://aws.amazon.com/publicdatasets/" rel="nofollow">Amazon Web Services (AWS) Hosted Public Data Sets</a><br />
<strong>tags</strong>: amazon, ebs, publicdata</li>
<li><a href="http://www.aflcio.org/corporatewatch/paywatch/ceou/database.cfm" rel="nofollow">Executive PayWatch Database</a><br />
<strong>tags</strong>: ceo, compensation, pay, economics, business, labor</li>
<li><a href="http://www.yr-bcn.es/semanticWikipedia" rel="nofollow">http://www.yr-bcn.es/semanticWikipedia</a><br />
<strong>tags</strong>: wikipedia, named_entity, tagged, textmining</li>
<li><a href="http://www.cid.harvard.edu/ciddata/ciddata.html" rel="nofollow">Research Datasets :: CID Data :: Center for International Development at Harvard University (CID)</a><br />
<strong>tags</strong>: economics, international, development</li>
<li><a href="http://www.icpsr.umich.edu/NACDA/search.html" rel="nofollow">NACDA: Search Holdings</a><br />
<strong>tags</strong>: aging, statistics, studies</li>
<li><a href="http://images.google.com/hosted/life" rel="nofollow">LIFE photo archive hosted by Google</a><br />
<strong>tags</strong>: images, photo, pictures, search</li>
<li><a href="http://monkey.org/~jose/wiki/doku.php?id=phishingcorpus" rel="nofollow">phishingcorpus [JoseWiki]</a><br />
<strong>tags</strong>: phishing, corpus, text, email, textmining, nlp, mail, security</li>
<li><a href="http://www.cloudera.com/hadoophack/datasets/wikipedia" rel="nofollow">Wikipedia Datasets for the Hadoop Hack | Cloudera</a><br />
<strong>tags</strong>: wikipedia, hadoop, textmining, links</li>
<li><a href="http://research.microsoft.com/users/nickcr/wscd09/" rel="nofollow">WSCD09: Workshop on Web Search Click Data 2009</a><br />
<strong>tags</strong>: workshop, search, web, microsoft, log</li>
<li><a href="http://trec.nist.gov/data/qa/2001_qadata/main_task.html" rel="nofollow">Main Task QA Data</a><br />
<strong>tags</strong>: question, answering, trec, nlp, machinelearning</li>
<li><a href="http://www.alexandria.ucsb.edu/gazetteer/" rel="nofollow">ADL Gazetteer Development</a><br />
<strong>tags</strong>: named_entity, location, placenames, geo, nlp</li>
<li><a href="http://yooname.wordpress.com/2008/11/01/the-new-york-times-annotated-corpus/" rel="nofollow">The New York Times Annotated Corpus « YooName &#8211; named entity recognition</a><br />
<strong>tags</strong>: named_entity, nytimes, corpus, people, organizations, locations</li>
<li><a href="http://code.google.com/p/flossmole/wiki/downloading" rel="nofollow">downloading &#8211; flossmole &#8211; Google Code &#8211; How to get FLOSSmole data for your own use</a><br />
<strong>tags</strong>: opensource, project, activity, mysql, dump</li>
<li><a href="http://www.google.org/about/flutrends/how.html" rel="nofollow">Google Flu Trends | How does this work?</a><br />
<strong>tags</strong>: google, health, trends, search, prediction, epidemiology, biodefence, queries, queryminer</li>
<li><a href="http://www.seas.upenn.edu/~mdredze/datasets/sentiment/" rel="nofollow">Multi-Domain Sentiment Dataset</a><br />
<strong>tags</strong>: sentiment, review, product, amazon</li>
<li><a href="http://www.ruf.rice.edu/~pound/#scripts" rel="nofollow">Chris Pound&#8217;s Name Generation Page</a><br />
<strong>tags</strong>: bizarre, scifi, phrase, name, word, generators, random, perl</li>
<li><a href="http://www.tradingsolutions.com/resources/data.html" rel="nofollow">TradingSolutions &#8211; Data Sources</a><br />
<strong>tags</strong>: trading, finance, s, api, list</li>
<li><a href="http://open.blogs.nytimes.com/2008/10/14/announcing-the-new-york-times-campaign-finance-api/" rel="nofollow">Announcing the New York Times Campaign Finance API &#8211; Open &#8211; Code &#8211; New York Times Blog</a><br />
<strong>tags</strong>: nyt, api, campaign, donations, fec</li>
<li><a href="http://commons.oreilly.com/wiki/index.php/Beautiful_Data" rel="nofollow">Beautiful Data &#8211; WikiContent</a><br />
<strong>tags</strong>: book, data, wiki, via:jhammerb</li>
<li><a href="http://www.pdsounds.org/" rel="nofollow">public domain sounds | free sound library</a><br />
<strong>tags</strong>: sound, publicdomain, audio</li>
<li><a href="http://developer.netflix.com/" rel="nofollow">Netflix API &#8211; Welcome to the Netflix Developer Network</a><br />
<strong>tags</strong>: netflix, api, movie, mashup, netflixprize, ratings</li>
<li><a href="http://data.octo.dc.gov/" rel="nofollow">Data Catalog</a><br />
<strong>tags</strong>: dc, government, feeds, transparency, opendata</li>
<li><a href="http://radar.oreilly.com/2008/09/open-beats-closed-best-buys-ne.html" rel="nofollow">Open beats Closed: Best Buy&#8217;s new APIs &#8211; O&#8217;Reilly Radar</a><br />
<strong>tags</strong>: retail, bestbuy, api</li>
<li><a href="http://edgeofthewest.wordpress.com/2008/09/04/voter-registration-data/" rel="nofollow">Voter registration data; or, HERE IS YOUR HOPE, YOU FOOLS! « The Edge of the American West</a><br />
<strong>tags</strong>: voter, registration, politics, 2008</li>
<li><a href="http://www.tickermine.com/dig_deeper" rel="nofollow">Tickermine</a><br />
<strong>tags</strong>: custom, research, retail, finance, market, service, analyst</li>
<li><a href="http://www.linkedmdb.org/" rel="nofollow">Linked Movie Data Base</a><br />
<strong>tags</strong>: rdf, movies, movie, api</li>
<li><a href="http://blog.programmableweb.com/2008/09/04/big-huge-thesaurus-api-access-145000-words-and-phrases/" rel="nofollow">Big Huge Thesaurus API: Access 145,000 Words and Phrases</a><br />
<strong>tags</strong>: webservice, api, thesaurus, textmining, nlp, rest</li>
<li><a href="http://github.com/aaronsw/watchdog/tree/master/import/parse/fec.py" rel="nofollow">import/parse/fec.py at master from aaronsw&#8217;s watchdog — GitHub</a><br />
<strong>tags</strong>: fec, python, parser, government, campaign</li>
<li><a href="http://watchdog.jottit.com/volunteer?r=83" rel="nofollow">The Watchdog Project: volunteer</a><br />
<strong>tags</strong>: government, transparency, parsing, election, python</li>
<li><a href="http://blog.fortiusone.com/2008/07/09/dataset-of-the-day-where-are-the-obamacans/#comment-20326" rel="nofollow">Dataset of the day: Where are the Obamacans? | Off the Map &#8211; Official Blog of FortiusOne</a><br />
<strong>tags</strong>: obama, government, mashup, gis, geo, map, campaign, donations</li>
<li><a href="http://ihome.ust.hk/~derekhh/ActivityRecognition/index.html" rel="nofollow">Activity Recognition: Datasets, Bibliography and others</a><br />
<strong>tags</strong>: activity, recognition, intent</li>
<li><a href="http://www.cs.cmu.edu/~mmcgloho/fec/data/fec_data.html" rel="nofollow">Normalized Campaign Contribution Data</a><br />
<strong>tags</strong>: cmu, politics, campaign, donations, fec, via:jhammerb, government</li>
<li><a href="http://netsg.cs.sfu.ca/youtubedata/" rel="nofollow">YouTube Dataset</a><br />
<strong>tags</strong>: youtube, research, crawl, socialnetwork, network, graph, web</li>
<li><a href="http://crawdad.cs.dartmouth.edu/index.php" rel="nofollow">CRAWDAD</a><br />
<strong>tags</strong>: wireless, RF, radio, signal, dartmouth, network</li>
<li><a href="http://groups.google.com/group/twitter-development-talk/web/api-documentation" rel="nofollow">API Documentation &#8211; Twitter Development Talk | Google Groups</a><br />
<strong>tags</strong>: twitter, text, api</li>
<li><a href="http://ilps.science.uva.nl/resources/webfaq" rel="nofollow">Web FAQ collection | ILPS</a><br />
<strong>tags</strong>: faq, question_answering, questions, web, crawl, corpus, xml, textmining</li>
<li><a href="http://developer.yahoo.com/music/" rel="nofollow">Yahoo! Music API &#8211; YDN</a><br />
<strong>tags</strong>: api, yahoo, music, artists</li>
<li><a href="http://adwords.google.com/support/bin/answer.py?hl=en&amp;answer=68034" rel="nofollow">Search Query Performance report &#8211; Google AdWords Help Center</a><br />
<strong>tags</strong>: adwords, ppc, search, metrics, webanalytics, sem, query, queryminer</li>
<li><a href="http://www.wordze.com/" rel="nofollow">Wordze Keyword Research Tool</a><br />
<strong>tags</strong>: queryminer, keyword, tool, research, commercial, search, adwords</li>
<li><a href="http://www.idiap.ch/resources/frontalfaces/" rel="nofollow">Frontal Face Databases</a><br />
<strong>tags</strong>: facerecognition, face, image, recognition</li>
<li><a href="http://3stages.org/c/es2.cgi?search=listdata&amp;file=/data/data.html&amp;print=notitle&amp;header=/header/cat.header" rel="nofollow">Searchable Catalogs of Data</a><br />
<strong>tags</strong>: links, catalogs, social</li>
<li><a href="http://baseball1.com/content/view/57/82/" rel="nofollow">Download Database &#8211; baseball1.com</a><br />
<strong>tags</strong>: baseball, database, publicdata, statistics, sports</li>
<li><a href="http://code.google.com/p/radiohead/downloads/list" rel="nofollow">radiohead &#8211; Google Code</a><br />
<strong>tags</strong>: lidar, visualization, radiohead, google, video</li>
<li><a href="http://people.csail.mit.edu/torralba/tinyimages/" rel="nofollow">80 Million Tiny Images</a><br />
<strong>tags</strong>: images, words, english, search, visualization, imagemap</li>
<li><a href="http://timemachine.iic.harvard.edu/" rel="nofollow">Time Series Center | Harvard University</a><br />
<strong>tags</strong>: timeseries, anomaly, detection, astronomical, physics</li>
<li><a href="http://www.openvisuals.org/" rel="nofollow">OpenVisuals &#8211; Open Source Visualization Framework</a><br />
<strong>tags</strong>: visualization, community, design, processing</li>
<li><a href="http://geonames.usgs.gov/domestic/download_data.htm" rel="nofollow">BGN: Domestic Names &#8211; State and Topical Gazetteer Download Files</a><br />
<strong>tags</strong>: gis, usgs</li>
<li><a href="http://earth-info.nga.mil/gns/html/namefiles.htm" rel="nofollow">NGA: Country Files</a><br />
<strong>tags</strong>: country, cities, geo</li>
<li><a href="http://people.scs.fsu.edu/~burkardt/datasets/datasets.html" rel="nofollow">Datasets</a><br />
<strong>tags</strong>: benchmark, clustering, regression, machinelearning, list, statistics, mathematics</li>
<li><a href="http://isomap.stanford.edu/datasets.html" rel="nofollow">Isomap Datasets</a><br />
<strong>tags</strong>: nonlinear, dimensionality, reduction, faces, digits, images, manifold</li>
<li><a href="http://www.ysearchblog.com/archives/000599.html" rel="nofollow">Yahoo! Search Blog: BOSS &#8212; The Next Step in our Open Search Ecosystem</a><br />
<strong>tags</strong>: api, open, search, yahoo, BOSS, queryminer</li>
<li><a href="http://www.hostip.info/dl/index.html" rel="nofollow">Download the Database &#8211; IP Address Lookup &#8211; Community Geotarget IP Project</a><br />
<strong>tags</strong>: geocoding, geoip, internet, ip, ipaddress, mysql</li>
<li><a href="http://web.mit.edu/airlinedata/www/default.html" rel="nofollow">Airline Data Project</a><br />
<strong>tags</strong>: airline, statistics, finance, revenue, location, travel</li>
<li><a href="http://www.reddit.com/info/6q0oq/comments/" rel="nofollow">reddit.com: Ask Reddit: Where to download a DB dump of Reddit?</a><br />
<strong>tags</strong>: reddit, socialnetwork, news, web</li>
<li><a href="http://www.showusabetterway.co.uk/call/data.html#ons" rel="nofollow">Show Us a Better Way: What public data is already available?</a><br />
<strong>tags</strong>: statistics, census, uk, school, news, publicdata</li>
<li><a href="http://www.occamslab.com/petricek/data/" rel="nofollow">Collaborative filtering dataset &#8211; dating agency</a><br />
<strong>tags</strong>: collaborative, filtering, dating, rating, profiles, czech</li>
<li><a href="http://www.predictify.com/aboutus.aspx" rel="nofollow">About Us &#8211; Predictify</a><br />
<strong>tags</strong>: predictionmarket, tool, finance, buzz, advertising, marketing, startup, mmds, david_kellogg</li>
<li><a href="http://www.vgchartz.com/" rel="nofollow">VGChartz.com | Video Games, Charts, News, Forums, Reviews, Wii, PS3, Xbox360, DS, PSP</a><br />
<strong>tags</strong>: sales, ranking, videogames, retail</li>
<li><a href="http://www.npd.com/lps/corp_store_level/" rel="nofollow">Store Level Information</a><br />
<strong>tags</strong>: retail, finance, sales, store</li>
<li><a href="http://graphics.cs.cmu.edu/projects/im2gps/flickr_code.html" rel="nofollow">Code for querying and downloading Flickr images</a><br />
<strong>tags</strong>: image, python, code, flickr, matlab, recognition</li>
<li><a href="http://www.imageparsing.com/FreeDataOutline.html#matlab" rel="nofollow">Image Parsing Datasets</a><br />
<strong>tags</strong>: image, recognition</li>
<li><a href="http://www.tagora-project.eu/data/#datasets" rel="nofollow">TAGora » Data</a><br />
<strong>tags</strong>: tag, tagging, s</li>
<li><a href="http://www.tagora-project.eu/data/#imdbnetflix" rel="nofollow">TAGora » Data</a><br />
<strong>tags</strong>: netflixprize, imdb, sparql</li>
<li><a href="http://www.fhwa.dot.gov/ohim/tvtw/tvtpage.htm" rel="nofollow">OHPI &#8211; Traffic Volume Trends</a><br />
<strong>tags</strong>: government, traffic, statistics, trends, transportation</li>
<li><a href="http://wiki.apache.org/pig/PigTutorial" rel="nofollow">PigTutorial &#8211; Pig Wiki</a><br />
<strong>tags</strong>: search, log, query, web, excite, queries, hadoop, pig, tutorial, mapreduce, parallel, queryminer</li>
<li><a href="http://kitchen.cs.cmu.edu/" rel="nofollow">Quality of Life Grand Challenge Dataset: Kitchen Capture</a><br />
<strong>tags</strong>: machinelearning, motion, capture, sensor</li>
<li><a href="http://summize.com/api" rel="nofollow">Summize Twitter Search API</a><br />
<strong>tags</strong>: api, buzz, opinion, trends, text, twitter, summize, search</li>
<li><a href="http://www.merl.com/wmd/" rel="nofollow">2008 IEEE InfoVis Contest Dataset</a><br />
<strong>tags</strong>: visualization, contest, scalability, motion, tracking, pedestrian, sensor</li>
<li><a href="http://pro.imdb.com/title/tt0362120/boxoffice" rel="nofollow">IMDb Pro : Scary Movie 4: Box office</a><br />
<strong>tags</strong>: movie, revenue, sales, box_office, imdb, commercial, movie_study</li>
<li><a href="http://www.boxofficemojo.com/movies/?page=daily&amp;view=chart&amp;id=spiderman2.htm" rel="nofollow">Spider-Man 2 (2004) &#8211; Daily Box Office Results</a><br />
<strong>tags</strong>: movie, revenue, box_office</li>
<li><a href="http://blogs.msdn.com/livesearch/archive/2008/04/24/xrank-celebrity-check-out-who-s-hot-and-who-s-not.aspx" rel="nofollow">Live Search : xRank™ Celebrity — check out who’s hot and who’s not!</a><br />
<strong>tags</strong>: search, query, volume, trends, celebrity, prediction, buzz, named_entity</li>
<li><a href="https://secure.imdb.com/signup/v4/?d=imdbcompanyfilmo" rel="nofollow">IMDbPro.com Free Trial Signup</a><br />
<strong>tags</strong>: movie, revenue, timeseries, imdb, commercial, subscription</li>
<li><a href="http://www.economicswebinstitute.org/ecdata.htm" rel="nofollow">Free time-series and micro-data to download</a><br />
<strong>tags</strong>: economics, links</li>
<li><a href="http://www.juiceanalytics.com/openjuice/programmatic-google-trends-api/" rel="nofollow">PyGTrends: Python API for Google Trends Data</a><br />
<strong>tags</strong>: google, trends, search, web, analytics, api, code, python, hack, keyword, query, forecasting, indicator, finance</li>
<li><a href="http://googleblog.blogspot.com/2008/06/new-flavor-of-google-trends.html" rel="nofollow">Official Google Blog: A new flavor of Google Trends</a><br />
<strong>tags</strong>: google, trends, search, query, api, csv, keyword, timeseries</li>
<li><a href="http://blogs.sun.com/plamere/entry/open_research_the_data_lastfm" rel="nofollow">Open Research &#8211; the Data: Lastfm-ArtistTags2007 &#8211; Duke Listens!</a><br />
<strong>tags</strong>: last.fm, music, tagging, artists, tags, collaborative, filtering</li>
<li><a href="https://www.i2b2.org/NLP/" rel="nofollow">i2b2: Informatics for Integrating Biology &amp; the Bedside</a><br />
<strong>tags</strong>: medical, obesity</li>
<li><a href="http://cs.nyu.edu/~yap/classes/modeling/01s/lect/l52/l.html" rel="nofollow">Tiger Data Set Lecture</a><br />
<strong>tags</strong>: tiger, gis, lectures</li>
<li><a href="http://www.techcrunch.com/2008/05/31/google-to-launch-large-scale-geo-services/" rel="nofollow">Google To Launch Large Scale Geo-Services</a><br />
<strong>tags</strong>: geo, google, gps, location, geolocation, cell, wifi, api, gis</li>
<li><a href="http://playground.last.fm/aliases/artist/Britney%20Spears" rel="nofollow">Last.fm’s Playground</a><br />
<strong>tags</strong>: celebrity, misspelling, spelling, names</li>
<li><a href="http://www.importgenius.com/" rel="nofollow">ImportGenius.com : U.S. Customs Database and Competitive Intelligence Tools</a><br />
<strong>tags</strong>: commercial, shipping, imports, exports, finance, datamining</li>
<li><a href="http://www.betfairpromo.com/betfairsp/prices/" rel="nofollow">Directory Listing of Betfair price files</a><br />
<strong>tags</strong>: betting, prediction, betfair, price, csv, predictionmarket</li>
<li><a href="http://spotlight.reuters.com/" rel="nofollow">Reuters Spotlight &#8211; Article and Media API</a><br />
<strong>tags</strong>: news, text, articles, api, content, media, xml, images, publicdata</li>
<li><a href="http://scipy.org/scipy/scikits/wiki/DataSets" rel="nofollow">DataSets &#8211; Scikits &#8211; Trac</a><br />
<strong>tags</strong>: scipy, python, machinelearning, statistics, resource</li>
<li><a href="http://lists.wikimedia.org/pipermail/wikitech-l/2007-December/035435.html" rel="nofollow">[Wikitech-l] page counters</a><br />
<strong>tags</strong>: wikipedia, pageviews, trends, textmining, seo, topic</li>
<li><a href="http://stats.grok.se/" rel="nofollow">Wikipedia article traffic statistics</a><br />
<strong>tags</strong>: via:chl, wikipedia, web, analytics, seo, topic, textmining, traffic</li>
<li><a href="http://developer.yahoo.com/geo/" rel="nofollow">Yahoo! Internet Location Platform &#8211; YDN</a><br />
<strong>tags</strong>: yahoo, geo, geocoding, location, landmarks, gis</li>
<li><a href="http://randomknowledge.wordpress.com/2008/05/09/how-to-find-images-on-the-internet/" rel="nofollow">How to find images on the internet « Random knowledge</a><br />
<strong>tags</strong>: images, links, lists, archive</li>
<li><a href="http://www.news.com/8301-10784_3-9942397-7.html" rel="nofollow">Yahoo offers geographic data to Web sites | Tech news blog &#8211; CNET News.com</a><br />
<strong>tags</strong>: gis, webservice, yahoo, api, location, landmark</li>
<li><a href="http://ist.psu.edu/faculty_pages/jjansen/academic/transaction_logs.html" rel="nofollow">Instructions for Obtaining Search Engine Transaction Logs</a><br />
<strong>tags</strong>: query, search, log, excite, altavista, alltheweb, transaction</li>
<li><a href="http://techtc.cs.technion.ac.il/techtc.html#acquisition" rel="nofollow">TechTC &#8211; Technion Repository of Text Categorization Datasets</a><br />
<strong>tags</strong>: datamining, textmining, categorization, classification, odp, directory, text</li>
<li><a href="http://techtc.cs.technion.ac.il/techtc100/techtc100.html" rel="nofollow">The TechTC-100 Test Collection for Text Categorization</a><br />
<strong>tags</strong>: textmining, classification, category, odp, directory</li>
<li><a href="http://www.fec.gov/finance/disclosure/ftpdet.shtml#a2007_2008" rel="nofollow">FEC Election Contributions: Download Detailed Files by Election Cycle</a><br />
<strong>tags</strong>: individual, donations, government, election, publicdata, fec</li>
<li><a href="http://www.juiceanalytics.com/openjuice/juiced-google-analytics-api/" rel="nofollow">Juiced Google Analytics Python API: Juice Analytics</a><br />
<strong>tags</strong>: search, statistics, keywords, analytics, api, python, web, seo, google, google_analytics, juice</li>
<li><a href="http://27.org/isocountrylist/" rel="nofollow">Country Name and ISO 3166 Code MySQL Import File</a><br />
<strong>tags</strong>: mysql, states, countries, isocode</li>
<li><a href="http://blog.programmableweb.com/2008/04/29/semantic-search-the-us-library-of-congress/" rel="nofollow">Semantic Search the US Library of Congress</a><br />
<strong>tags</strong>: via:inkdroid, libraries, mashup, rdf, semantic, search, semanticweb, books, api, webservice</li>
<li><a href="http://geonames.wordpress.com/2007/05/11/geocoded-hotels/" rel="nofollow">geocoded Hotels « GeoNames Blog</a><br />
<strong>tags</strong>: hotels, geonames</li>
<li><a href="http://www.geonames.org/export/" rel="nofollow">GeoNames webservice and data download</a><br />
<strong>tags</strong>: locations, cities, countries, gis</li>
<li><a href="http://www.maxmind.com/download/worldcities/" rel="nofollow">Index of /download/worldcities</a><br />
<strong>tags</strong>: cities, gis</li>
<li><a href="http://www.cs.ualberta.ca/~lindek/downloads.htm" rel="nofollow">ualberta dependency based thesaurus and word count data</a><br />
<strong>tags</strong>: corpus, text, similarity, terms</li>
<li><a href="http://www.commoncrawl.org/" rel="nofollow">CommonCrawl &#8211; About</a><br />
<strong>tags</strong>: web, crawler, bot</li>
<li><a href="http://www.pdg.cnb.uam.es/martink/LINKS/bio_corpora_links.htm" rel="nofollow">Data sets and corpus / corpora for biological literature and text mining, information extraction, information retrieval and document classification</a><br />
<strong>tags</strong>: bioinformatics, text, corpora, domainspecific, genomics, corpus</li>
<li><a href="http://www-odi.nhtsa.dot.gov/downloads/index.cfm" rel="nofollow">Office of Defects Investigation (ODI), Flat File Downloads</a><br />
<strong>tags</strong>: defect, recall, automobile, fightclub, nhtsa, safety</li>
<li><a href="http://pdos.csail.mit.edu/p2psim/kingdata/" rel="nofollow">p2psim &#8211; kingdata : DNS server latency network distance matrices</a><br />
<strong>tags</strong>: distance, matrix, network, p2p, dns, latency, nmf, queryminer</li>
<li><a href="http://www.kamvar.org/personalization/" rel="nofollow">Sep Kamvar / Personalization /</a><br />
<strong>tags</strong>: pagerank, web, matrix, matlab</li>
<li><a href="http://beta.opentick.com/" rel="nofollow">beta.opentick.com</a><br />
<strong>tags</strong>: opentick, trading, beta, feeds, finance</li>
<li><a href="http://wikixmldb.dyndns.org/" rel="nofollow">WikiXMLDB: Querying Wikipedia with XQuery</a><br />
<strong>tags</strong>: wikipedia, xml, ec2</li>
<li><a href="http://blog.kiwitobes.com/?p=51" rel="nofollow">kiwitobes.com » Blog Archive » Walmart Growth Video</a><br />
<strong>tags</strong>: walmart, visualization, video, freebase, store, retail, locations, opening</li>
<li><a href="http://www.opencellid.org/data/" rel="nofollow">Open Cell Id dataset &#8211; phone geolocation from GSM cellids</a><br />
<strong>tags</strong>: gis, mobile, geolocation</li>
<li><a href="http://www.weblab.infosci.cornell.edu/" rel="nofollow">The Cornell Web Lab &#8211; The Cornell Web Lab</a><br />
<strong>tags</strong>: cornell, web, archive, hadoop, crawl</li>
<li><a href="http://graphics.cs.cmu.edu/projects/im2gps/" rel="nofollow">im2gps: estimating geographic information from a single image</a><br />
<strong>tags</strong>: imagerecognition, via:csantos, gis, cmu, gps, imageprocessing, paper, hack, freaking_awesome</li>
<li><a href="http://muscle.prip.tuwien.ac.at/data_here.php" rel="nofollow">Datasets: MUSCLE WP2 Evaluation, Integration and Standards</a><br />
<strong>tags</strong>: image, video, audio, currency, sports, imagerecognition</li>
<li><a href="http://www.openeconomics.net/store/" rel="nofollow">Open Economics &#8211; Store &#8211; Index</a><br />
<strong>tags</strong>: economics, list</li>
<li><a href="http://www.omdb.org/movie" rel="nofollow">welcome @ omdb</a><br />
<strong>tags</strong>: free, movie, database, netflixprize</li>
<li><a href="http://www.cogmap.com/blog/2008/03/04/cogmap-apis/" rel="nofollow">Cogblog » Blog Archive » Cogmap APIs</a><br />
<strong>tags</strong>: api, cogmap, person, name, organization, record_linkage</li>
<li><a href="http://freebase.com/view/en/wal-mart" rel="nofollow">Wal-Mart : Freebase &#8211; The World&#8217;s Database</a><br />
<strong>tags</strong>: retail, locations, stores</li>
<li><a href="http://www.cogmap.com/index.php" rel="nofollow">Cogmap: The Org Chart Wiki</a><br />
<strong>tags</strong>: record_linkage, identity, name, organization, orgchart, marketing</li>
<li><a href="http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/de-news/" rel="nofollow">German English Parallel Corpus &#8220;de-news&#8221;, Daily News 1996-2000</a><br />
<strong>tags</strong>: german, translation, corpus, english, text, via:maxme</li>
<li><a href="http://www.crcns.org/" rel="nofollow">Welcome to the CRCNS data sharing activity website — CRCNS</a><br />
<strong>tags</strong>: neuroscience, patch, clamp, recordings, neuron, timeseries, patchclamp, data, neural, cortex, visual</li>
<li><a href="http://infochimps.org/" rel="nofollow">Infochimps.org: Free Redistributable Rich Data Sets</a><br />
<strong>tags</strong>: aggregator, links</li>
<li><a href="http://fimi.cs.helsinki.fi/data/" rel="nofollow">Frequent Itemset Mining Dataset Repository</a><br />
<strong>tags</strong>: retail, clickstream, traffic, web, links, sales</li>
<li><a href="http://blog.doloreslabs.com/?p=17" rel="nofollow">Dolores Labs Blog » Blog Archive » Our color names data set is online</a><br />
<strong>tags</strong>: colormap, color, mechanicalturk</li>
<li><a href="http://www.teradata.com/t/page/134891/index.html" rel="nofollow">TeradataUniversityNetwork.com -&gt; Registration</a><br />
<strong>tags</strong>: teradata, retail, transactional, database</li>
<li><a href="http://largescale.first.fraunhofer.de/instructions/" rel="nofollow">Pascal Learning Challenge Large Datasets</a><br />
<strong>tags</strong>: large, competition, challenge, svm, machinelearning, scalability</li>
<li><a href="http://www.ecis2007.ch/ws_tun.php" rel="nofollow">ECIS 2007 &#8211; The 15th European Conference on Information Systems</a><br />
<strong>tags</strong>: retail, dillards, sams_club</li>
<li><a href="http://docs.amazonwebservices.com/AlexaWebSearch/2007-03-15/" rel="nofollow">Alexa Web Search</a><br />
<strong>tags</strong>: alexa, aws, web, search, api</li>
<li><a href="http://www.ibm.com/developerworks/podcast/dwi/cm-int092507txt.html" rel="nofollow">developerWorks Interviews: Massive data mining and the resurgent mainframe</a><br />
<strong>tags</strong>: price, retail, transaction, sams_club, dillards</li>
<li><a href="http://dailyheadlines.uark.edu/5374.htm" rel="nofollow">University of Arkansas &#8211; Daily Headlines</a><br />
<strong>tags</strong>: retail, dillards, uark</li>
<li><a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/03/crime_data_bona.html" rel="nofollow">Crime data bonanza!!!</a><br />
<strong>tags</strong>: timeseries, crime, statistics, publicdata</li>
<li><a href="http://bulk.resource.org/courts.gov/0_README.html" rel="nofollow">State and Federal Case Law</a><br />
<strong>tags</strong>: creativecommons, court, legal, law, via:inkdroid</li>
<li><a href="http://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines" rel="nofollow">Wikipedia:Lists of common misspellings/For machines &#8211; Wikipedia, the free encyclopedia</a><br />
<strong>tags</strong>: spelling, misspelling, wikipedia</li>
<li><a href="http://people.uwec.edu/koroghcm/public_domain.htm" rel="nofollow">Copyright Free and Public Domain Media</a><br />
<strong>tags</strong>: images, audio, publicdata, maps, video, free</li>
<li><a href="http://ir.dcs.gla.ac.uk/test_collections/access_to_data.html" rel="nofollow">Access to Web Research Collections VLC2/WT10g/WT2g</a><br />
<strong>tags</strong>: blog, web, text</li>
<li><a href="http://peipa.essex.ac.uk/benchmark/databases/index.html" rel="nofollow">Databases you can use for benchmarking</a><br />
<strong>tags</strong>: image, vision, recognition</li>
<li><a href="http://lyricsfly.com/api/" rel="nofollow">Lyricsfly Lyrics API, database access to search for music artist and song title, protocol REST with XML document</a><br />
<strong>tags</strong>: song, lyrics, database, api</li>
<li><a href="http://www.elec.qmul.ac.uk/staffinfo/andrea/avss2007_d.html" rel="nofollow">2007 IEEE AVSS Detection and Tracking Algorithm Datasets</a><br />
<strong>tags</strong>: tracking, video, detection, image, recognition, vehicle, pedestrian</li>
<li><a href="http://software.eigenvector.com/Data/" rel="nofollow">Eigenvector Research, Inc. : Data Sets Available to Download</a><br />
<strong>tags</strong>: NIR, spectra, chemistry, semiconductor, pharmaceutical, matlab</li>
<li><a href="http://www.cse.ohio-state.edu/otcbvs-bench/" rel="nofollow">OTCBVS</a><br />
<strong>tags</strong>: image, recognition, detection, pedestrian, thermal, tracking, facerecognition, illumination</li>
<li><a href="http://www.mkbergman.com/?p=417" rel="nofollow">99 Wikipedia Sources Aiding the Semantic Web » AI3:::Adaptive Information</a><br />
<strong>tags</strong>: links, directory, record_linkage, extraction, wikipedia, named_entity, recognition, textmining, semanticweb, paper</li>
<li><a href="http://data.un.org/Host.aspx?Content=About" rel="nofollow">UNdata</a><br />
<strong>tags</strong>: UN, publicdata, government, statistics</li>
<li><a href="http://www-etud.iro.umontreal.ca/~bergstrj/audioscrobbler_data.html" rel="nofollow">AudioScrobbler Data</a><br />
<strong>tags</strong>: audioscrobbler, recommendation, collaborative, filtering, music</li>
<li><a href="http://richard.cyganiak.de/2007/10/lod/" rel="nofollow">The Linking Open Data dataset cloud</a><br />
<strong>tags</strong>: directory, rdf, semantic, data, soup, graph</li>
<li><a href="http://www.economy.com/freelunch/" rel="nofollow">Free Economic Data | Economic, Financial, and Demographic Data</a><br />
<strong>tags</strong>: finance, economics, portal, links</li>
<li><a href="http://mlsp2008.conwiz.dk/index.php?id=43" rel="nofollow">::MLSP 2008::: MLSP competition</a><br />
<strong>tags</strong>: machinelearning, trading, competition, backtest, matlab, code, finance, via:DeliciousRob</li>
<li><a href="http://www.cs.cmu.edu/afs/cs/project/cil/ftp/html/v-images.html" rel="nofollow">Computer Vision Test Images</a><br />
<strong>tags</strong>: computer, vision, image, ray, trace, fingerprint, stereo, detection, via:chl</li>
<li><a href="http://thedata.org/" rel="nofollow">The Dataverse Network Project | The Dataverse Network Project</a><br />
<strong>tags</strong>: statistics, repository, harvard</li>
<li><a href="http://dvn.iq.harvard.edu/dvn/" rel="nofollow">DVN &#8211; Home</a><br />
<strong>tags</strong>: harvard, repository, social, science, research, portal, links</li>
<li><a href="http://www2.sos.state.oh.us/cf_ftp/voter_ftp_home?agree_flag=Y" rel="nofollow">Ohio voter registration data</a><br />
<strong>tags</strong>: voter, voting, politics, government, name, address, registration</li>
<li><a href="http://www.co.clark.nv.us/ELECTION/VoterData.asp" rel="nofollow">Voter List Data Files &#8211; Election Department, Clark County, Nevada</a><br />
<strong>tags</strong>: voting, voter, registration, name, address, data, election, politics, government, nevada</li>
<li><a href="http://www.cru.uea.ac.uk/cru/data/temperature/#datdow" rel="nofollow">Temperature data (HadCRUT3 and CRUTEM3)</a><br />
<strong>tags</strong>: climate, temperature, netcdf</li>
<li><a href="http://yann.lecun.com/exdb/mnist/" rel="nofollow">MNIST handwritten digit database, Yann LeCun and Corinna Cortes</a><br />
<strong>tags</strong>: handwriting, mnist, image, recognition</li>
<li><a href="http://vis-www.cs.umass.edu/lfw/" rel="nofollow">LFW : Labelled Faces in the Wild</a><br />
<strong>tags</strong>: facerecognition, face, recognition, umass, image</li>
<li><a href="http://www.37signals.com/svn/posts/311-making-random-contacts" rel="nofollow">Making random contacts &#8211; (37signals)</a><br />
<strong>tags</strong>: generator, names</li>
<li><a href="http://www.webresourcesdepot.com/test-sample-data-generators/" rel="nofollow">Test (Sample) Data Generators</a><br />
<strong>tags</strong>: generator, tools, list, via:jd</li>
<li><a href="http://developer.compete.com/" rel="nofollow">Compete &#8211; Compete Developer Resources</a><br />
<strong>tags</strong>: compete, api, web, statistics, traffic, analytics, mashup</li>
<li><a href="http://hunch.net/?p=170" rel="nofollow">Machine Learning (Theory) » The Peekaboom Dataset</a><br />
<strong>tags</strong>: peekaboom, vision, image, large, human, computation, machinelearning, recognition</li>
<li><a href="http://math.nyu.edu/~shafer/teaching/gcm07/" rel="nofollow">Ocean Processes and Modeling: Ocean Data</a><br />
<strong>tags</strong>: links, oceanography, satellite</li>
<li><a href="http://tangra.si.umich.edu/clair/blogocenter/dataset.html" rel="nofollow">BlogoCenter data sets</a><br />
<strong>tags</strong>: blog, ucla</li>
<li><a href="http://www.cs.technion.ac.il/~gabr/resources/data/ne_datasets.html" rel="nofollow">Tagged datasets for named entity recognition tasks</a><br />
<strong>tags</strong>: nlp, corpus, tagged, named_entity, recognition, list</li>
<li><a href="http://deli.ckoma.net/stats" rel="nofollow">del.icio.us stats &#8211; deli.ckoma</a><br />
<strong>tags</strong>: del.icio.us</li>
<li><a href="http://fisher.osu.edu/fin/fdf/osudata.htm" rel="nofollow">The Financial Data Finder A &#8211; G</a><br />
<strong>tags</strong>: finance, links</li>
<li><a href="http://download.freebase.com/wex/" rel="nofollow">Freebase Wikipedia Extraction (WEX)</a><br />
<strong>tags</strong>: wikipedia, xml, structured, corpus</li>
<li><a href="http://export.arxiv.org/api_help/" rel="nofollow">The arXiv.org API</a><br />
<strong>tags</strong>: arxiv, api, open, paper, academic</li>
<li><a href="http://www.football-data.co.uk/englandm.php" rel="nofollow">England Football Results Betting Odds | Premiership Results &amp; Betting Odds</a><br />
<strong>tags</strong>: gambling, soccer, football, excel, statistics</li>
<li><a href="http://hugheslab.ccbr.utoronto.ca/Main/HughesData" rel="nofollow">HughesData &#8211; Main &#8211; Hughes Lab</a><br />
<strong>tags</strong>: rna, bioinformatics, microarray, expression, gene, machinelearning</li>
<li><a href="http://genome-www5.stanford.edu/" rel="nofollow">Stanford MicroArray Database</a><br />
<strong>tags</strong>: bioinformatics, microarray, expression, gene, machinelearning, stanford</li>
<li><a href="http://www.ebi.ac.uk/microarray-as/aer/?#ae-main[0]" rel="nofollow">ArrayExpress Home</a><br />
<strong>tags</strong>: bioinformatics, microarray, expression, gene, machinelearning</li>
<li><a href="http://www.ncbi.nlm.nih.gov/geo/" rel="nofollow">Gene Expression Omnibus (GEO) Main page</a><br />
<strong>tags</strong>: bioinformatics, microarray, expression, gene, machinelearning</li>
<li><a href="http://bulk.resource.org/courts.gov/" rel="nofollow">Index of /courts.gov</a><br />
<strong>tags</strong>: corpus, text, legal, law, court, ruling, opensource, publicdata</li>
<li><a href="http://www.openvest.com/" rel="nofollow">Welcome to Openvest</a><br />
<strong>tags</strong>: python, finance, edgar, pylons, matplotlib, sec, webservice, via:jolby</li>
<li><a href="http://www.statsci.org/datasets.html" rel="nofollow">Statistical Science Web: Data Sets</a><br />
<strong>tags</strong>: links, statistics</li>
<li><a href="http://datamining.typepad.com/data_mining/2007/10/tailrank-spinn3.html" rel="nofollow">Data Mining: Text Mining, Visualization and Social Media: TailRank, Spinn3r, TechMeme and TechCrunch: New Attention</a><br />
<strong>tags</strong>: crawler, blog, corpus</li>
<li><a href="http://cobweb.ecn.purdue.edu/~aleix/ar.html" rel="nofollow">Aleix Face Database</a><br />
<strong>tags</strong>: facerecognition, machinelearning, face, image</li>
<li><a href="http://www.cs.umd.edu/class/spring2006/cmsc838s/data_repositories/repository_us.html" rel="nofollow">Data Repository Evaluation</a><br />
<strong>tags</strong>: umd, links, statistics, government, sports, via:rickladd</li>
<li><a href="http://www.pubmedcentral.nih.gov/about/ftp.html" rel="nofollow">PMC FTP Service</a><br />
<strong>tags</strong>: biology, medicine, articles, text, journal, authors</li>
<li><a href="http://labrosa.ee.columbia.edu/projects/musicsim/uspop2002.html" rel="nofollow">&#8220;uspop2002&#8221; data set</a><br />
<strong>tags</strong>: music, similarity, machinelearning</li>
<li><a href="http://www.archive.org/details/amazon_similarity_graph/" rel="nofollow">Internet Archive: Details: Amazon ASIN listing and similarity graph</a><br />
<strong>tags</strong>: ASIN, amazon, recommendation, collaborative, filtering, via:keyvowel</li>
<li><a href="http://eca.knmi.nl/dailydata/index.php" rel="nofollow">European Climate Assessment Daily Weather Data</a><br />
<strong>tags</strong>: weather, europe, ascii, netcdf</li>
<li><a href="http://sedac.ciesin.columbia.edu/povmap/ds_info.jsp" rel="nofollow">Poverty Data Sets General Information</a><br />
<strong>tags</strong>: poverty, statistics</li>
<li><a href="http://lib.stat.cmu.edu/datasets/" rel="nofollow">StatLib&#8212;Datasets Archive</a><br />
<strong>tags</strong>: machinelearning, datamining, cmu, link, collection</li>
<li><a href="http://nhts.ornl.gov/" rel="nofollow">National Household Travel Survey (NHTS) Data</a><br />
<strong>tags</strong>: driving, transportation, publicdata</li>
<li><a href="http://www.realclearpolitics.com/epolls/2008/president/us/democratic_presidential_nomination-191.html" rel="nofollow">RealClearPolitics &#8211; Election 2008 &#8211; Democratic Presidential Nomination</a><br />
<strong>tags</strong>: polls, politics</li>
<li><a href="http://www.bookscan.com/controller.php?page=109" rel="nofollow">Nielsen BookScan USA</a><br />
<strong>tags</strong>: books, sales, commercial</li>
<li><a href="http://www.pewinternet.org/data.asp" rel="nofollow">Pew Internet &amp; American Life Project</a><br />
<strong>tags</strong>: internet, demographics, online, web</li>
<li><a href="http://numbrary.com/" rel="nofollow">Home &#8211; Numbrary</a><br />
<strong>tags</strong>: finance, data</li>
<li><a href="http://numbrary.com/about" rel="nofollow">About &#8211; Numbrary</a><br />
<strong>tags</strong>: searchengine, search, tagging, aggregator, numeric, extraction, tables, collaboration, web2.0, interface, billpoint</li>
<li><a href="http://www.opentextmining.org/wiki/Main_Page" rel="nofollow">Main Page &#8211; OpenTextMining</a><br />
<strong>tags</strong>: textmining, open, nature, standards, search</li>
<li><a href="http://stuff.metafilter.com/infodump/" rel="nofollow">Metafilter Infodump</a><br />
<strong>tags</strong>: metafilter, comments, network, via:chl</li>
<li><a href="http://www.yr-bcn.es/webspam/datasets/uk2007/" rel="nofollow">WEBSPAM-UK2007 | Datasets | Web Spam Detection</a><br />
<strong>tags</strong>: web, search, spam, crawler, yahoo</li>
<li><a href="http://blog.wired.com/wiredscience/2008/01/google-to-provi.html" rel="nofollow">Google to Host Terabytes of Open-Source Science Data | Wired Science from Wired.com</a><br />
<strong>tags</strong>: google, article, openaccess</li>
<li><a href="http://www.zillow.com/labs/NeighborhoodBoundaries.htm" rel="nofollow">Zillow &#8211; Labs &#8211; Neighborhood Boundaries</a><br />
<strong>tags</strong>: neighborhoods, geo, gis, maps</li>
<li><a href="http://www.trustlet.org/wiki/Datasets" rel="nofollow">Trust network datasets &#8211; TrustLet</a><br />
<strong>tags</strong>: socialnetwork, trustnetwork, trust</li>
<li><a href="http://www.fbi.gov/ucr/cius2006/index.html" rel="nofollow">Crime in the United States 2006</a><br />
<strong>tags</strong>: crime, fbi</li>
<li><a href="http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets" rel="nofollow">TaskForces/CommunityProjects/LinkingOpenData/DataSets &#8211; ESW Wiki</a><br />
<strong>tags</strong>: opendata, semantic, rdf, collaboration</li>
</ul>
<p>Datasets listed in the original post on Jan 17, 2008:</p>
<ul>
<li><a href="http://www.datawrangling.com/some-datasets-available-on-the-web.html" rel="nofollow">Some Datasets Available on the Web » Data Wrangling Blog</a><br />
<strong>tags</strong>: publicdata, links</li>
<li><a href="http://www.xml.com/pub/a/2006/02/08/govtrack-us-public-data-semantic-web.html" rel="nofollow">XML.com: GovTrack.us, Public Data, and the Semantic Web</a><br />
<strong>tags</strong>: semanticweb, rdf, congress, politics, government</li>
<li><a href="http://www.citeulike.org/faq/data.adp" rel="nofollow">CiteULike: Available datasets</a><br />
<strong>tags</strong>: networks, research, graph, tags, paper, record_linkage</li>
<li><a href="http://www.archive-it.org/" rel="nofollow">Archive-It.org</a><br />
<strong>tags</strong>: archive, internet, web, index</li>
<li><a href="http://www.causality.inf.ethz.ch/challenge.php" rel="nofollow">Challenge: Synopsis &#8211; Causality Workbench</a><br />
<strong>tags</strong>: competition, machinelearning, forecasting, contest</li>
<li><a href="http://research.microsoft.com/nlp/" rel="nofollow">Natural Language Processing</a><br />
<strong>tags</strong>: microsoft, text, paraphrase, corpus</li>
<li><a href="http://www.ldc.upenn.edu/Obtaining/" rel="nofollow">LDC &#8211; Linguistic Data Consortium &#8211; Obtaining Data Resources</a><br />
<strong>tags</strong>: nlp, text, corpus, ngram, google, commercial, license</li>
<li><a href="http://www.census.gov/genealogy/names/names_files.html" rel="nofollow">1990 Census Name Files</a><br />
<strong>tags</strong>: census, names, identity, frequency, record_linkage</li>
<li><a href="http://www.galbithink.org/names/agnames.htm" rel="nofollow">Given Name Frequency Project: Analysis of Given Name Popularity</a><br />
<strong>tags</strong>: name, record_linkage, text, identity, code</li>
<li><a href="http://www.cs.cmu.edu/~einat/datasets.html" rel="nofollow">Email Datasets</a><br />
<strong>tags</strong>: enron, names, identity, text, record_linkage</li>
<li><a href="http://developer.zoominfo.com/" rel="nofollow">ZoomInfo &#8211; Welcome to the ZoomInfo Developer API</a><br />
<strong>tags</strong>: api, identity, people, webservice, record_linkage</li>
<li><a href="http://www.d.umn.edu/~tpederse/namedata.html" rel="nofollow">Ted Pedersen &#8211; Name Discrimination Data / Name Disambiguation Data / Name Ambiguity Data / Named Entity Resolution / Named Entity Disambiguation</a><br />
<strong>tags</strong>: record_linkage, corpus, nlp, names</li>
<li><a href="http://developer.dataunison.com/" rel="nofollow">Developers Area &#8211; eBay Market Data Documentation &#8211; eBay Market Data Documentation</a><br />
<strong>tags</strong>: ebay, api, retail, price, code</li>
<li><a href="http://ebiquity.umbc.edu/blogger/2007/08/10/new-swetodblp-dataset-released-with-11m-triples/" rel="nofollow">New SwetoDblp RDF dataset released with 11M triples</a><br />
<strong>tags</strong>: name, authorship, rdf, record_linkage</li>
<li><a href="http://lsdis.cs.uga.edu/projects/semdis/swetodblp/" rel="nofollow">LSDIS : SwetoDblp</a><br />
<strong>tags</strong>: bibliography, rdf, ontology, duplicate, name, record_linkage</li>
<li><a href="http://www.strikeiron.com/ProductDetail.aspx?p=257" rel="nofollow">StrikeIron Super Data Pack Web Service 1.0 &#8211; StrikeIron Marketplace</a><br />
<strong>tags</strong>: webservice, publicdata, datacleaning</li>
<li><a href="http://www.cdc.gov/vaccines/programs/iis/tech/dedup.htm" rel="nofollow">Vaccines: IIS/Tech/Deduplication Test Cases</a><br />
<strong>tags</strong>: duplicate</li>
<li><a href="http://www.cs.utexas.edu/users/ml/riddle/data.html" rel="nofollow">Duplicate Detection, Record Linkage, and Identity Uncertainty: Datasets</a><br />
<strong>tags</strong>: duplicate, detection, record_linkage, datacleaning, text</li>
<li><a href="http://instruct1.cit.cornell.edu/courses/info747/" rel="nofollow">INFO 747 &#8211; Social and Economic Data</a><br />
<strong>tags</strong>: datacleaning, record_linkage, video, lectures, course, cornell, economics, finance, publicdata</li>
<li><a href="http://www.overstock.com/aff_ftpdata.html" rel="nofollow">Overstock.com Affiliate Program</a><br />
<strong>tags</strong>: retail, overstock, sales, api, product, price, forecasting</li>
<li><a href="http://developer.amazonwebservices.com/connect/thread.jspa?threadID=18327&amp;tstart=0" rel="nofollow">Amazon Web Services Developer Connection : Can Alexa WS provide detailed &#8230;</a><br />
<strong>tags</strong>: finance, alexa, amazon, tech</li>
<li><a href="http://developer.ebay.com/programs/marketdata/" rel="nofollow">Market Data — eBay Developers Program</a><br />
<strong>tags</strong>: ebay, retail, pricing, sales, api, product</li>
<li><a href="http://phpartners.org/health_stats.html" rel="nofollow">Health Data Tools and Statistics</a><br />
<strong>tags</strong>: health, information, public, publicdata</li>
<li><a href="http://www.nytimes.com/2007/07/15/sports/baseball/15score.html?ex=1342238400&amp;en=638e19976bc5d04b&amp;ei=5124&amp;partner=permalink&amp;exprod=permalink" rel="nofollow">It’s a Pitch-by-Pitch Scouting Report, Minus the Scout &#8211; New York Times</a><br />
<strong>tags</strong>: baseball, gameday</li>
<li><a href="http://www.opentick.com/index.php?app=content&amp;event=market_data&amp;PHPSESSID=rm6bj4gurgb299lra7genn4la4" rel="nofollow">opentick :: market data</a><br />
<strong>tags</strong>: opentick, nasdaq, finance, stock</li>
<li><a href="http://www.dailykos.com/storyonly/2007/12/15/54158/397/946/422120" rel="nofollow">Daily Kos: Obama helps us track $1,000,000,000,000 of federal spending</a><br />
<strong>tags</strong>: corruption, government, politics, finance</li>
<li><a href="http://www.usaspending.gov/index.php" rel="nofollow">Welcome to USAspending.gov</a><br />
<strong>tags</strong>: government, money, politics</li>
<li><a href="http://www.fec.gov/disclosure.shtml" rel="nofollow">Campaign Finance Reports and Data</a><br />
<strong>tags</strong>: campaign, politics, elections</li>
<li><a href="http://cervisia.org/machine_learning_data.php" rel="nofollow">Machine Learning and Data Mining &#8211; Datasets</a><br />
<strong>tags</strong>: face, image</li>
<li><a href="http://www.esriuk.com/industries/schools_epidemic.asp?indid=34" rel="nofollow">GIS for Schools</a><br />
<strong>tags</strong>: epidemiology, gis, health</li>
<li><a href="http://www.cse.yorku.ca/~mridataset/" rel="nofollow">Cardiac MRI dataset &#8211; York University</a><br />
<strong>tags</strong>: mri, cardiac</li>
<li><a href="http://www.news.com/8301-10784_3-9828916-7.html" rel="nofollow">Google Trends API coming soon | Tech news blog &#8211; CNET News.com</a><br />
<strong>tags</strong>: google, trends, api</li>
<li><a href="http://reality.media.mit.edu/download.php" rel="nofollow">MIT Media Lab: Reality Mining</a><br />
<strong>tags</strong>: social, activity, location, cell, gis</li>
<li><a href="http://rl-competition.org/component/option,com_frontpage/Itemid,1/" rel="nofollow">RL Competition 2008 &#8211; Home</a><br />
<strong>tags</strong>: machinelearning, reinforcement, agent, competition</li>
<li><a href="http://branchandcut.org/VRP/data/" rel="nofollow">Vehicle Routing Data Sets</a><br />
<strong>tags</strong>: optimization, vehicle, routing</li>
<li><a href="http://www.eia.doe.gov/oil_gas/petroleum/info_glance/petroleum.html" rel="nofollow">EIA &#8211; Petroleum Data, Reports, Analysis, Surveys</a><br />
<strong>tags</strong>: oil, energy, statistics, economics, petroleum</li>
<li><a href="http://www.michael-noll.com/wiki/DMOZ100k06" rel="nofollow">DMOZ100k06 &#8211; Michael G. Noll</a><br />
<strong>tags</strong>: search, pagerank, text, tags, content</li>
<li><a href="http://www.cs.cmu.edu/~epxing/Class/10708-07/project.html" rel="nofollow">Grading</a><br />
<strong>tags</strong>: machinelearning, CMU, course, projects, graphicalmodel, code, paper</li>
<li><a href="http://mocap.cs.cmu.edu/search.php?maincat=3&amp;subcat=2" rel="nofollow">Carnegie Mellon University &#8211; CMU Graphics Lab &#8211; motion capture library</a><br />
<strong>tags</strong>: gait, pedestrian, walk, motion</li>
<li><a href="http://www.neatideas.com/data/data/EXCAUS.htm" rel="nofollow">Financial Forecast Center&#8217;s Historical Economic and Market Data</a><br />
<strong>tags</strong>: exchangerate, dollar, economics</li>
<li><a href="http://www.bls.gov/data/home.htm" rel="nofollow">Bureau of Labor Statistics Data</a><br />
<strong>tags</strong>: economics, lumber, building, materials, homedepot</li>
<li><a href="http://www.economagic.com/bci_97.htm" rel="nofollow">Browse Business Cycle Indicators Data</a><br />
<strong>tags</strong>: economics, indicators, time, series</li>
<li><a href="http://blogs.wsj.com/numbersguy/aspiring-to-be-the-wikipedia-of-numbers-171/" rel="nofollow">The Numbers Guy : Aspiring to Be the Wikipedia of Numbers</a><br />
<strong>tags</strong>: finance, numberpedia, mechanicalturk, textmining, statistics</li>
<li><a href="http://bioinfo.uib.es/~joemiro/marvel.html" rel="nofollow">Social characteristics of the Marvel Universe</a><br />
<strong>tags</strong>: socialnetwork, graphs, comicbooks</li>
<li><a href="http://sourceforge.net/projects/wordlist/" rel="nofollow">SourceForge.net: Word Lists Collection</a><br />
<strong>tags</strong>: dictionary, words</li>
<li><a href="http://www.ers.usda.gov/Data/Macroeconomics/" rel="nofollow">ERS/USDA Data &#8211; International Macroeconomic Data Set</a><br />
<strong>tags</strong>: usda, economics, population, cpi, gdp, income</li>
<li><a href="http://wikis.ala.org/godort/index.php/State_Agency_Databases" rel="nofollow">State Agency Databases &#8211; GODORT</a><br />
<strong>tags</strong>: government, directory, links, wiki, states</li>
<li><a href="http://www.rdfabout.com/demo/census/" rel="nofollow">The 2000 U.S. Census: 1 Billion RDF Triples</a><br />
<strong>tags</strong>: gis, census, rdf, semantic, sparql</li>
<li><a href="http://www.wired.com/politics/onlinerights/news/2007/08/wiki_tracker" rel="nofollow">See Who&#8217;s Editing Wikipedia &#8211; Diebold, the CIA, a Campaign</a><br />
<strong>tags</strong>: wikipedia, authorship</li>
<li><a href="http://www.datasetgenerator.com/" rel="nofollow">Dataset Generator &#8211; Perfect data for an imperfect world.</a><br />
<strong>tags</strong>: tools, generator</li>
<li><a href="http://www.nber.org/data/" rel="nofollow">National Bureau of Economic Research: Data</a><br />
<strong>tags</strong>: economics, links</li>
<li><a href="http://kdd.ics.uci.edu/databases/entree/entree.html" rel="nofollow">Entree Chicago Recommendation Data</a><br />
<strong>tags</strong>: recommender, collaborative, restaurant</li>
<li><a href="http://www.unhp.org/crg/indy.html" rel="nofollow">community resource guide: i&#8217;ve been here before &#8211; show me the links</a><br />
<strong>tags</strong>: demographics, maps, gis, statistics, links</li>
<li><a href="http://3stages.org/c/es2.cgi?search=getdata&amp;file=/data/data.html&amp;print=notitle&amp;header=/header/data.header" rel="nofollow">Social Science Data on the Net</a><br />
<strong>tags</strong>: economics, social, government, health, labor, links</li>
<li><a href="http://www.fhwa.dot.gov/bridge/nbi/ascii.cfm" rel="nofollow">NBI ASCII Files &#8211; Bridge &#8211; FHWA</a><br />
<strong>tags</strong>: government, bridges, safety</li>
<li><a href="http://en.wikipedia.org/wiki/List_of_films:_A" rel="nofollow">List of films: A &#8211; Wikipedia, the free encyclopedia</a><br />
<strong>tags</strong>: netflix, netflixprize, movie, index, wikipedia</li>
<li><a href="http://www.theory.physics.ubc.ca/arxiv/" rel="nofollow">The arXiv on your harddrive</a><br />
<strong>tags</strong>: paper, corpus, arXiv</li>
<li><a href="http://www.sunlightfoundation.com/resources" rel="nofollow">Insanely Useful Websites | Sunlight Foundation</a><br />
<strong>tags</strong>: links, transparency, government, politics, congress, reference</li>
<li><a href="http://lifehacker.com/software/technophilia/where-to-find-public-records-online-280785.php" rel="nofollow">Technophilia: Where to find public records online &#8211; Lifehacker</a><br />
<strong>tags</strong>: public, records, links</li>
<li><a href="http://clg.wlv.ac.uk/projects/junk-email/" rel="nofollow">Junk email project</a><br />
<strong>tags</strong>: corpus, email, spam, textmining</li>
<li><a href="http://www.cs.cmu.edu/~enron/" rel="nofollow">Enron Email Dataset</a><br />
<strong>tags</strong>: enron, corpus, email, text, social, network</li>
<li><a href="ftp://ftp.bls.gov/pub/special.requests/cpi/cpiai.txt" rel="nofollow">ftp://ftp.bls.gov/pub/special.requests/cpi/cpiai.txt</a><br />
<strong>tags</strong>: finance, cpi, inflation, data</li>
<li><a href="http://gos2.geodata.gov/wps/portal/gos/kcxml/04_Sj9SPykssy0xPLMnMz0vM0Y_QjzKL9443cnIFSYGYfpb6kehCFhhCYaEQobAwuIg3VMRX39cjPzdV31s_QL8gNzQ0NKLcEQCPBIjt/delta/base64xml/L3dJdyEvUUd3QndNQSEvNElVRS82X0tfNEFB" rel="nofollow">GOS &#8211; Geospatial One Stop</a><br />
<strong>tags</strong>: health, gis, epidemiology, links</li>
<li><a href="http://douweosinga.com/projects/ciagrep" rel="nofollow">CIA Factbook Grep in Python</a><br />
<strong>tags</strong>: cia, population, python, code, grep</li>
<li><a href="http://millercenter.virginia.edu/scripps/digitalarchive/presidentialrecordings/nixon/oval?PHPSESSID=b813e56b3017d097cd176720bc10fc74" rel="nofollow">Miller Center of Public Affairs &#8211; Richard Nixon &#8211; Oval Office Recordings</a><br />
<strong>tags</strong>: nixon, speech, tapes, audio, mp3, wav, flac</li>
<li><a href="http://www.deborahjeanepalfrey.com/Jeane10c.html" rel="nofollow">Deborah Jeane Palfrey Legal Defense Fund</a><br />
<strong>tags</strong>: phone, politics</li>
<li><a href="http://mill.ucsd.edu/index.php?page=Datasets&amp;subpage=Overview" rel="nofollow">UC San Diego Data Mining Competition &#8211; 2007 &#8211; Datasets</a><br />
<strong>tags</strong>: housing, refinance, mortgage</li>
<li><a href="http://project.knowledgeforge.net/ckan/wiki/package" rel="nofollow">package &#8211; MoinMaster</a><br />
<strong>tags</strong>:</li>
<li><a href="http://www.bizstats.com/spf2changes.htm" rel="nofollow">Retail Industry Financial Ratios &amp; Benchmarks</a><br />
<strong>tags</strong>: retail, finance, sales, sqft</li>
<li><a href="http://www.bizstats.com/spf1.htm" rel="nofollow">Retail Industry Financial Ratios &amp; Benchmarks</a><br />
<strong>tags</strong>: retail, finance, sales, sqft</li>
<li><a href="http://www.poi-factory.com/taxonomy/term/6" rel="nofollow">stores | POI Factory</a><br />
<strong>tags</strong>: retail, location, poi</li>
<li><a href="http://www.gpspassion.com/forumsen/topic.asp?TOPIC_ID=56474" rel="nofollow">GpsPasSion Forums &#8211; ** INDEX OF POI COLLECTIONS **</a><br />
<strong>tags</strong>: retail, poi, location, gis, gps</li>
<li><a href="https://shop.gps-poi-us.com/categoryNavigationDocument.hg?categoryId=3" rel="nofollow">GPS POI US : Home &gt; Retail Stores</a><br />
<strong>tags</strong>: retail, location, gis</li>
<li><a href="http://cdg.columbia.edu/cdg/datasets" rel="nofollow">Collective Dynamics Group</a><br />
<strong>tags</strong>: smallworld, networking, socialnetwork, graph</li>
<li><a href="http://www.ieor.berkeley.edu/~goldberg/jester-data/" rel="nofollow">Jester Data download page</a><br />
<strong>tags</strong>: collaborative, filtering, jokes</li>
<li><a href="http://www.multitel.be/trictrac/?mod=3" rel="nofollow">TricTrac: Video Dataset</a><br />
<strong>tags</strong>: video</li>
<li><a href="http://www.alacrawiki.com/index.php?title=Premium_Business_Information_Databases" rel="nofollow">Premium Business Information Databases &#8211; AlacraWiki</a><br />
<strong>tags</strong>: links, finance, commercial</li>
<li><a href="http://bulk.resource.org/edgar/" rel="nofollow">Index of /edgar</a><br />
<strong>tags</strong>: finance, xml, edgar, sec, code, perl</li>
<li><a href="http://park.org/Cdrom/TheNot/Mail/index.html" rel="nofollow">Mail Index</a><br />
<strong>tags</strong>: EDGAR, sec, mail, text</li>
<li><a href="http://metafy.pbwiki.com/AnthraciteIdioms" rel="nofollow">metafy / AnthraciteIdioms</a><br />
<strong>tags</strong>: finance, SEC, scrape, parse, commercial</li>
<li><a href="http://www.census.gov/svsd/www/adseriesold.html" rel="nofollow">Advance Monthly Sales for Retail and Food Services &#8211; Time Series Data/Seasonal Factors &#8211; 1992 to Present</a><br />
<strong>tags</strong>: retail, sales, census</li>
<li><a href="http://www.nist.gov/speech/tests/tdt/" rel="nofollow">TDT</a><br />
<strong>tags</strong>: categorization, textmining, detection, tools</li>
<li><a href="http://www.statistics.gov.uk/StatBase/ssdataset.asp?vlnk=6316&amp;More=Y" rel="nofollow">Volume of retail sales: Social Trends 33</a><br />
<strong>tags</strong>: retail, sales, uk</li>
<li><a href="http://www.generatedata.com/#generator" rel="nofollow">generatedata.com</a><br />
<strong>tags</strong>: tools, generator, random</li>
<li><a href="http://www.columbia.edu/cu/lweb/indiv/business/guides/filings.print.html" rel="nofollow">U.S. Company Filings and Annual Reports</a><br />
<strong>tags</strong>: finance, links, sec</li>
<li><a href="http://www.sec.gov/edgar/searchedgar/ftpusers.htm" rel="nofollow">FTP Information &#8211; EDGAR Database</a><br />
<strong>tags</strong>: edgar, finance, sec, filing, ftp, instructions</li>
<li><a href="http://www.investopedia.com/articles/basics/03/053003.asp" rel="nofollow">Data Mining For Investing</a><br />
<strong>tags</strong>: investing, finance, datamining, announcement, sec, filing, links</li>
<li><a href="http://www.melissadata.com/lookups/index.htm" rel="nofollow">Melissa DATA &#8211; Lookups</a><br />
<strong>tags</strong>: consumer, data, database, api</li>
<li><a href="http://www.kiplinger.com/columns/picks/archive/2006/pick0825.htm" rel="nofollow">FactSet: Data Maven &#8211; Kiplinger.com</a><br />
<strong>tags</strong>: factset, finance</li>
<li><a href="http://wrds.wharton.upenn.edu/demo/ibes/index.shtml" rel="nofollow">IBES (Demo)</a><br />
<strong>tags</strong>: finance, ibes, analyst, forecast, wharton</li>
<li><a href="http://www.coba.unt.edu/firel/data/IBES.htm" rel="nofollow">Thomson Financial I/B/E/S Data</a><br />
<strong>tags</strong>: finance</li>
<li><a href="http://help.yahoo.com/l/us/yahoo/finance/quotes/quote-12.html" rel="nofollow">Historical Quotes &#8211; Yahoo! Finance</a><br />
<strong>tags</strong>: yahoo, finance, stock, price</li>
<li><a href="http://www-personal.umich.edu/~mejn/netdata/" rel="nofollow">Network data</a><br />
<strong>tags</strong>: network, links</li>
<li><a href="http://www.bls.gov/" rel="nofollow">Bureau of Labor Statistics Home Page</a><br />
<strong>tags</strong>: statistics, labor, government, consumer</li>
<li><a href="http://www.realtor.org/Research.nsf/Pages/EHSdata" rel="nofollow">NAR: Research: EHS Data</a><br />
<strong>tags</strong>: housing, sales, finance</li>
<li><a href="http://www.ethanolrfa.org/industry/statistics/" rel="nofollow">RFA &#8211; The Industry &#8211; Industry Statistics</a><br />
<strong>tags</strong>: ethanol</li>
<li><a href="http://www.csgis.com/csgis-frontend/common/jsp/RetailLocations.jsp" rel="nofollow">Chain Store Guide &#8211; Retail Locations</a><br />
<strong>tags</strong>: retail, finance, store, locations, gis</li>
<li><a href="http://www.directionsmag.com/press.releases/index.php?duty=Show&amp;id=8503&amp;trv=1" rel="nofollow">Press Releases &#8211; Directions Magazine</a><br />
<strong>tags</strong>: retail, gis, store, locations</li>
<li><a href="http://www.eia.doe.gov/" rel="nofollow">Energy Information Administration &#8211; EIA &#8211; Official Energy Statistics from the U.S. Government</a><br />
<strong>tags</strong>: finance, government, energy, historical, forecasts, fuel, oil</li>
<li><a href="http://peipa.essex.ac.uk/benchmark/databases/index.html#faces" rel="nofollow">Databases you can use for benchmarking</a><br />
<strong>tags</strong>: links</li>
<li><a href="http://www.upcdatabase.com/downloads/" rel="nofollow">UPC Database: Downloads</a><br />
<strong>tags</strong>: product, upc, database</li>
<li><a href="http://people.oii.ox.ac.uk/escher/web-crawling-crawl-datasets/" rel="nofollow">Web Crawling / Crawl Datasets at Tobias Escher at the OII</a><br />
<strong>tags</strong>: crawler, benchmark, search, web, links</li>
<li><a href="http://techtc.cs.technion.ac.il/" rel="nofollow">TechTC &#8211; Technion Repository of Text Categorization Datasets</a><br />
<strong>tags</strong>: corpus, text</li>
<li><a href="http://www.d.umn.edu/~tkwon/TMCdata/TMCarchive.html" rel="nofollow">TMC data archive download site</a><br />
<strong>tags</strong>: traffic, data</li>
<li><a href="http://www.volvis.org/" rel="nofollow">http://www.volvis.org/</a><br />
<strong>tags</strong>: volumerendering</li>
<li><a href="http://www.vision.caltech.edu/html-files/archive.html" rel="nofollow">Computational Vision: Archive</a><br />
<strong>tags</strong>: vision, caltech, imagerecognition</li>
<li><a href="http://www.gavrila.net/Computer_Vision/Research/Pedestrian_Detection/DC_Pedestrian_Class__Benchmark/dc_pedestrian_class__benchmark.html" rel="nofollow">DC Pedestrian Classification Benchmark</a><br />
<strong>tags</strong>: pedestrian, image, classification, detection</li>
<li><a href="http://www.opentick.com/" rel="nofollow">opentick :: home</a><br />
<strong>tags</strong>: finance, economics, feed, free, stock, trading, opentick, opensource</li>
<li><a href="http://webascorpus.org/" rel="nofollow">Web as Corpus</a><br />
<strong>tags</strong>: textmining, corpus, concordance, wordlist, n-gram</li>
<li><a href="http://packetstormsecurity.nl/Crackers/wordlists/" rel="nofollow">.:[ packet storm ]:. &#8211; http://packetstormsecurity.org/</a><br />
<strong>tags</strong>: dictionary, hack, security, wordlist, password</li>
<li><a href="http://www.isi.edu/~adibi/Enron/Enron.htm" rel="nofollow">Enron Dataset</a><br />
<strong>tags</strong>: data, mysql, email, energy, text, socialnetwork</li>
<li><a href="http://ebiquity.umbc.edu/resource/html/id/212/Splog-Blog-Dataset" rel="nofollow">Splog Blog Dataset</a><br />
<strong>tags</strong>: blog, corpus, spam</li>
<li><a href="http://people.csail.mit.edu/jrennie/20Newsgroups/" rel="nofollow">Home Page for 20 Newsgroups Data Set</a><br />
<strong>tags</strong>: corpus, text, newsgroup</li>
<li><a href="http://www.whiteglovetracking.com/about.html" rel="nofollow">White Glove Tracking</a><br />
<strong>tags</strong>: crowdsourcing, image, processing, algorithm, collaborative, distributed, web2.0, code, opensource</li>
<li><a href="http://lwf.ncdc.noaa.gov/paleo/coral/coral_data.html" rel="nofollow">NOAA Paleoclimatology Program &#8211; Coral and Sclerosponge Data</a><br />
<strong>tags</strong>: paleoclimatology, climate, oceanography, coral, sponge, biology</li>
<li><a href="http://www.census.gov/epcd/www/naics.html" rel="nofollow">NAICS &#8212; North American Industry Classification System</a><br />
<strong>tags</strong>: finance, economics, naics, industry, classifications</li>
<li><a href="http://www.wired.com/software/webservices/commentary/circuitcourt/2006/10/72001" rel="nofollow">Saving Democracy With Web 2.0</a><br />
<strong>tags</strong>: democracy, web2.0, mashup, government, funding, article</li>
<li><a href="http://www.sourcewatch.org/index.php?title=Congresspedia" rel="nofollow">Congresspedia &#8211; Congresspedia</a><br />
<strong>tags</strong>: collaborative, wiki, government, congress, politics, elections, web2.0, directory</li>
<li><a href="http://www.census.gov/popest/datasets.html" rel="nofollow">Population Estimates Data Sets</a><br />
<strong>tags</strong>: census, data, population, statistics</li>
<li><a href="http://cran.r-project.org/src/contrib/Views/MachineLearning.html" rel="nofollow">CRAN Task View: Machine Learning &amp; Statistical Learning</a><br />
<strong>tags</strong>: statisticallearning, machinelearning, code, R, libraries, cran</li>
<li><a href="http://www.daniel-lemire.com/blog/data-for-data-mining/" rel="nofollow">Data for Data Mining</a><br />
<strong>tags</strong>: links, datamining, timeseries, text, extraction, socialnetwork</li>
<li><a href="http://paida.sourceforge.net/" rel="nofollow">PAIDA &#8211; Pure Python scientific analysis package</a><br />
<strong>tags</strong>: python, visualization, library</li>
<li><a href="http://ailab.uta.edu/subdue/" rel="nofollow">SUBDUE &#8211; Graph Based Knowledge Discovery</a><br />
<strong>tags</strong>: machinelearning, network, graph</li>
<li><a href="http://www.gregsadetsky.com/aol-data/" rel="nofollow">AOL search data mirrors</a><br />
<strong>tags</strong>: aol, search</li>
<li><a href="http://cheeseshop.python.org/pypi/shakespeare/0.4" rel="nofollow">Python Cheese Shop : shakespeare 0.4</a><br />
<strong>tags</strong>: python, text</li>
<li><a href="http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html" rel="nofollow">AG&#8217;s corpus of news articles</a><br />
<strong>tags</strong>: corpus, nlp, machinelearning, textmining</li>
<li><a href="http://video.google.com/videoplay?docid=-5181553078911044640&amp;q=type%3Agoogle+engEDU" rel="nofollow">Sampling Techniques for Massive Data &#8211; Google Video</a><br />
<strong>tags</strong>: video, machinelearning, statistics, matrix, sampling, large, sparse, algorithm, experiment_design, towatch</li>
<li><a href="http://people.iarc.uaf.edu/~cswingle/blog/?p=38" rel="nofollow">metachronistic » Mirror the Wikipedia</a><br />
<strong>tags</strong>: wikipedia, laptop, install, dump</li>
<li><a href="http://research.microsoft.com/users/tyliu/LETOR/" rel="nofollow">LETOR: Benchmark Datasets for Learning to Rank</a><br />
<strong>tags</strong>: ranking, search</li>
<li><a href="http://cns.bu.edu/~gsc/CN710/pmwiki.php?n=Main.ClassProject#GreyhoundData" rel="nofollow">CN710: Comparative Analysis of Learning Systems (Spring 2006) &#8211; Class Project</a><br />
<strong>tags</strong>: machinelearning, algorithm, ogi, bu, greyhound, finance</li>
<li><a href="http://www.urbansim.org/" rel="nofollow">UrbanSim Home</a><br />
<strong>tags</strong>: python, urban, software, simulation, opensource, GIS, census</li>
<li><a href="http://labs.systemone.at/wikipedia3" rel="nofollow">System One &#8211; Wikipedia³</a><br />
<strong>tags</strong>: wikipedia, rdf</li>
<li><a href="http://www.systemone.at/en/ecosystem/labs" rel="nofollow">System One &#8211; Labs</a><br />
<strong>tags</strong>: wikipedia, rdf, tools</li>
<li><a href="http://www.face-rec.org/databases/" rel="nofollow">Face Recognition Homepage &#8211; Databases</a><br />
<strong>tags</strong>: face, algorithm, facerecognition, data, image</li>
<li><a href="http://cbcl.mit.edu/software-datasets/FaceData2.html" rel="nofollow">CBCL SOFTWARE Face data set</a><br />
<strong>tags</strong>: face, seung, algorithm, recognition, image</li>
<li><a href="http://clearforest.com/" rel="nofollow">Text Analytics Solutions from ClearForest</a><br />
<strong>tags</strong>: extraction, finance, semantic, semanticweb, text</li>
<li><a href="http://video.google.com/videoplay?docid=-7148095875242160718" rel="nofollow">23C3 &#8211; Mining Search Queries &#8211; Google Video</a><br />
<strong>tags</strong>: aol, search, video, talk, algorithm, informationretrieval, datamining, machinelearning</li>
<li><a href="http://digitalhistoryhacks.blogspot.com/2007/01/keywords-and-clues.html" rel="nofollow">Digital History Hacks: Keywords and Clues</a><br />
<strong>tags</strong>: aol, search, query, analysis</li>
<li><a href="http://digitalhistoryhacks.blogspot.com/2006/10/searching-for-history.html" rel="nofollow">Digital History Hacks: Searching for History</a><br />
<strong>tags</strong>: aol, search, query, analysis</li>
<li><a href="http://tkyte.blogspot.com/2006/08/interesting-data-set.html" rel="nofollow">The Tom Kyte Blog: An interesting data set&#8230;</a><br />
<strong>tags</strong>: aol, search, oracle, database, code</li>
<li><a href="http://www.acm.org/sigs/sigkdd/kdd2005/kddcup.html" rel="nofollow">KDD 2005 &#8211; KDD Cup 2005: Aug 21-24, Chicago, IL. USA</a><br />
<strong>tags</strong>: query, categorization, algorithm, google</li>
<li><a href="http://nlp.stanford.edu/links/statnlp.html" rel="nofollow">Statistical NLP / corpus-based computational linguistics resources</a><br />
<strong>tags</strong>: corpus, machinelearning, text</li>
<li><a href="http://www2.imm.dtu.dk/~rem/index.php?page=data" rel="nofollow">Ph.d.-student Rasmus Elsborg Madsen</a><br />
<strong>tags</strong>: text, machinelearning, context, matlab</li>
<li><a href="http://www.intelligent-web.org/wsm/tools/" rel="nofollow">Intelligent Web Search and Mining: Tools &amp; Resources</a><br />
<strong>tags</strong>: machinelearning, code, links</li>
<li><a href="http://www.cofc.edu/~langvillea/PRDataCode/index.html" rel="nofollow">PageRank Datasets and Code</a><br />
<strong>tags</strong>: pagerank, code, algorithm</li>
<li><a href="http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html" rel="nofollow">Official Google Research Blog: All Our N-gram are Belong to You</a><br />
<strong>tags</strong>: linguistics, google, ngram, nlp, record_linkage</li>
<li><a href="http://www.javaworld.com/javaworld/jw-11-2006/jw-1121-thread.html" rel="nofollow">Hyper-threaded Java &#8211; Java World</a><br />
<strong>tags</strong>: clustering, algorithm, java, parallel</li>
<li><a href="http://www.stat.columbia.edu/~cook/movabletype/mlm/" rel="nofollow">Statistical Modeling, Causal Inference, and Social Science</a><br />
<strong>tags</strong>: blog, econometrics, finance, machinelearning, math, statistics</li>
<li><a href="http://elsa.berkeley.edu/~mcfadden/discrete.html" rel="nofollow">Structural Analysis of Discrete Data and Econometric Applications, by Charles F. Manski and Daniel L. McFadden, MIT Press, 1981.</a><br />
<strong>tags</strong>: books, econometrics, economics, finance, ebook</li>
<li><a href="http://krisbrower.com/2007/02/02/google-onpage-search-results-analysis/" rel="nofollow">Kris Brower » Archives » Google Onpage Search Results Analysis</a><br />
<strong>tags</strong>: google, ranking, aol, search, analytics</li>
<li><a href="http://www.cs.ucsd.edu/~elkan/250B/" rel="nofollow">CSE 250B Fall 2006</a><br />
<strong>tags</strong>: netflixprize, machinelearning, course</li>
<li><a href="http://math.nist.gov/MatrixMarket/index.html" rel="nofollow">Matrix Market</a><br />
<strong>tags</strong>: matrixmarket, matrix</li>
<li><a href="http://www.gps.caltech.edu/~tapio/imputation/" rel="nofollow">Analysis of incomplete datasets: Estimation of mean values and covariance matrices and imputation of missing values</a><br />
<strong>tags</strong>: imputation, matlab, missing, EM, machinelearning</li>
<li><a href="http://vision.ai.uiuc.edu/mhyang/face-detection-survey.html" rel="nofollow">Face Detection</a><br />
<strong>tags</strong>: face, image</li>
<li><a href="http://www.cs.ucsd.edu/~elkan/250B/project4.html" rel="nofollow">CSE 250B Project 4, Fall 2006</a><br />
<strong>tags</strong>: subset, netflixprize, dimensionality, reduction</li>
<li><a href="http://www.frantz.fi/software/g3data.php" rel="nofollow">G3DATA</a><br />
<strong>tags</strong>: extract, from, graphs, hack, google, trends</li>
<li><a href="http://www.w3.org/2000/10/swap/doc/cwm" rel="nofollow">cwm &#8211; a general purpose data processor for the semantic web</a><br />
<strong>tags</strong>: python, processor, semantic, web, rdf</li>
<li><a href="http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/" rel="nofollow">WebBase Project</a><br />
<strong>tags</strong>: link, analysis, structure, web, crawler, stanford</li>
<li><a href="http://www.cs.toronto.edu/~roweis/data.html" rel="nofollow">sam roweis : data</a><br />
<strong>tags</strong>: machine, learning, matlab, python, hackers, image</li>
<li><a href="http://algoval.essex.ac.uk/data/sequence/mnist/" rel="nofollow">Index of /data/sequence/mnist</a><br />
<strong>tags</strong>: mnist, xml, format</li>
<li><a href="http://www.cs.cmu.edu/~15781/web/digits.html" rel="nofollow">MNIST handwritten digit database</a><br />
<strong>tags</strong>: mnist</li>
<li><a href="http://www.informatik.uni-freiburg.de/~cziegler/BX/" rel="nofollow">Book-Crossing Dataset</a><br />
<strong>tags</strong>: data, set, collaborative, filtering, datamining, books, movie</li>
<li><a href="http://www.allmovie.com/cg/avg.dll?p=avg&amp;sql=21:avg/info_pages/a_about.html" rel="nofollow">allmovie</a><br />
<strong>tags</strong>: movie, netflixprize, source</li>
<li><a href="http://www.collectorz.com/moviedatabase/submit.php" rel="nofollow">Submissions Guidelines for the Collectorz.com Online Movie Database</a><br />
<strong>tags</strong>: movie, source</li>
<li><a href="http://www.cinema.com/film/183/matrix/index.phtml" rel="nofollow">cinema.com</a><br />
<strong>tags</strong>: plot, synopsis, movie, netflixprize, prize</li>
<li><a href="http://lumiere.obs.coe.int/web/search/index.php" rel="nofollow">LUMIERE</a><br />
<strong>tags</strong>: netflixprize, prize, european, movie, revenue</li>
<li><a href="http://meta.wikimedia.org/wiki/Importing_a_Wikipedia_database_dump_into_MediaWiki" rel="nofollow">Data dumps &#8211; Meta</a><br />
<strong>tags</strong>: mediawiki, wikipedia, import, mysql, sql</li>
<li><a href="http://www.google.com/search?hl=en&amp;q=%22phone+***%22+%22+address+*%22+%22e-mail%22+intitle%3A%22curriculum+vitae%22&amp;btnG=Google+Search" rel="nofollow">&#8220;phone ***&#8221; &#8221; address *&#8221; &#8220;e-mail&#8221; intitle:&#8221;curriculum vitae&#8221; &#8211; Google Search</a><br />
<strong>tags</strong>: resume, google</li>
</ul>
</div>
<p>The post <a href="http://piktochart.com/2011/08/6-useful-databases-to-dig-for-data/">6 Useful Databases to Dig for Data (and 100 more)</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://piktochart.com/2011/08/6-useful-databases-to-dig-for-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scraping Data from a Table in a HTML Page via Google Docs</title>
		<link>http://piktochart.com/2011/07/scraping-data-from-a-table-in-a-html-page-via-google-docs/</link>
		<comments>http://piktochart.com/2011/07/scraping-data-from-a-table-in-a-html-page-via-google-docs/#comments</comments>
		<pubDate>Thu, 21 Jul 2011 07:04:43 +0000</pubDate>
		<dc:creator>piktochart</dc:creator>
				<category><![CDATA[Data Collection & Research]]></category>

		<guid isPermaLink="false">http://piktochart.com/?p=114</guid>
		<description><![CDATA[<p>This is a pretty neat tutorial we found at EagerEyes&#8217; blog. It lets you scrape data from a table in an HTML page and pull it into Google Docs, a spreadsheet tool that more and more people are becoming accustomed to. 1. Create a new spreadsheet on GDocs and enter the [...]</p><p>The post <a href="http://piktochart.com/2011/07/scraping-data-from-a-table-in-a-html-page-via-google-docs/">Scraping Data from a Table in a HTML Page via Google Docs</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></description>
				<content:encoded><![CDATA[<div id="node-744">
<p>This is a pretty neat tutorial we found at <a title="Eager Eyes" href="http://eagereyes.org/data/scrape-tables-using-google-docs" target="_blank">EagerEyes&#8217; blog</a>. It lets you scrape data from a table in an HTML page and pull it into Google Docs, a spreadsheet tool that more and more people are becoming accustomed to.</p>
<p>1. Create a new spreadsheet on GDocs and enter the following expression in the top-left cell: =ImportHtml(<em>URL</em>, "table", <em>num</em>), e.g. =ImportHtml("www.piktochart.com/dataset", "table", 0)</p>
<ul>
<li><em>URL</em> here is the URL of the page (between quotation marks).</li>
<li>&#8220;table&#8221; is the element to look for (Google Docs can also import lists).</li>
<li><em>num</em> is the number of the element, in case there is more than one on the same page (which is rather common for tables).</li>
<li>The latter supposedly starts at 1, but I had to use 0 to get it to pick up the correct table.</li>
</ul>
<p>2. Once this is done, Google Docs retrieves the data and inserts it into the spreadsheet, including the headers.</p>
<p>3. The last step is to download the spreadsheet as a CSV file.</p>
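<p>The same three steps can also be sketched outside the spreadsheet. The block below is a minimal Python sketch using only the standard library; it is not what Google Docs does internally. The names TableScraper and table_to_csv are hypothetical, the page is assumed to use explicit closing tags for rows and cells, and <em>num</em> is counted from 0 here purely as a choice of this sketch.</p>

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the cell text of every top-level table, in page order.

    Assumption of this sketch: rows and cells have explicit closing tags.
    Text inside nested tables is folded into the enclosing cell.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per top-level table
        self._depth = 0    # how many tables we are currently inside
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth == 1:
                self.tables.append([])
        elif self._depth == 1 and tag == "tr":
            self._row = []
        elif self._depth == 1 and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self._depth:
            self._depth -= 1
        elif self._depth == 1 and tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif self._depth == 1 and tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html, num=0):
    """Return table number `num` (0-based) found in `html` as CSV text."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(scraper.tables[num])
    return out.getvalue()
```

<p>To try it on a live page, the HTML text could be fetched first, for example with urllib.request.urlopen(url).read().decode(), and then passed to table_to_csv together with the number of the table you want. The returned text can be written straight to a .csv file.</p>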
<p>This is a quick and handy way to pull data from a website into GDocs for your work!</p>
</div>
<p>The post <a href="http://piktochart.com/2011/07/scraping-data-from-a-table-in-a-html-page-via-google-docs/">Scraping Data from a Table in a HTML Page via Google Docs</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://piktochart.com/2011/07/scraping-data-from-a-table-in-a-html-page-via-google-docs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>5 Excellent Resources to View/Create Visualisations</title>
		<link>http://piktochart.com/2011/07/5-excellent-resources-to-viewcreate-visualisations/</link>
		<comments>http://piktochart.com/2011/07/5-excellent-resources-to-viewcreate-visualisations/#comments</comments>
		<pubDate>Thu, 21 Jul 2011 07:04:27 +0000</pubDate>
		<dc:creator>piktochart</dc:creator>
				<category><![CDATA[Data Collection & Research]]></category>

		<guid isPermaLink="false">http://piktochart.com/?p=126</guid>
		<description><![CDATA[<p>There are several places we go to for inspiration in terms of getting data visualizations done right. Among them are the 5 resources below. It is important not just to select the right types of visualizations for the data, but also to get the colours, spacing, font, and hue/opacity/saturation right! P.S. There are full degrees in graphic [...]</p><p>The post <a href="http://piktochart.com/2011/07/5-excellent-resources-to-viewcreate-visualisations/">5 Excellent Resources to View/Create Visualisations</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></description>
				<content:encoded><![CDATA[<p>There are several places we go to for inspiration in terms of getting data visualizations done right. Among them are the 5 resources below. It is important not just to select the right types of visualizations for the data, but also to get the colours, spacing, font, and hue/opacity/saturation right!</p>
<p>P.S. There are full degrees in graphic design for data visualizations now.</p>
<p>1. <a title="ChartsBin" href="http://chartsbin.com/" target="_blank">ChartsBin</a></p>
<p>ChartsBin lets you explore interactive graphs and charts. Data ranges from population statistics to &#8220;do you think people take advantage of you&#8221; surveys. Data sets are likely to be huge, and all data are referenced (Harvard-style referencing). You can also embed the data and create interactive maps online instantly, without installation or coding.</p>
<p>Our verdict: It is not easy to get the values out of a pre-defined dataset, but this should improve, since the tool is still in beta.</p>
<p><a href="http://piktochart.wpengine.com/wp-content/uploads/2011/07/Screen-shot-2011-07-18-at-3.23.52-PM22.png"><img class="alignnone size-large wp-image-127" title="Geographic map with ChartsBin" src="http://piktochart.wpengine.com/wp-content/uploads/2011/07/Screen-shot-2011-07-18-at-3.23.52-PM2-1024x402.png" alt="Geographic map with ChartsBin" width="720" height="282" /></a></p>
<p>2. <a title="Github d3.js" href="http://mbostock.github.com/d3/ex/" target="_blank">d3.js on GitHub</a></p>
<p>Lets you create fancy data visualisations such as chord diagrams, treemaps, sunbursts, streamgraphs, and bubble charts. Worth a look!</p>
<p>&nbsp;</p>
<p>3. <a title="JunkCharts" href="http://junkcharts.typepad.com/" target="_blank">JunkCharts</a></p>
<p>Recycling charts for art. Very interesting posts to expand your mind about what data visualisations can do.</p>
<p><a href="http://piktochart.wpengine.com/wp-content/uploads/2011/07/6a00d8341e992c53ef014e89763c43970d22.jpg"><img class="alignnone size-large wp-image-128" title="Data visualisation with Junkcharts" src="http://piktochart.wpengine.com/wp-content/uploads/2011/07/6a00d8341e992c53ef014e89763c43970d2-1024x908.jpg" alt="Data visualisation with Junkcharts" width="720" height="638" /></a></p>
<p>&nbsp;</p>
<p>4. <a title="Dynamic Diagrams" href="http://dd.dynamicdiagrams.com/" target="_blank">Dynamic Diagrams</a></p>
<p>You can only view data visualisations here, not create them, but they are in-te-res-ting.</p>
<p>5. <a title="Bloomberg insights" href="http://www.bloomberg.com/insights/" target="_blank">Bloomberg</a></p>
<p>Who would have thought Bloomberg has a selection of the most colourful data visualisations?</p>
<p><a href="http://piktochart.wpengine.com/wp-content/uploads/2011/07/Screen-shot-2011-07-18-at-3.52.45-PM22.png"><img class="alignnone size-full wp-image-129" title="Bloomberg Insights Data Visualisation" src="http://piktochart.wpengine.com/wp-content/uploads/2011/07/Screen-shot-2011-07-18-at-3.52.45-PM22.png" alt="Bloomberg Insights Data Visualisation" width="892" height="442" /></a></p>
<p>The post <a href="http://piktochart.com/2011/07/5-excellent-resources-to-viewcreate-visualisations/">5 Excellent Resources to View/Create Visualisations</a> appeared first on <a href="http://piktochart.com">Piktochart Infographics</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://piktochart.com/2011/07/5-excellent-resources-to-viewcreate-visualisations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
