Scraping Data from a Table in an HTML Page via Google Docs


This is a pretty neat tutorial we found at EagerEyes’ blog. It lets you scrape data from a table in an HTML page and pull all of it into a Google Docs spreadsheet, a tool that more and more people are becoming accustomed to.

  1. Create a new spreadsheet in Google Docs and enter the following expression in the top left cell: =ImportHtml(URL, "table", num), e.g. =ImportHtml(,”list=Coffee-brewing,0)
    • URL is the address of the page (between quotation marks),
    • "table" is the element to look for (Google Docs can also import lists, using "list"),
    • num is the index of the element, in case there is more than one on the same page (which is rather common for tables).
    • The index supposedly starts at 1, but I had to use 0 to get it to pick up the correct table.
  2. Once this is done, Google Docs retrieves the data and inserts it into the spreadsheet, including the headers.
  3. The last step is to download the spreadsheet as a CSV file.
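
For readers who would rather script these three steps than click through them, here is a minimal sketch in Python of the same idea: find the tables in a page, pick one by its index, and write it out as CSV. This is not how Google Docs does it internally. The class name TableExtractor, the function table_to_csv, and the sample HTML are all made up for illustration, the index here is 0-based by our own choice, and nested tables are not handled.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects every <table> in a page as a list of rows of cell text.

    Hypothetical helper for illustration; nested tables are not supported.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table found
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html, index):
    """Return table number `index` (0-based, our assumption) as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.tables[index])
    return out.getvalue()


# Made-up sample page standing in for a real URL
SAMPLE_HTML = """
<table><tr><td>navigation</td></tr></table>
<table>
  <tr><th>Method</th><th>Minutes</th></tr>
  <tr><td>French press</td><td>4</td></tr>
  <tr><td>Espresso</td><td>1</td></tr>
</table>
"""

csv_text = table_to_csv(SAMPLE_HTML, 1)  # second table on the page
print(csv_text)
```

The sample page deliberately starts with a small layout table, which shows why the index matters: on real pages the table you want is often not the first one, so expect to try a couple of values, just as with num in the spreadsheet formula.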

This is a quick and interesting way to get data from a website into Google Docs for work!