Basic scraping

Try =SITEPARSE(url, "title"), replacing url with the address of the page you want to scrape, to check that the URL is correct and the page can be retrieved. Then substitute "title" with a CSS selector or an XPath that selects the data you need. If the second argument starts with /, it is interpreted as an XPath; otherwise it is assumed to be a CSS selector.
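A rough local model of this dispatch rule, using lxml (site_parse is a made-up stand-in for the add-on, and its CSS handling is deliberately simplified to bare tag names and .class selectors; SITEPARSE itself accepts full CSS):

```python
from lxml import etree

def site_parse(html, selector):
    """Simplified model: leading / means XPath, otherwise treat as CSS."""
    doc = etree.HTML(html)
    if selector.startswith("/"):      # leading slash -> XPath
        return doc.xpath(selector)
    if selector.startswith("."):      # .class -> match on the class attribute
        return doc.xpath(f"//*[contains(concat(' ', @class, ' '), ' {selector[1:]} ')]")
    return doc.xpath("//" + selector)  # bare tag name

html = '<html><head><title>Example</title></head><body><p class="intro">hi</p></body></html>'
print(site_parse(html, "title")[0].text)   # Example
print(site_parse(html, ".intro")[0].text)  # hi
print(site_parse(html, "//title/text()"))  # ['Example']
```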

To get an XPath or CSS selector, right-click the element in the page and choose Inspect element. This opens the Elements tab (Inspector in Firefox) of the browser's developer tools. Then right-click the highlighted element and pick Copy full XPath, Copy XPath or Copy selector. To check which elements a given selector or XPath matches, press Ctrl-F in the Elements tab and paste it: all matches will be highlighted.

Only XPaths can access an HTML element's attributes, such as the URL of a link. A link is coded as <a href="...">link text</a>; the attribute is called href in this example, but URLs can also be found in the src attribute of an <img> tag and elsewhere. An XPath to access the attribute above would end with /a/@href.
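For example, with lxml (the page below is made up for illustration), an @href step returns the attribute values as strings:

```python
from lxml import etree

html = '<html><body><a href="https://example.com/">link text</a></body></html>'
doc = etree.HTML(html)
# //a/@href selects the href attribute of every link on the page
print(doc.xpath("//a/@href"))  # ['https://example.com/']
```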

Scraping multiple pieces of data of the same type in a page

To scrape multiple pieces of data with a single selector (e.g. a column in a table), find the XPaths of two of the elements and replace the part where they differ with *. For example (with illustrative paths), from:

/html/body/table/tr[1]/td[2]
/html/body/table/tr[2]/td[2]

you can make:

/html/body/table/tr[*]/td[2]

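The wildcard trick can be tried with lxml (the table HTML is hypothetical; note that in XPath tr[*] matches tr elements with any child element, which covers every row here since each row holds td cells):

```python
from lxml import etree

html = """<html><body><table>
  <tr><td>Alice</td><td>10</td></tr>
  <tr><td>Bob</td><td>20</td></tr>
</table></body></html>"""
doc = etree.HTML(html)

# The specific cells are /html/body/table/tr[1]/td[2] and
# /html/body/table/tr[2]/td[2]; the wildcard hits the whole column:
print(doc.xpath("/html/body/table/tr[*]/td[2]/text()"))  # ['10', '20']
```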
Scraping multiple selectors at once

You can also scrape multiple types of data from the same page with different selectors at once, e.g. a product name and title. Just pass a range of cells containing the selectors as the selector parameter of SITEPARSE() (see this example to retrieve a book title, Kindle price and rating for a book on Amazon at once).
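The idea is to fetch and parse the page once and evaluate every selector against it. A hypothetical equivalent in lxml (sample page and selectors are made up):

```python
from lxml import etree

html = """<html><body>
  <h1 id="title">Example Book</h1>
  <span class="price">$9.99</span>
  <span class="rating">4.5</span>
</body></html>"""

doc = etree.HTML(html)  # parse once...
selectors = ["//h1/text()",
             "//span[@class='price']/text()",
             "//span[@class='rating']/text()"]
rows = [doc.xpath(s) for s in selectors]  # ...evaluate each selector
print(rows)  # [['Example Book'], ['$9.99'], ['4.5']]
```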

Advanced XPath expressions

Some XPath expressions that fit particular use cases are:

  • Select a span element with a given CSS class (myclass):

    //span[contains(concat(' ', @class, ' '), ' myclass')]

  • Select a link (<a> element) whose text starts with mytext:

    //a[starts-with(., 'mytext')]

  • Select a link (<a> element) pointing to a given URL (myurl):

    //a[@href='myurl']

  • Select a <p> element (paragraph) containing mytext:

    //p[contains(., 'mytext')]

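Each of these expressions can be checked with lxml on a small sample page (the HTML below is made up for illustration):

```python
from lxml import etree

html = """<html><body>
  <span class="big myclass">styled</span>
  <a href="https://example.com/page">mytext and more</a>
  <p>some mytext here</p>
</body></html>"""
doc = etree.HTML(html)

# span with class myclass (the class attribute may hold several classes)
assert doc.xpath("//span[contains(concat(' ', @class, ' '), ' myclass')]")
# link whose text starts with mytext
assert doc.xpath("//a[starts-with(., 'mytext')]")
# link pointing to a given URL
assert doc.xpath("//a[@href='https://example.com/page']")
# paragraph containing mytext
assert doc.xpath("//p[contains(., 'mytext')]")
```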
Scrape data behind login

To scrape data behind login forms, you need to:

  1. get the session cookies by submitting a login form, and

  2. include them in subsequent scraping requests, so that the site will serve the data it would serve a logged-in user.
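These two steps can be sketched with Python's standard library; the tiny local server below stands in for the real site, and all paths and cookie names are made up:

```python
import http.cookiejar
import http.server
import threading
import urllib.request

class Site(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/login":
            # "Logging in" hands back a session cookie.
            self.send_response(200)
            self.send_header("Set-Cookie", "session=abc123")
            self.end_headers()
            self.wfile.write(b"logged in")
        else:
            # /data serves the real content only if the cookie comes back.
            cookie = self.headers.get("Cookie", "")
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"secret" if "session=abc123" in cookie else b"denied")

    def log_message(self, *args):  # silence request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Site)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# One cookie jar shared across calls: cookies captured at login are replayed.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
opener.open(base + "/login").read()        # step 1: get the session cookies
data = opener.open(base + "/data").read()  # step 2: include them when scraping
print(data)  # b'secret'
server.shutdown()
```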

You can watch a tutorial video, or look at a demo sheet.

You can describe the sequence of actions in a 2-column range of the spreadsheet and pass it in place of the selectors. The actions can be:

  • If the first column contains a selector / XPath and the second column contains a value, the selector must point to an <input> element, and the value will be filled in

  • If the second column contains the value #CLICK, the element selected in the first column will be clicked

  • You can put #WAIT in the first column and the number of seconds to wait in the second column

  • You can place selectors / XPaths in the first column with an empty second column; the matching data will then be scraped and returned
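A rough Python model of how such an action list could be interpreted (FakePage is a made-up stand-in for the real headless-browser session; the real add-on's internals may differ):

```python
import time

def run_actions(page, actions):
    """Interpret a 2-column action range, returning any scraped data."""
    scraped = []
    for first, second in actions:
        if first == "#WAIT":
            time.sleep(float(second))           # pause before the next action
        elif second == "#CLICK":
            page.click(first)                   # click the selected element
        elif second == "":
            scraped.append(page.scrape(first))  # empty 2nd column -> scrape
        else:
            page.fill(first, second)            # fill an <input> with a value
    return scraped

class FakePage:
    """Records calls instead of driving a browser."""
    def __init__(self):
        self.log = []
    def fill(self, sel, value):
        self.log.append(("fill", sel, value))
    def click(self, sel):
        self.log.append(("click", sel))
    def scrape(self, sel):
        self.log.append(("scrape", sel))
        return f"<data at {sel}>"

page = FakePage()
actions = [
    ("//input[@name='user']", "alice"),
    ("//input[@name='pass']", "secret"),
    ("//button[@type='submit']", "#CLICK"),
    ("#WAIT", "0"),
    ("//div[@class='balance']", ""),
]
result = run_actions(page, actions)
print(result)  # one scraped value, from the row with the empty second column
```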

Whenever a SITEPARSE() call is passed a list of actions to perform, it returns all session cookies, so that they can be passed as a third parameter to subsequent calls to SITEPARSE(). Typically the actions submit information to a form, and the cookies are then used in subsequent calls to scrape the data.

All actions must complete within 30 seconds, or the call will time out.