HTML is treated as a tree of nodes. The topmost element of the tree is called the root element. XPath is used to navigate an HTML document and to query for specific elements (the terms elements, nodes, widgets, and tags are used interchangeably in this document). XPath defines 7 node types, but only 3 of them are of interest here: element, attribute, and text.

Here is an HTML example that we will use throughout this cheat sheet:

<html>
      <head>
            <title>website title</title>
      </head>
      <body>
            <table color="white">
                  <tr number="first">
                        <th>First Name</th>
                        <th>Last Name</th>
                        <th>Age</th>
                  </tr>
                  <tr number="second">
                        <td>Jill</td>
                        <td>Smith</td>
                        <td>50</td>
                  </tr>
                  <tr number="third">
                        <td>Dave</td>
                        <td>Fisher</td>
                        <td>32</td>
                  </tr>
            </table>
      </body>
</html>

Examples of node types in the above HTML document are:

<html>: Root element node
<body>, <tr>, <table>: element nodes
color="white": attribute
First Name, Jill, 32: text
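
As a rough illustration of the three node types, the sketch below parses the sample document with Python's standard xml.etree.ElementTree module. Using this module is an assumption of the example, not something the cheat sheet prescribes, and it works here only because the sample HTML also happens to be well-formed XML. The variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample document from this cheat sheet.
HTML = """<html>
  <head><title>website title</title></head>
  <body>
    <table color="white">
      <tr number="first"><th>First Name</th><th>Last Name</th><th>Age</th></tr>
      <tr number="second"><td>Jill</td><td>Smith</td><td>50</td></tr>
      <tr number="third"><td>Dave</td><td>Fisher</td><td>32</td></tr>
    </table>
  </body>
</html>"""

root = ET.fromstring(HTML)                   # root element node: <html>
table = root.find("body/table")              # an element node
first_header = root.find("body/table/tr/th") # first <th> of the first <tr>

root_tag = root.tag            # element name of the root
table_attrs = table.attrib     # attribute nodes, exposed as a dict
header_text = first_header.text  # text node content

print(root_tag, table_attrs, header_text)
```

ElementTree exposes attributes and text as properties of an element rather than as separate node objects, so this only approximates the XPath data model.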

Relationship between nodes:

Parent: Each element or node has one parent with the exception of the root node. In the HTML above, html is the parent of head and body.

Children: Each element may have zero, one, or more children. In the HTML above, each td has no child elements (only a text node), head has one child called title, the first tr has 3 th children, and each of the other tr elements has 3 td children.

Siblings: Children with the same parent are called siblings. In the above example, head and body are siblings. All tr nodes are siblings as well.

Ancestors: A node’s parent, parent of parent, etc. are called ancestors. In the above example:

<td>Dave</td> has the ancestors of <tr number="third">, <table color="white">, <body>, <html>.

Descendants: A node’s children, children’s children, etc. are called descendants. In the above example:

<td>Dave</td>, <tr number="third">, <table color="white">, <body> are all descendants of <html>.
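
The relationships above can be checked with a small sketch, again assuming Python's standard xml.etree.ElementTree module. ElementTree elements do not keep a pointer to their parent, so the example builds a child-to-parent map first; parent_of and the other variable names are hypothetical helpers, not part of any API.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample document from this cheat sheet.
HTML = """<html>
  <head><title>website title</title></head>
  <body>
    <table color="white">
      <tr number="first"><th>First Name</th><th>Last Name</th><th>Age</th></tr>
      <tr number="second"><td>Jill</td><td>Smith</td><td>50</td></tr>
      <tr number="third"><td>Dave</td><td>Fisher</td><td>32</td></tr>
    </table>
  </body>
</html>"""

root = ET.fromstring(HTML)

# Map every element to its parent (the root has no entry).
parent_of = {child: parent for parent in root.iter() for child in parent}

dave = root.find(".//tr[@number='third']/td")  # <td>Dave</td>

# Children: iterating an element yields its child elements.
children_of_html = [child.tag for child in root]
td_child_elements = len(dave)  # a td here holds only text

# Siblings: the other children of the same parent.
third_tr = parent_of[dave]
tr_siblings = [tr.get("number") for tr in parent_of[third_tr] if tr is not third_tr]

# Ancestors: walk the parent chain up to the root.
ancestors = []
node = dave
while node in parent_of:
    node = parent_of[node]
    ancestors.append(node.tag)

print(children_of_html, td_child_elements, tr_siblings, ancestors)
```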

The most frequently used XPath expressions and their meanings are listed in the following table:

XPath	Description
/	· If XPath starts with it: selects from the root node. Used in absolute XPaths · If used in the middle of XPath: selects a child node of the current node
//	· If XPath starts with it: selects anywhere in the HTML. Used in relative XPaths · If used in the middle of XPath: selects the descendants of the current node
.	Selects current node
..	Selects the parent of current node
@	Selects attributes

Here are some examples of the above expressions:

XPath	Description
/html	Selects the root element html
//tr	Selects all the tr elements
//tr[@number='first']	Selects the first tr in the above HTML because it has the attribute number with the value first.
//tr[@number='first']/..	Selects the table node. It first finds the first tr element and then goes to its immediate parent
//table//td	Selects all the td elements. It first finds the table node and then finds all of its descendant td nodes
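
A few of the expressions above can be tried with the sketch below, which assumes Python's standard xml.etree.ElementTree module. ElementTree implements only a subset of XPath and evaluates paths relative to an element, so //tr is written as .//tr here; a full XPath engine would accept the expressions exactly as they appear in the table.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample document from this cheat sheet.
HTML = """<html>
  <head><title>website title</title></head>
  <body>
    <table color="white">
      <tr number="first"><th>First Name</th><th>Last Name</th><th>Age</th></tr>
      <tr number="second"><td>Jill</td><td>Smith</td><td>50</td></tr>
      <tr number="third"><td>Dave</td><td>Fisher</td><td>32</td></tr>
    </table>
  </body>
</html>"""

root = ET.fromstring(HTML)

# //tr : all tr elements anywhere below the root
all_rows = root.findall(".//tr")

# //tr[@number='first'] : tr elements whose number attribute equals 'first'
first_row = root.findall(".//tr[@number='first']")

# //tr[@number='first']/.. : the parent of that tr
row_parent = root.findall(".//tr[@number='first']/..")

# //table//td : every td descendant of the table
cell_texts = [td.text for td in root.findall(".//table//td")]

print(len(all_rows), [p.tag for p in row_parent], cell_texts)
```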

If an expression matches multiple nodes, you can use a number in brackets to refer to a specific one. You can find some examples below:

XPath	Description
(//table/tr)[1]	Selects the first row (i.e. tr) of the table
(//table/tr)[last()]	Selects the last row of the table
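
The positional predicates can be sketched as follows, assuming Python's standard xml.etree.ElementTree module. ElementTree does not support the parenthesized form (//table/tr)[1], so the example uses .//table/tr[1], which picks the first tr within each table; with the single table in the sample document both forms select the same row.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample document from this cheat sheet.
HTML = """<html>
  <head><title>website title</title></head>
  <body>
    <table color="white">
      <tr number="first"><th>First Name</th><th>Last Name</th><th>Age</th></tr>
      <tr number="second"><td>Jill</td><td>Smith</td><td>50</td></tr>
      <tr number="third"><td>Dave</td><td>Fisher</td><td>32</td></tr>
    </table>
  </body>
</html>"""

root = ET.fromstring(HTML)

# XPath positions start at 1, not 0.
first_row = root.find(".//table/tr[1]")
last_row = root.find(".//table/tr[last()]")
second_to_last_row = root.find(".//table/tr[last()-1]")

print(first_row.get("number"), last_row.get("number"), second_to_last_row.get("number"))
```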

The following wildcards are available in XPath:

Wildcard	Description
*	Matches any element node
@*	Matches any attribute node
node()	Matches any node of any kind

Here are some examples of the above wildcards:

XPath	Description
//table/*	Selects all the child elements of the table element
//tr[@*]	Selects all the tr nodes with at least one attribute
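
A minimal sketch of the wildcard examples, assuming Python's standard xml.etree.ElementTree module. ElementTree understands * but not @*, so the //tr[@*] case is emulated with an ordinary Python filter; that part only approximates the expression and is not XPath evaluation.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample document from this cheat sheet.
HTML = """<html>
  <head><title>website title</title></head>
  <body>
    <table color="white">
      <tr number="first"><th>First Name</th><th>Last Name</th><th>Age</th></tr>
      <tr number="second"><td>Jill</td><td>Smith</td><td>50</td></tr>
      <tr number="third"><td>Dave</td><td>Fisher</td><td>32</td></tr>
    </table>
  </body>
</html>"""

root = ET.fromstring(HTML)

# //table/* : every child element of the table, whatever its tag
table_children = root.findall(".//table/*")

# //tr/* : every child element of every tr (th and td alike)
row_cells = root.findall(".//tr/*")

# //tr[@*] emulated in Python: tr elements with at least one attribute
rows_with_attrs = [tr for tr in root.iter("tr") if tr.attrib]

print([c.tag for c in table_children], len(row_cells), len(rows_with_attrs))
```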

Finding a Node Relative to Another Node

There are times when a node does not have an id or name, or it has one that is dynamically generated. Elements in a table are a good example of this scenario. In these instances, the best way to find the element is to come up with an XPath based on some other element on the page that is constant or has an id or name. To write these relative XPaths, you will need to know about the following terms:

Axis Name	Description
ancestor	Selects all the ancestors of the current node
ancestor-or-self	Selects all the ancestors of the current node plus the current node itself
attribute	Selects all the attributes of the current node
child	Selects all children of the current node
descendant	Selects all descendants of the current node
following-sibling	Selects all siblings after the current node
parent	Selects the parent of the current node
preceding-sibling	Selects all siblings before the current node

Some examples of axes are outlined below:

XPath	Description
//td[contains(text(), 'Smith')]/preceding-sibling::td[1]	Selects the td immediately before the td that contains the text Smith (here, the td with the text Jill)
//td[contains(text(), 'Smith')]/ancestor::table[1]	Selects the nearest table ancestor of the td that contains the text Smith
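
To show what the two axis examples select, the sketch below emulates them by hand in Python. It assumes the standard xml.etree.ElementTree module, which does not support XPath axes or contains(), so an exact text match and a hypothetical parent_of map stand in for them. This is not how an XPath engine evaluates the expressions; it only mirrors the result on the sample document.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample document from this cheat sheet.
HTML = """<html>
  <head><title>website title</title></head>
  <body>
    <table color="white">
      <tr number="first"><th>First Name</th><th>Last Name</th><th>Age</th></tr>
      <tr number="second"><td>Jill</td><td>Smith</td><td>50</td></tr>
      <tr number="third"><td>Dave</td><td>Fisher</td><td>32</td></tr>
    </table>
  </body>
</html>"""

root = ET.fromstring(HTML)
parent_of = {child: parent for parent in root.iter() for child in parent}

# Anchor element: the td whose text is exactly 'Smith'.
smith = root.find(".//td[.='Smith']")

# preceding-sibling / following-sibling: split the parent's children at the anchor.
siblings = list(parent_of[smith])
position = siblings.index(smith)
preceding = siblings[:position]
following = siblings[position + 1:]

# preceding-sibling::td[1] counts backwards from the anchor, so [1] is the nearest one.
nearest_preceding = preceding[-1]

# ancestor::table[1]: walk up the parent chain until a table is reached.
node = smith
while node.tag != "table":
    node = parent_of[node]
nearest_table = node

print(nearest_preceding.text, [td.text for td in following], nearest_table.get("color"))
```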

Some XPath best practices:

Start all of your XPaths with //. This finds an element anywhere in the HTML page regardless of where it is located, and it does not depend on other elements on the page.
Use 'contains()' instead of '=' when searching for text inside an element, because '=' will fail if there is whitespace around the text you are looking for. Examples are:
- To find a td whose text includes Jill, for example <td>Jill</td>, use this XPath: //td[contains(text(), 'Jill')]
- To find a link whose text includes MyText, for example <a>MyText</a>, use this XPath: //a[contains(text(), 'MyText')]
Do not use 'following' or 'preceding' by themselves, as they do not work well with IE. following-sibling and preceding-sibling are fine to use.

More XPath examples:

XPath	Description
//input[@value='Continue' and @name='btnContinue']	Selects an input element with attribute value matching 'Continue' and attribute name matching 'btnContinue'
//*[@id='someId']	Selects any HTML element with id equal to 'someId'. Note that the id is case sensitive and must match the whole value.
//a[@href='someUrl']	Selects a link (HTML anchor element) with attribute href matching 'someUrl'. Case sensitive
//a[text()='Exact Match Case sensitive']	Selects a link (HTML anchor element) with visible text matching exactly 'Exact Match Case sensitive'.
//a[contains(text(),'Partial Match Visible Text')]	Selects a link (HTML anchor element) with visible text containing 'Partial Match Visible Text'.
//a[@title='Some Title text']	Selects a link (HTML anchor element) with attribute title matching 'Some Title text'. Case sensitive
//*[contains(text(),'$SOME_DYNAMIC_PARAM')]/following::a[contains(text(),'Relative Link to previous link')]	Relative match. This is for dynamic elements: you locate the dynamic element first and then, relative to it, select the element you want to interact with.
(//a[@href='Login.asp'])[1]	Finds the first link that has the attribute href='Login.asp'. For the second match use [2], for the third [3], and for the one before the last [last()-1].
//table[@id='myTable']/tr[last()]	Select the last row of the table with id='myTable'
//a[contains(@title,'bob')]	Find a link that contains 'bob' in its title attribute.
//a[starts-with(@title,'bob')]	Find the link that has the title that starts with 'bob'.
