Getting started with Playwright in Python

2023-10-07

#playwright #python

Playwright is an open source project created by Microsoft and designed as a modern solution for browser automation and end-to-end testing. It is a major competitor to the popular Selenium project. Paired with Python, it gives us a capable set of tools for automating interactions with websites and APIs. In this post, I will step through how to set up a development environment and write a simple Python script that automates an interaction with a webpage.

Setting up the development environment

To get started, we will need Python installed. Head to the official Python website (python.org) and follow the installation process for your operating system.

With Python installed, next we will need to create a new directory for our script.

mkdir playwright-demo
cd playwright-demo

Within our new working directory, we can move on to setting up a virtual environment. This lets us install our project dependencies in an isolated, local environment instead of system-wide. My personal choice is venv.

python3 -m venv venv

This command creates a virtual environment named venv in our current directory. We could name it anything, but I always choose venv for the sake of simplicity. Creating the virtual environment is only half of the job, though. We must also activate it before we can use it. The activation command varies depending on your operating system and shell, so I recommend checking the venv documentation for the one that matches your setup. The command below works for bash and zsh on Linux and macOS.

source venv/bin/activate
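If you are unsure whether the activation worked, one way to check is from Python itself. This is a minimal sketch that relies only on the standard library; the function name in_virtualenv is my own choice. You can paste it into a Python shell started from the same terminal.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
```

If this prints False, the environment is probably not activated in the current shell.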

Finally, we will need a Python file to write our code in. It is common practice to use a main.py file as the entry point of a program.

touch main.py

With each of these steps complete, we can now install the Playwright library and its dependencies into our virtual environment.

Installing Playwright

To install the Playwright library into our virtual environment, we will use pip, the Python package manager. Ensure your virtual environment is activated before installing any dependencies.

pip install playwright

This installs the library, but we also need the browsers themselves. Luckily, Playwright comes with its own command to install the relevant browser binaries for Chromium, Firefox and WebKit.

playwright install
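One way to confirm that the package landed in the active environment is to ask Python for its installed version. This is a small sketch using only the standard library; the function name playwright_version is my own, and it only checks the Python package, not the browser binaries.

```python
from importlib import metadata


def playwright_version():
    """Return the installed Playwright version string, or None if it is missing."""
    try:
        return metadata.version("playwright")
    except metadata.PackageNotFoundError:
        # The package is not installed in the environment running this code.
        return None


print("Playwright version:", playwright_version())
```

If this prints None, check that the virtual environment is activated and run the pip command again.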

Now we are ready to begin writing the Python script.

Writing our first script

To start, open main.py in your editor of choice.

To be able to use the Playwright library, we must first import it. We can declare this at the top of the file.

from playwright.sync_api import sync_playwright

The sync_playwright function we have imported returns a context manager. Entering it with a with statement gives us an object, named p in the code below, that exposes the Playwright API. Our first step is to enter the context manager and launch a browser object for the browser of our choice.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()

With our browser object created, we next need to create a page object using the new_page() method on the browser object. Think of this as a tab inside your web browser. The page object exposes methods that let us open a webpage and interact with it.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

We can now use the goto() method to visit a website. We just need to pass the URL to the method as a string. For this example, I will use Google. To ensure our browser shuts down gracefully at the end of our script, we can use the close() method from the browser object.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://www.google.com")
    browser.close()
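In the script above, close() is only reached if goto() succeeds. If you want the cleanup to be explicit even when navigation fails, a try/finally block is one option. The sketch below assumes it is handed a browser object like the one we launched; the helper name visit is hypothetical and not part of Playwright.

```python
def visit(browser, url):
    """Open url in a new page and always close the browser afterwards."""
    page = browser.new_page()
    try:
        page.goto(url)
    finally:
        # Runs whether goto() succeeded or raised an exception.
        browser.close()


# Usage inside the script from this post:
#     with sync_playwright() as p:
#         browser = p.chromium.launch()
#         visit(browser, "https://www.google.com")
```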

In our terminal we can run our Python file by simply calling the python executable and the name of the file.

python main.py

This will run our script. By default, the browser runs headless, so nothing is rendered on the screen. We can change this by passing the headless and slow_mo arguments to the launch() method. The slow_mo value is optional; it delays each Playwright operation by the given number of milliseconds so that we can see what is happening.

browser = p.chromium.launch(headless=False, slow_mo=500)
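If you find yourself switching between headless and visible runs often, one option is to keep the launch arguments in a small helper. This is only a sketch; the function launch_options and its debug flag are my own invention, while headless and slow_mo are the launch() arguments described above.

```python
def launch_options(debug: bool) -> dict:
    """Build keyword arguments for launch() depending on a debug flag."""
    if debug:
        # Show the browser window and pause 500 ms between operations.
        return {"headless": False, "slow_mo": 500}
    # Headless is already the default; it is spelled out here for clarity.
    return {"headless": True}


# Usage inside the script from this post:
#     browser = p.chromium.launch(**launch_options(debug=True))
```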

If you rerun the program, you should now see a Chromium instance open, load the Google website and then close.

From here we can expand the functionality of our script as far as we want, but I will tackle that in depth in future posts. For now, we will just get the page title and print it to the terminal. We can use the title() method from our page object to accomplish this. The final version of our script should look like this.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False, slow_mo=500)
    page = browser.new_page()
    page.goto("https://www.google.com")
    print(page.title())
    browser.close()

If you run this again, you should see Google printed to the terminal.

python main.py
Google

Now experiment with this as much as possible. Try this out with a different URL and see what the output is, or use the page.screenshot() method and see what it captures. Do not forget to look through the Playwright documentation for further help.
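As a starting point for the screenshot experiment, here is a minimal sketch. It assumes a page object created as in the script above; the helper name capture and the default file name screenshot.png are hypothetical choices of mine.

```python
def capture(page, url, path="screenshot.png"):
    """Visit url, save a screenshot to path and return the page title."""
    page.goto(url)
    # screenshot() writes an image of the current viewport to the given path.
    page.screenshot(path=path)
    return page.title()


# Usage inside the script from this post:
#     with sync_playwright() as p:
#         browser = p.chromium.launch()
#         page = browser.new_page()
#         print(capture(page, "https://www.google.com"))
#         browser.close()
```

After running it, the image file should appear in the directory you ran the script from.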