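Playwright is a browser automation library developed by Microsoft. It drives all three major browser engines (Chromium, Firefox and WebKit) and is widely used for automated testing and web scraping.
A similar library is Selenium. Selenium talks to browsers through the WebDriver protocol, whereas Playwright controls the browser engine directly (for Chromium, via the Chrome DevTools Protocol) over a persistent connection, which usually makes it noticeably faster. Speed-ups of 30% to 50% are often quoted, though the actual gain depends on the workload.
Playwright is also feature-rich: besides querying and clicking page elements, it can intercept network requests, take screenshots, record videos, and handle file uploads and downloads.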
Installing dependencies:
$ pip install playwright
After installation, running any code right away fails with the following error:
playwright._impl._errors.Error: BrowserType.launch: Executable doesn't exist at /Users/dkvirus/Library/Caches/ms-playwright/chromium_headless_shell-1208/chrome-headless-shell-mac-arm64/chrome-headless-shell
╔════════════════════════════════════════════════════════════╗
║ Looks like Playwright was just installed or updated. ║
║ Please run the following command to download new browsers: ║
║ ║
║ playwright install ║
║ ║
║ <3 Playwright Team ║
╚════════════════════════════════════════════════════════════╝
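As the message suggests, download the browsers with $ python -m playwright install. Note that Playwright downloads its own browser builds rather than separate drivers. To install only a specific browser, such as Chromium, append its name: $ python -m playwright install chromium.
Run the following script to verify the installation: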
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://www.baidu.com")
    print(page.title())
    browser.close()
Basic usage:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    # Launch the browser: with no arguments it runs headless by default
    # browser = p.chromium.launch()
    # To see the browser window, pass headless=False
    browser = p.chromium.launch(headless=False)
    # Open a new tab (page)
    page = browser.new_page()
    # Visit Baidu
    page.goto("https://www.baidu.com")
    # Print the page title
    print("Page title:", page.title())
    # Release resources
    browser.close()
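For scenarios such as testing several users at once, where tabs must not share cookies or cache with each other, use a BrowserContext. Each context is an isolated browser session with its own cookies, storage and cache, and the tabs (pages) are created from the context.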
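from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    # Launch the browser: with no arguments it runs headless by default
    # browser = p.chromium.launch()
    # To see the browser window, pass headless=False
    browser = p.chromium.launch(headless=False)
    # Create an isolated context
    context = browser.new_context()
    # Open a tab inside that context
    page = context.new_page()
    # Visit Baidu
    page.goto("https://www.baidu.com")
    # Print the page title
    print("Page title:", page.title())
    # Release resources (closing the browser also closes its contexts)
    browser.close()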
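Locating elements: use the Locator API, which automatically waits for an element to be ready (for example, visible and enabled) before acting on it.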
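# page is a Page object created as in the examples above
# Locate by text content ("登录" means "Log in")
page.locator("text=登录").click()
# Locate by CSS selector
page.locator(".submit-btn").click()
# Locate by XPath
page.locator('//button[@id="submit"]').click()
# Read an element attribute
href = page.locator("a.link").get_attribute("href")
# Read an element's text
text_content = page.locator("div.result").text_content()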
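Mocking API responses: page.route() intercepts matching requests, and the handler either answers them itself with route.fulfill() or lets them through with route.continue_().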
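# page is a Page object created as in the examples above
# Intercept requests and replace the response of specific ones
def handle_route(route):
    if "api/data" in route.request.url:
        # Return custom mock data
        route.fulfill(status=200, json={"custom": "data"})
    else:
        # Let every other request through unchanged
        route.continue_()
# Send every request ("**/*") through the handler; register it before navigating
page.route("**/*", handle_route)
page.goto("https://example.com")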
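This one is seriously impressive. Even users who can't write code can run a single terminal command: Playwright opens the target website, you go through your usual test flow on the site, and when you close the browser, the corresponding test code is generated locally.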
$ python -m playwright codegen --target python -o test.py -b chromium https://www.baidu.com
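--target python generates the test code in Python; -o test.py writes it to test.py; -b chromium opens the target URL (https://www.baidu.com here) in Chromium. To try it, I typed "dkvirus blog" into the Baidu search box, but my blog did not show up, so I entered my blog's address in the address bar manually and opened the first article. When I closed the browser, a test.py file appeared locally with the following content: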
import re
from playwright.sync_api import Playwright, sync_playwright
def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://www.baidu.com/")
    # "伊朗宣布停火条件" ("Iran announces ceasefire conditions") appears to be the trending
    # placeholder text Baidu showed at recording time, used as the textbox's accessible name;
    # such a locator breaks once the placeholder changes.
    page.get_by_role("textbox", name="伊朗宣布停火条件").click()
    page.get_by_role("textbox", name="伊朗宣布停火条件").fill("dkvirus blog")
    page.goto("https://blog.dkvirus.com/")
    page.get_by_role("link", name="【CSS】打印相关属性介绍").click()
    # ---------------------
    context.close()
    browser.close()
with sync_playwright() as playwright:
    run(playwright)
Full-page screenshot:
from playwright.sync_api import sync_playwright
def capture_full_page_screenshot(url, output_file):
    with sync_playwright() as p:
        # Launch the browser (headless=True means no visible window)
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Navigate to the target URL
        page.goto(url)
        # Wait for the network to go idle (recommended for stability)
        page.wait_for_load_state('networkidle')
        # Capture the entire scrollable page, not just the viewport
        page.screenshot(path=output_file, full_page=True)
        print(f"Full-page screenshot saved to: {output_file}")
        browser.close()
# Example usage
capture_full_page_screenshot("https://example.com", "fullpage_screenshot.png")
Screenshot of a single element:
from playwright.sync_api import sync_playwright
def capture_element_screenshot(url, selector, output_file):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        # Wait until the element is visible (avoids failing on an element that has not rendered yet)
        page.wait_for_selector(selector, state='visible', timeout=5000)
        # Wait for the network to go idle (recommended for stability)
        page.wait_for_load_state('networkidle')
        # Locate the element and capture only its area
        element = page.locator(selector)
        element.screenshot(path=output_file)
        print(f"Element screenshot saved to: {output_file}")
        browser.close()
# Example: capture the search form on Baidu's home page
capture_element_screenshot(
    "https://www.baidu.com",
    ".form",  # CSS selector
    "baidu_form.png"
)
The same can be done with an ElementHandle from query_selector, which returns None when nothing matches:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://www.baidu.com")
    # Wait for the network to go idle (recommended for stability)
    page.wait_for_load_state('networkidle')
    # query_selector accepts CSS or XPath; a selector starting with // is treated as XPath
    element_handle = page.query_selector("//div[@id='lg']")  # XPath example
    if element_handle:
        element_handle.screenshot(path="logo_screenshot.png")
    else:
        print("Element not found")
    browser.close()
Screenshot of a local HTML file: the file:// protocol lets the browser open local files directly.
from pathlib import Path
from playwright.sync_api import sync_playwright
def screenshot_local_html(file_path, output_path):
    """
    Take a screenshot of a local HTML file.
    Args:
        file_path: path to the local HTML file
        output_path: where to save the screenshot
    """
    # Convert the file path to an absolute file:// URL.
    # as_uri() also handles Windows drive letters and percent-encodes spaces,
    # which a hand-built f"file://{path}" string does not.
    file_url = Path(file_path).resolve().as_uri()
    with sync_playwright() as p:
        # Launch the browser (headless=True means no visible window)
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Open the local file
        page.goto(file_url)
        # Wait until the page has finished loading
        page.wait_for_load_state('networkidle')
        # Capture the full page
        page.screenshot(path=output_path, full_page=True)
        print(f"Local file screenshot saved to: {output_path}")
        browser.close()
# Example usage
screenshot_local_html("example.html", "local_screenshot.png")