Taking web page screenshots on EC2 with GoogleChrome/puppeteer

Headless Chrome Node API

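puppeteer works well as a small UI testing tool: it drives Chrome to scrape site content or to generate screenshots and PDFs of pages, and for SPA pages it can capture the pre-rendered content.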

Node v7.6.0 or later is recommended, since the example below uses async/await.
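As a rough sketch of how a script could guard against an older runtime, the snippet below compares the running Node version with 7.6.0. The helper name meetsMinimum and the constant MIN_NODE are made up for this illustration and are not part of puppeteer.

```javascript
// Compare dotted version strings numerically, so that '7.10.0' > '7.6.0'.
// meetsMinimum is a hypothetical helper written for this example.
function meetsMinimum(version, minimum) {
  const a = version.replace(/^v/, '').split('.').map(Number);
  const b = minimum.replace(/^v/, '').split('.').map(Number);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] || 0; // missing or non-numeric parts count as 0
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true; // equal versions satisfy the minimum
}

const MIN_NODE = '7.6.0'; // first release with async/await enabled by default

if (!meetsMinimum(process.versions.node, MIN_NODE)) {
  console.error(`Node ${process.versions.node} is older than ${MIN_NODE}`);
  process.exitCode = 1;
}
```

Putting a check like this at the top of index.js gives a readable message instead of a syntax error on the first async function.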

Installing puppeteer automatically downloads Chromium for you, so on Windows you can start using it right away.

npm install puppeteer



index.js

const puppeteer = require('puppeteer');

const configMap = {
  viewport: {
    width: 1920, // page width in pixels. required
    height: 1080, // page height in pixels. required
    deviceScaleFactor: 1, // Specify device scale factor (can be thought of as dpr). Defaults to 1.
    isMobile: false, // Whether the meta viewport tag is taken into account. Defaults to false.
    hasTouch: false, // Specifies if viewport supports touch events. Defaults to false.
    isLandscape: false // Specifies if viewport is in landscape mode. Defaults to false.
  }
};

(async () => {
  // The sandbox is disabled so Chromium can start on servers where it is unavailable
  const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
  const page = await browser.newPage();
  // Set the viewport before navigating so the page is laid out at the target size
  await page.setViewport(configMap.viewport);
  await page.goto('https://lanlanlue.blogspot.com/');
  await page.screenshot({ path: 'example.png' });
  await browser.close();
})();

The viewport settings control the size of the screenshot and the kind of device being emulated.
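To make the relationship between the viewport and the image size concrete, here is a small sketch. It assumes a regular (not full-page) screenshot, where the PNG dimensions are the viewport dimensions multiplied by deviceScaleFactor. The helper screenshotPixelSize is hypothetical, and the rounding is an assumption of this example.

```javascript
// Estimate the pixel dimensions of a viewport-sized screenshot.
// screenshotPixelSize is a made-up helper, not a puppeteer API.
function screenshotPixelSize(viewport) {
  const scale = viewport.deviceScaleFactor || 1; // puppeteer defaults to 1
  return {
    width: Math.round(viewport.width * scale),
    height: Math.round(viewport.height * scale),
  };
}

const desktop = { width: 1920, height: 1080, deviceScaleFactor: 1 };
const highDpi = { width: 1920, height: 1080, deviceScaleFactor: 2 };

console.log(screenshotPixelSize(desktop));
console.log(screenshotPixelSize(highDpi));
```

Raising deviceScaleFactor is therefore one way to get a sharper image without changing how the page is laid out.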


Run node index.js and an image file named example.png appears in the folder.
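If the script is going to run unattended on a server, it may be worth making sure Chromium is closed even when navigation fails. The following is only a sketch of that idea: captureScreenshot is a hypothetical wrapper, and it takes the puppeteer module as a parameter so the logic can be tried without launching a real browser.

```javascript
// Hypothetical wrapper around the steps in index.js.
// puppeteerLike is any object with a launch() method; pass require('puppeteer') in real use.
async function captureScreenshot(puppeteerLike, url, outputPath, viewport) {
  const browser = await puppeteerLike.launch({
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });
  try {
    const page = await browser.newPage();
    await page.setViewport(viewport);
    await page.goto(url);
    await page.screenshot({ path: outputPath });
    return outputPath;
  } finally {
    // Runs on success and on failure, so no Chromium process is left behind
    await browser.close();
  }
}

// Example call, assuming puppeteer is installed and configMap is defined as in index.js:
// captureScreenshot(require('puppeteer'), 'https://lanlanlue.blogspot.com/', 'example.png', configMap.viewport)
//   .then((file) => console.log('saved', file))
//   .catch((err) => console.error('screenshot failed:', err.message));
```

The try/finally block is the point of the sketch: without it, an error in page.goto would skip browser.close().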

EC2

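After testing locally, I moved everything to EC2 to try it there, and it would not run properly.
A Linux environment is a bit more troublesome because additional libraries have to be installed; see this post for reference.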

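First, check which libraries are missing.
From the folder where puppeteer was installed, go into its Chromium directory:

$ cd node_modules/puppeteer/.local-chromium/linux-609904/chrome-linux/
Note that linux-609904 is the Chromium revision and differs between puppeteer versions.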
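Because the revision number changes, it can be easier to look the directory up than to type it. Below is a minimal sketch that lists the Linux Chromium folders, assuming the .local-chromium/linux-<revision>/chrome-linux layout shown above (newer puppeteer releases may store the download elsewhere). The function name findChromeDirs is invented for this example.

```javascript
const fs = require('fs');
const path = require('path');

// List chrome-linux directories under a .local-chromium folder.
// findChromeDirs is a hypothetical helper; the layout is assumed from the path above.
function findChromeDirs(localChromiumDir) {
  if (!fs.existsSync(localChromiumDir)) return [];
  return fs
    .readdirSync(localChromiumDir)
    .filter((name) => /^linux-\d+$/.test(name))
    .map((name) => path.join(localChromiumDir, name, 'chrome-linux'));
}

const dirs = findChromeDirs(path.join('node_modules', 'puppeteer', '.local-chromium'));
console.log(dirs.length ? dirs.join('\n') : 'No Linux Chromium download found');
```

puppeteer also exposes puppeteer.executablePath(), which returns the path of the bundled browser binary and may be the simpler route when the module can be loaded.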

In that directory, run
ldd chrome | grep not

This prints many lines of the form **** => not found, one for each missing library.
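When the list is long, it can help to pull out just the library names. The sketch below parses ldd-style output in Node. The function parseMissingLibs is hypothetical, and the sample lines are invented for illustration rather than copied from a real run.

```javascript
// Extract the names of libraries that ldd reports as "not found".
// parseMissingLibs is a made-up helper for this example.
function parseMissingLibs(lddOutput) {
  return lddOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.endsWith('=> not found'))
    .map((line) => line.split('=>')[0].trim());
}

// Invented sample in the format ldd prints: one resolved library and two missing ones
const sample = [
  '\tlibX11.so.6 => /lib64/libX11.so.6 (0x00007f0000000000)',
  '\tlibatk-1.0.so.0 => not found',
  '\tlibgtk-3.so.0 => not found',
].join('\n');

console.log(parseMissingLibs(sample));
```

On the server, the input could come from child_process.execFileSync('ldd', ['chrome'], { encoding: 'utf8' }) run inside the chrome-linux directory.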

Install the following first:
$ sudo yum install cups-libs dbus-glib libXrandr libXcursor libXinerama cairo cairo-gobject pango

Then check again:
ldd chrome | grep not

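Quite a few libraries will probably still be missing.

The original post points out that Amazon Linux lacks some GTK-related packages, so they have to be borrowed from other distributions.

Install all of the following: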

# Install ATK from CentOS 7
$ sudo rpm -ivh --nodeps http://mirror.centos.org/centos/7/os/x86_64/Packages/atk-2.22.0-3.el7.x86_64.rpm
$ sudo rpm -ivh --nodeps http://mirror.centos.org/centos/7/os/x86_64/Packages/at-spi2-atk-2.22.0-2.el7.x86_64.rpm
$ sudo rpm -ivh --nodeps http://mirror.centos.org/centos/7/os/x86_64/Packages/at-spi2-core-2.22.0-1.el7.x86_64.rpm
# Install GTK from fedora 20
$ sudo rpm -ivh --nodeps http://dl.fedoraproject.org/pub/archive/fedora/linux/releases/20/Fedora/x86_64/os/Packages/g/GConf2-3.2.6-7.fc20.x86_64.rpm
$ sudo rpm -ivh --nodeps http://dl.fedoraproject.org/pub/archive/fedora/linux/releases/20/Fedora/x86_64/os/Packages/l/libXScrnSaver-1.2.2-6.fc20.x86_64.rpm
$ sudo rpm -ivh --nodeps http://dl.fedoraproject.org/pub/archive/fedora/linux/releases/20/Fedora/x86_64/os/Packages/l/libxkbcommon-0.3.1-1.fc20.x86_64.rpm
$ sudo rpm -ivh --nodeps http://dl.fedoraproject.org/pub/archive/fedora/linux/releases/20/Fedora/x86_64/os/Packages/l/libwayland-client-1.2.0-3.fc20.x86_64.rpm
$ sudo rpm -ivh --nodeps http://dl.fedoraproject.org/pub/archive/fedora/linux/releases/20/Fedora/x86_64/os/Packages/l/libwayland-cursor-1.2.0-3.fc20.x86_64.rpm
$ sudo rpm -ivh --nodeps http://dl.fedoraproject.org/pub/archive/fedora/linux/releases/20/Fedora/x86_64/os/Packages/g/gtk3-3.10.4-1.fc20.x86_64.rpm
# Install Gdk-Pixbuf from fedora 16
$ sudo rpm -ivh --nodeps http://dl.fedoraproject.org/pub/archive/fedora/linux/releases/16/Fedora/x86_64/os/Packages/gdk-pixbuf2-2.24.0-1.fc16.x86_64.rpm
# If all goes well, you are done at this point

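In my case libasound.so.2 was still missing,
so I installed this:
sudo yum install alsa-lib-devel

If that still does not work, look up which package provides the library:
yum provides '*/libasound.so.2'

If it still fails, try
sudo yum install kernel-headers kernel-devel ffmpeg-devel

For anything else you run into, search Google for the name of the missing library to see how other people installed it.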

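The screenshot captured on EC2 looks like this... The Chinese characters are dropped. My first guess was that the encoding still needs adjusting, though a missing CJK font on the instance is a more likely cause.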
