Web Scraping with Puppeteer and Nodejs

Puppeteer is a nodejs library used to scrape data from website, automate browser's task and also for testing your app . Super easy to use.

Installing Pupepeteer ->

  • Install Nodejs if not present. -Run the following command in terminal
npm i puppeteer 
 //or
yarn add puppeteer

Getting started-> -In this tutorial we will code an app for taking screenshot of the page!

-Create a basic template for nodejs.

npm init -y
  • Create index.js in the root directory.
  • Import puppeteer using const puppeteer = require('puppeteer-core'); in index.js file

Code for Index js

const puppeteer = require('puppeteer');

//async function as we are using await

(async () => {

 // launch your browser in background
  const browser = await puppeteer.launch();

 // load a new page
  const page = await browser.newPage();

// go to url -> https://example.com
  await page.goto('https://example.com');

//take screenshot of the page 
//image will be saved in root directory
// with filename example.png
//will take screenshot of the page that is visible in browser's window
  await page.screenshot({path: 'example.png'});

// use this instead of above statement for full page screenshot
await page.screenshot({path: 'example.png',fullpage:'true'});

//for closing the browser
// tip-> always close browser after finishing all tasks
  await browser.close();
})();
  • For avoiding any error always use try and catch inside the async function
try{
//code 
}
catch(error)
{
console.log(error);
}
  • And finally for running your code use node index in your terminal

Did you find this article valuable?

Support Jatin Rawat by becoming a sponsor. Any amount is appreciated!