Scraping with Python
Today a lot of data is available on the internet. What if we could capture this data and use it for decision making and for growing our business? Scraping is commonly used for data analytics. In this post I am going to explain two methods of web scraping:
1. Selenium WebDriver
2. Beautiful Soup
Selenium WebDriver
We need to install the following libraries in Python before going further:
1. Selenium (it drives the browser through ChromeDriver)
2. pandas (it gives us a DataFrame to temporarily store data from the web page)
3. ChromeDriver (the executable that Selenium uses to control Chrome)
The Python libraries can be installed with pip:
pip install selenium
pip install pandas
ChromeDriver itself is a separate executable: download the version matching your Chrome browser from the official ChromeDriver site and note its path.
Let's move to the Python code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd

record = pd.DataFrame([], columns=["name"])
# Give the path to your own chromedriver executable here
driver = webdriver.Chrome(service=Service(r'G:\spyder\Web Scraping\chromedriver.exe'))
driver.get('https://www.musicstore.de/en_OE/EUR')
names = driver.find_elements(By.CLASS_NAME, "stoererwrapper")
for nam in names:
    p_name = nam.find_element(By.TAG_NAME, 'span').text
    temp_record = pd.DataFrame([[p_name]], columns=["name"])
    record = pd.concat([record, temp_record])
print(record)
We start by importing the libraries.
Next we create a table (DataFrame) with one column, name.
Then we start ChromeDriver; you need to give the path to your own chromedriver executable.
Next we command the driver to load the website.
Then we fetch every element with the class name stoererwrapper, which includes the name of the product.
Each record is put into a temporary table, temp_record, and then appended to the
original table, record.
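This row-by-row accumulation can be sketched in isolation, with no browser needed; the product names below are made-up placeholders standing in for scraped values. Passing ignore_index=True gives the rows a clean running index:

```python
import pandas as pd

record = pd.DataFrame([], columns=["name"])
for p_name in ["XDJ-1000 MK2", "OPERA 12"]:  # placeholder scraped values
    temp_record = pd.DataFrame([[p_name]], columns=["name"])
    record = pd.concat([record, temp_record], ignore_index=True)
print(record)
```

Without ignore_index=True every appended one-row frame keeps its own index 0, which is why a repeated 0 shows up in the output below.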
Below is the output of the code above:
name
0 TOPSELLER!
0 XDJ-1000 MK2
0 The perfect player for digital DJs
0 NEW!
0 LITTLE MARCUS 500
0 Marcus Miller Signature - 500W and two Equalizers
0 MEGADEAL!
0 OPERA 12
0 Active 2-way speaker with DSP presets
0 MEGADEAL!
0 FOG FURY JETT
0 Vertical output, 12 x 3W LEDs, DMX
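Notice that each product contributes three spans in turn: a badge, the product name, and a tagline. Assuming that pattern holds for the whole list, the flat column can be reshaped into one row per product; the rows below are copied from the output above:

```python
import pandas as pd

# Flat list of scraped spans: badge, name, tagline repeating
rows = ["TOPSELLER!", "XDJ-1000 MK2", "The perfect player for digital DJs",
        "NEW!", "LITTLE MARCUS 500", "Marcus Miller Signature - 500W and two Equalizers"]

# Group every three consecutive entries into one product record
products = pd.DataFrame(
    [rows[i:i + 3] for i in range(0, len(rows), 3)],
    columns=["badge", "product", "description"])
print(products)
```

This is only a sketch; if the page ever renders a product without a badge, the three-at-a-time grouping would drift, so a safer scraper would walk each stoererwrapper element and read its child spans individually.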
Beautiful Soup
In order to scrape with this method we need the requests, bs4 (Beautiful Soup) and pandas libraries, plus the lxml parser:
pip install beautifulsoup4 requests lxml pandas
In this example we are going to scrape siteinspire.com (https://www.siteinspire.com/websites?categories=40).
Let's have fun
from bs4 import BeautifulSoup
import requests
import pandas as pd

record = pd.DataFrame([], columns=['title'])
resp = requests.get('https://www.siteinspire.com/websites?categories=40')
soup = BeautifulSoup(resp.text, 'lxml')
try:
    name = soup.find('div', {'class': 'title'}).contents[1].strip()
    records = pd.DataFrame([name], columns=['title'])
    record = pd.concat([record, records])
    record.to_csv('output.csv')
except Exception as e:
    print(e)
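Note that soup.find only returns the first matching element; use find_all to collect every title on the page. A minimal sketch using an inline HTML snippet (the div structure here is a made-up stand-in for the real page):

```python
from bs4 import BeautifulSoup
import pandas as pd

# Inline HTML standing in for the downloaded page
html = """
<div class="title">Site One</div>
<div class="title">Site Two</div>
"""

soup = BeautifulSoup(html, "html.parser")
# find_all returns every matching div, not just the first
titles = [div.get_text(strip=True) for div in soup.find_all("div", {"class": "title"})]
record = pd.DataFrame(titles, columns=["title"])
record.to_csv("output.csv", index=False)
print(titles)
```

The same loop works on resp.text once the real page has been fetched with requests.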
That's it