Scraping with Python
Today a lot of data is available on the internet. What if we could capture this data and use it for decision making and for growing our business? Scraping is commonly used for data analytics. In this post I am going to explain two methods of web scraping:
1. Selenium WebDriver
2. Beautiful Soup
Selenium WebDriver
We need to install the following libraries in Python before going further:
1. Selenium (it drives the browser through ChromeDriver)
2. pandas (it gives us a DataFrame to temporarily store data from the web page)
3. ChromeDriver (the executable that Selenium uses to control Chrome)
The Python libraries can be installed with pip:
pip install selenium
pip install pandas
ChromeDriver itself is a separate executable: download the version matching your Chrome browser from the official ChromeDriver site and note its path.
Let's move to the Python code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd

record = pd.DataFrame([], columns=["name"])
# Give the path to your own chromedriver executable here
driver = webdriver.Chrome(service=Service(r'G:\spyder\Web Scraping\chromedriver.exe'))
driver.get('https://www.musicstore.de/en_OE/EUR')
names = driver.find_elements(By.CLASS_NAME, "stoererwrapper")
for nam in names:
    p_name = nam.find_element(By.TAG_NAME, 'span').text
    temp_record = pd.DataFrame([[p_name]], columns=["name"])
    record = pd.concat([record, temp_record])
print(record)
We start by importing the libraries.
Next we create a table (DataFrame) with one column, name.
Then we start ChromeDriver; you need to give the path to your own chromedriver executable.
Next we command the driver to load the website.
Then we fetch every element with the class name stoererwrapper, which includes the name of the product.
Each record is put into a temporary table, temp_record, and then appended to the
original table, record.
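This row-by-row accumulation can be sketched in isolation, with no browser needed; the product names below are made-up placeholders standing in for scraped values. Passing ignore_index=True gives the rows a clean running index:

```python
import pandas as pd

record = pd.DataFrame([], columns=["name"])
for p_name in ["XDJ-1000 MK2", "OPERA 12"]:  # placeholder scraped values
    temp_record = pd.DataFrame([[p_name]], columns=["name"])
    record = pd.concat([record, temp_record], ignore_index=True)
print(record)
```

Without ignore_index=True every appended one-row frame keeps its own index 0, which is why a repeated 0 shows up in the output below.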
Below is the output of the code above:
name
0 TOPSELLER!
0 XDJ-1000 MK2
0 The perfect player for digital DJs
0 NEW!
0 LITTLE MARCUS 500
0 Marcus Miller Signature - 500W and two Equalizers
0 MEGADEAL!
0 OPERA 12
0 Active 2-way speaker with DSP presets
0 MEGADEAL!
0 FOG FURY JETT
0 Vertical output, 12 x 3W LEDs, DMX
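Notice that each product contributes three spans in turn: a badge, the product name, and a tagline. Assuming that pattern holds for the whole list, the flat column can be reshaped into one row per product; the rows below are copied from the output above:

```python
import pandas as pd

# Flat list of scraped spans: badge, name, tagline repeating
rows = ["TOPSELLER!", "XDJ-1000 MK2", "The perfect player for digital DJs",
        "NEW!", "LITTLE MARCUS 500", "Marcus Miller Signature - 500W and two Equalizers"]

# Group every three consecutive entries into one product record
products = pd.DataFrame(
    [rows[i:i + 3] for i in range(0, len(rows), 3)],
    columns=["badge", "product", "description"])
print(products)
```

This is only a sketch; if the page ever renders a product without a badge, the three-at-a-time grouping would drift, so a safer scraper would walk each stoererwrapper element and read its child spans individually.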
Beautiful Soup
In order to scrape with this method we need the requests, bs4 (Beautiful Soup) and pandas libraries, plus the lxml parser:
pip install beautifulsoup4 requests lxml pandas
In this example we are going to scrape siteinspire.com (https://www.siteinspire.com/websites?categories=40).
Let's have fun
from bs4 import BeautifulSoup
import requests
import pandas as pd

record = pd.DataFrame([], columns=['title'])
resp = requests.get('https://www.siteinspire.com/websites?categories=40')
soup = BeautifulSoup(resp.text, 'lxml')
try:
    name = soup.find('div', {'class': 'title'}).contents[1].strip()
    records = pd.DataFrame([name], columns=['title'])
    record = pd.concat([record, records])
    record.to_csv('output.csv')
except Exception as e:
    print(e)
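Note that soup.find only returns the first matching element; use find_all to collect every title on the page. A minimal sketch using an inline HTML snippet (the div structure here is a made-up stand-in for the real page):

```python
from bs4 import BeautifulSoup
import pandas as pd

# Inline HTML standing in for the downloaded page
html = """
<div class="title">Site One</div>
<div class="title">Site Two</div>
"""

soup = BeautifulSoup(html, "html.parser")
# find_all returns every matching div, not just the first
titles = [div.get_text(strip=True) for div in soup.find_all("div", {"class": "title"})]
record = pd.DataFrame(titles, columns=["title"])
record.to_csv("output.csv", index=False)
print(titles)
```

The same loop works on resp.text once the real page has been fetched with requests.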
That's it