[Python] Naver News Crawling 1 (Selenium, BS4)

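1. Web crawling environment setup

Inspecting the target page's links (URLs) before writing any code makes parsing easier.
Enter the code block below to load the required modules (Selenium, BS4, time).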

# Part 1. Import modules
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service  # Chrome-specific Service class
from bs4 import BeautifulSoup
import time
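
As a rough illustration of the link analysis mentioned above, the sketch below builds a Naver news search URL and takes it apart again using only the standard library. The URL pattern (`search.naver.com/search.naver` with `where=news` and `query=` parameters) is an assumption based on how Naver search URLs commonly look, and `build_news_url` and `split_url` are hypothetical helpers that are not part of the original script.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed base URL of Naver search; verify it in your browser's address bar.
BASE_URL = "https://search.naver.com/search.naver"


def build_news_url(keyword: str) -> str:
    """Build a news-tab search URL for the given keyword (hypothetical helper)."""
    params = {"where": "news", "query": keyword}
    return f"{BASE_URL}?{urlencode(params)}"  # urlencode percent-encodes Korean text


def split_url(url: str) -> dict:
    """Return the query parameters of a URL as a plain dict."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


url = build_news_url("빅데이터")
print(url)             # the keyword appears percent-encoded
print(split_url(url))  # the decoded parameters
```

If the pattern holds, passing `build_news_url("빅데이터")` to `driver.get` may open the news results directly, without typing into the search box and clicking the tab.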

2. Chrome driver setup and web page search

Search for "빅데이터" (big data), then move to the News (뉴스) tab.

# Part 1. Import modules
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service  # Chrome-specific Service class
from bs4 import BeautifulSoup
import time


# Part 2. Control the Chrome window with Selenium
options = webdriver.ChromeOptions()
options.add_argument('start-maximized')  # maximize the Chrome window
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('path/to/chromedriver'), options=options)
driver.get("https://www.naver.com/")
driver.find_element(By.ID,'query').click()
driver.find_element(By.ID,'query').send_keys("빅데이터" + Keys.ENTER)  # search keyword: "big data"
driver.find_element(By.LINK_TEXT,'뉴스').click()  # '뉴스' is the link text of the News tab

3. Extracting news titles

# Part 1. Import modules
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service  # Chrome-specific Service class
from bs4 import BeautifulSoup
import time


# Part 2. Control the Chrome window with Selenium
options = webdriver.ChromeOptions()
options.add_argument('start-maximized')  # maximize the Chrome window
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('path/to/chromedriver'), options=options)
driver.get("https://www.naver.com/")
driver.find_element(By.ID,'query').click()
driver.find_element(By.ID,'query').send_keys("빅데이터" + Keys.ENTER)  # search keyword: "big data"
driver.find_element(By.LINK_TEXT,'뉴스').click()  # '뉴스' is the link text of the News tab


# Part 3. Get the page source and parse it with BeautifulSoup4
time.sleep(2)  # give the news results time to load (2 seconds is an arbitrary choice)
html = driver.page_source  # full source of the current page
soup = BeautifulSoup(html, 'html.parser')


# Part 4. Extract the news titles with a loop
contents = soup.select('a.news_tit')  # selector for news titles: 'a' tag with class 'news_tit'
for index, element in enumerate(contents, 1):  # enumerate numbers the titles starting from 1
    print('News title #{}: {}'.format(index, element.text))

Running the script prints the numbered news titles to the terminal.
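
To try the selector logic from Part 4 without opening a browser, the following sketch runs the same `select` and `enumerate` steps on a small hand-written HTML string. The markup in `SAMPLE_HTML` is invented for illustration and only mimics the `a.news_tit` pattern; the real Naver page is more complex and its class names may change. `extract_titles` is a hypothetical helper, not part of the original script.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (hypothetical); not copied from Naver.
SAMPLE_HTML = """
<ul>
  <li><a class="news_tit" href="https://example.com/1">First headline</a></li>
  <li><a class="news_tit" href="https://example.com/2">Second headline</a></li>
  <li><a class="other" href="https://example.com/3">Not a headline</a></li>
</ul>
"""


def extract_titles(html: str) -> list:
    """Return the text of every <a class="news_tit"> element in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.select("a.news_tit")]


titles = extract_titles(SAMPLE_HTML)
for index, title in enumerate(titles, 1):
    print("News title #{}: {}".format(index, title))
```

Only the two anchors with the `news_tit` class are matched; the third link is skipped because its class differs. If the live page returns an empty list, the selector is the first thing to check.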

📌 Next post: [Python] Naver News Crawling 2 (Selenium, BS4, openpyxl)

The next post shows how to save article titles and links to a text file and an Excel file.
