
How to Bypass Rate Limits for Web Scraping

1. Set User-Agent

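Using fake-useragent:

pip install fake-useragent requests

import requests
from fake_useragent import UserAgent

url = "https://example.com/"  # placeholder: the page you want to scrape
session = requests.Session()
headers = {
    # Random real-browser User-Agent instead of the default "python-requests/x.y.z"
    'User-Agent': UserAgent().random
}
response = session.get(url, headers=headers)
response.raise_for_status()  # raise an exception on 4xx/5xx responses (e.g. 403, 429)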
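`UserAgent().random` is evaluated once, when the `headers` dict is built, so every request that reuses that dict sends the same User-Agent. Below is a minimal sketch of picking a fresh User-Agent per request. It assumes a small hard-coded pool of browser strings in place of fake-useragent. The pool and the `build_request` helper are hypothetical, and the sketch uses the standard library's `urllib.request` so it runs without extra packages:

```python
import random
import urllib.request

# Hypothetical pool of desktop browser User-Agent strings (examples only)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def build_request(url, rng=random):
    """Build (but do not send) a GET request with a randomly chosen User-Agent."""
    ua = rng.choice(USER_AGENTS)
    return urllib.request.Request(url, headers={"User-Agent": ua})


# Each call may carry a different User-Agent
req = build_request("https://example.com/")
print(req.get_header("User-agent"))  # urllib stores header names as "User-agent"
```

Pass the returned `Request` to `urllib.request.urlopen()` to actually send it. With requests, the equivalent is building a new `headers` dict inside the request loop.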


2. Use the Tor network as a proxy

Using Tor:

Prerequisite: install Tor Browser.

pip install pysocks
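The Tor setup only works while Tor Browser is running, because the browser provides the local SOCKS proxy. It listens on port 9150 by default, while a standalone `tor` daemon usually uses 9050. Below is a minimal pre-flight check that fails fast when nothing is listening on that port. `tor_proxies` and `socks_port_open` are hypothetical helpers. The sketch uses the `socks5h://` scheme, which requests (with PySocks) treats as "resolve hostnames through the proxy":

```python
import socket


def tor_proxies(port=9150):
    """Return a requests-style proxies dict pointing at a local Tor SOCKS port.

    9150 is Tor Browser's default SOCKS port; a standalone tor daemon
    usually listens on 9050 instead.
    """
    proxy = f"socks5h://127.0.0.1:{port}"  # socks5h: hostnames resolved via Tor
    return {"http": proxy, "https": proxy}


def socks_port_open(host="127.0.0.1", port=9150, timeout=2.0):
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if socks_port_open():
        print("Tor SOCKS port is open:", tor_proxies())
    else:
        print("Nothing listening on 127.0.0.1:9150; is Tor Browser running?")
```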


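# Start Tor Browser first (it exposes a SOCKS proxy on localhost:9150), then run this script
import requests
from fake_useragent import UserAgent

url = "https://www.hermes.com/kr/ko/category/women/bags-and-small-leather-goods/bags-and-clutches/"
session = requests.Session()
# Send both HTTP and HTTPS traffic through Tor's SOCKS proxy (needs PySocks);
# socks5h also resolves hostnames through Tor instead of locally
session.proxies = {
    'http': 'socks5h://localhost:9150',
    'https': 'socks5h://localhost:9150',
}
headers = {
    'User-Agent': UserAgent().random
}
response = session.get(url, headers=headers)

Method 2 failed.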

3. Use a cloud PC

Using goormIDE:

Result: 403 (Forbidden) error.
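A 403 is a refusal rather than a rate-limit signal. Servers normally report rate limiting with 429 Too Many Requests (RFC 6585), sometimes with a `Retry-After` header. Below is a minimal sketch of deciding whether, and how long, to wait before retrying. It assumes exponential backoff for 429 and 5xx responses and no retry for 403. `retry_delay` and its defaults are hypothetical:

```python
def retry_delay(status, attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retrying, or None if retrying is pointless.

    status      -- HTTP status code of the failed response
    attempt     -- 0 for the first retry, 1 for the second, ...
    retry_after -- value of the Retry-After header, if any
    """
    if status == 429:
        # Honour Retry-After when it is given in seconds (it may also be an HTTP date)
        if retry_after is not None and retry_after.strip().isdigit():
            return min(float(retry_after), cap)
        return min(base * 2 ** attempt, cap)  # exponential backoff
    if status in (500, 502, 503, 504):
        return min(base * 2 ** attempt, cap)
    # 403 and other client errors: waiting will not help; change approach instead
    return None


print(retry_delay(429, 0, "30"))  # 30.0
print(retry_delay(429, 3))        # 8.0
print(retry_delay(403, 0))        # None
```

In a scraping loop you would `time.sleep()` for the returned delay. When it returns `None`, stop or switch IP/User-Agent.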

4. Use a VPN

Using ZoogVPN:

5. Use a web scraping API service

Using ZenRows:
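Scraping API services generally take the target URL plus an API key and fetch the page on your behalf. Below is a minimal sketch that only builds the request URL. It assumes ZenRows' single endpoint `https://api.zenrows.com/v1/` with `apikey` and `url` query parameters; verify the endpoint and parameter names against the current ZenRows documentation. `zenrows_url` is a hypothetical helper and `YOUR_API_KEY` is a placeholder:

```python
from urllib.parse import urlencode

# Assumed endpoint; check the ZenRows documentation before relying on it
ZENROWS_ENDPOINT = "https://api.zenrows.com/v1/"


def zenrows_url(target_url, api_key, **options):
    """Build a GET URL that asks the scraping API to fetch target_url for us.

    Extra keyword arguments are passed through as query parameters;
    their names must match options the service actually supports.
    """
    params = {"apikey": api_key, "url": target_url, **options}
    return ZENROWS_ENDPOINT + "?" + urlencode(params)  # target URL gets percent-encoded


print(zenrows_url("https://example.com/", "YOUR_API_KEY"))
```

The resulting URL can be fetched with `requests.get()` or `urllib.request.urlopen()`. The service, not your machine, then makes the request to the target site.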