Grabbing URLs with Python is easy. I had an itch to code, so I threw one together; why not install Python 2.7 and give it a try?
# coding=utf-8
import urllib2
import re
import time

# Part of the download host was masked as "**" here, so ".*?" stands in for it.
download_pattern = re.compile(r'http://dzs\..*?\.com.*?\.txt')

number = 33880
print number
while number > 32001:
    url = "http://www.xuanshu.com/" + str(number) + ".html"
    number -= 1  # step down first so a failed page cannot stall the loop
    try:
        page = urllib2.urlopen(url)
    except urllib2.URLError:  # also catches HTTPError, which is a subclass
        print url + " failed"
        continue  # nothing to parse for this page, move on
    html = unicode(page.read(), "utf-8")
    for link in download_pattern.findall(html):
        print link
    # time.sleep(1)  # uncomment this line if requests go out too fast and start failing
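If you only have Python 3 around, urllib2 is gone (it was split into urllib.request and urllib.error), so here is a rough sketch of the same idea. The function names `extract_links`, `fetch` and `scan`, the 10-second timeout and the `delay` parameter are my own picks for illustration, not anything from the site. Since part of the download host is masked as "**" above, the pattern just accepts any text in that spot, which is an assumption on my part.

```python
import re
import time
import urllib.request

# Assumption: the masked part of the host ("**") can be any text,
# so ".*?" is used there instead of a real name.
DOWNLOAD_PATTERN = re.compile(r'http://dzs\..*?\.com.*?\.txt')


def extract_links(html):
    """Return every .txt download link found in one page of HTML."""
    return DOWNLOAD_PATTERN.findall(html)


def fetch(url, timeout=10):
    """Return the decoded page text, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as page:
            return page.read().decode("utf-8", errors="replace")
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        print(url + " failed")
        return None


def scan(start, stop, delay=0.0, fetch_page=fetch):
    """Walk page numbers from start down to stop + 1 and collect the links."""
    links = []
    for number in range(start, stop, -1):
        url = "http://www.xuanshu.com/" + str(number) + ".html"
        html = fetch_page(url)
        if html is None:
            continue  # request failed, skip this page
        for link in extract_links(html):
            print(link)
            links.append(link)
        if delay:
            time.sleep(delay)  # be gentle if the server starts refusing requests
    return links
```

Calling `scan(33880, 32001)` would cover the same range as the Python 2.7 version, and `scan(33880, 32001, delay=1)` does the same thing as uncommenting the sleep. Passing the fetcher in as `fetch_page` is only there so the parsing can be tried out on canned HTML without hitting the network.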