Python urllib2 weird error?
Friends,
I'm trying to rewrite one of my little tools. Basically, it takes an input from the user, and if that input doesn't contain the base URL, a function constructs a valid URL from it for the other parts of the program to work on.
If I write it so the program only accepts a valid URL as input, it works; however, if I pass a bare string and construct the URL from it, urllib2.urlopen() fails, and I have no idea why, since the returned value looks like exactly the same str value...
import urllib2
import re

class XunLeiKuaiChuan:
    kuaichuanBaseAddress = 'http://kuaichuan.xunlei.com/d/'
    regexQuery = 'file_name=\"(.*?)\"\sfile_url=\"(.*?)\sfile_size=\"(.*?)\"'
    agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2)'

    def buildLink(self, aLink):
        if aLink == '':
            return
        if 'xunlei.com' not in aLink:
            aLink = self.kuaichuanBaseAddress + aLink
        return aLink

    def decodeLink(self, url):
        url = self.buildLink(url)  # it will return the correct url with the value provided
        print 'in decodeLink ' + url
        urlReq = urllib2.Request(url)
        urlReq.add_header('User-agent', self.agent)
        pageContent = urllib2.urlopen(urlReq).read()
        realLinks = re.findall(self.regexQuery, pageContent)
        return realLinks

test = XunLeiKuaiChuan()
link = 'y7L1AwKuOwDeCClS528'
link2 = 'http://kuai.xunlei.com/d/y7L1AwKuOwDeCClS528'
s = test.decodeLink(link2)
print s
When I call it with link2 it works as expected, and it fails when I use link. Can someone tell me what I'm missing here? My old version worked by accepting only the full URL, but this unknown behavior is killing me here... Thank you.
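One way to sanity-check the "exactly the same str value" assumption is to rebuild the URL with the same logic and compare it character-for-character with the URL that is known to work. A minimal sketch, reusing the buildLink logic from the class above:

```python
kuaichuanBaseAddress = 'http://kuaichuan.xunlei.com/d/'

def buildLink(aLink):
    # same construction logic as XunLeiKuaiChuan.buildLink above
    if aLink == '':
        return
    if 'xunlei.com' not in aLink:
        aLink = kuaichuanBaseAddress + aLink
    return aLink

built = buildLink('y7L1AwKuOwDeCClS528')
working = 'http://kuai.xunlei.com/d/y7L1AwKuOwDeCClS528'
print(built)
print(built == working)  # False: 'kuaichuan.xunlei.com' vs 'kuai.xunlei.com'
```

Printing both strings side by side makes any difference between the constructed and the hand-typed URL immediately visible.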
By the way, if it returns an empty list even with the full URL, just open the URL in a browser and enter the captcha on the page. They do that to prevent some kind of 'attacks'...
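It may also help to see what urlopen is actually raising instead of letting it blow up with a bare traceback. A minimal sketch, assuming Python 2's urllib2 (the try/except import keeps it runnable on Python 3, where the same classes live in urllib.request and urllib.error):

```python
try:  # Python 2
    from urllib2 import Request, urlopen, HTTPError, URLError
except ImportError:  # Python 3
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError, URLError

def fetch(url, agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2)'):
    # Wrap urlopen so the real failure (unknown host, 404, ...) is
    # printed instead of an unexplained traceback; returns None on error.
    req = Request(url, headers={'User-agent': agent})
    try:
        return urlopen(req).read()
    except HTTPError as e:
        print('HTTP error %d for %s' % (e.code, url))
    except URLError as e:
        print('URL error for %s: %s' % (url, e.reason))

fetch('http://no-such-host.invalid/')  # prints a URL error (DNS failure)
```

Running the failing input through a wrapper like this shows whether the server is rejecting the request (an HTTPError with a status code) or the URL itself is bad (a URLError, e.g. the host does not resolve).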