The Requests module

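Requests is the only non-GMO HTTP library for Python, safe for human consumption.

Warning: recreational use of other HTTP libraries may result in dangerous side effects, including security vulnerabilities, verbose code, reinventing the wheel, constantly reading documentation, depression, headaches, or even death.

Requests lets you send organic, grass-fed HTTP/1.1 requests without manual labor. You do not need to add query strings to URLs by hand or form-encode POST data yourself. Keep-alive and HTTP connection pooling are 100% automatic, powered by urllib3, which Requests is built on.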

Official site: https://2.python-requests.org//zh_CN/latest/index.html

Installation

pip install requests

GET requests

  • Without parameters
import requests
response = requests.get("http://www.baidu.com/")
  • With parameters
kw = {'wd':'长城'}

# params accepts a dict or a string of query parameters; a dict is URL-encoded automatically, so no urlencode() call is needed
response = requests.get("http://www.baidu.com/s?", params = kw)
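
To see what the params argument does without sending anything over the network, the request can be prepared offline and its final URL inspected. This is only a sketch: the variable names are illustrative, and requests.Request(...).prepare() is used here purely as an inspection tool, not as the usual way to issue a GET.

```python
from urllib.parse import urlencode

import requests

kw = {"wd": "长城"}

# Build the request without sending it, then look at the URL Requests would use.
prepared = requests.Request("GET", "http://www.baidu.com/s", params=kw).prepare()
print(prepared.url)

# The query string matches what urlencode() would produce by hand.
manual_url = "http://www.baidu.com/s?" + urlencode(kw)
print(prepared.url == manual_url)
```

The non-ASCII keyword is percent-encoded as UTF-8, which is the step urlencode() would otherwise have to perform.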

Sending request headers

kw = {'wd':'长城'}

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}

# params accepts a dict or a string of query parameters; a dict is URL-encoded automatically, so no urlencode() call is needed
response = requests.get("http://www.baidu.com/s?", params = kw, headers = headers)
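
A custom User-Agent matters because Requests otherwise identifies itself as python-requests, which some sites treat differently from a browser. The sketch below, which assumes nothing beyond the public Requests API and sends no traffic, prepares a request through a Session so that the default headers are merged in the same way requests.get() would merge them.

```python
import requests

headers = {"User-Agent": "a niubility navigator"}

# What Requests would send if no User-Agent were supplied.
print(requests.utils.default_user_agent())

# Prepare through a Session so the default headers are merged in.
with requests.Session() as session:
    request = requests.Request("GET", "http://httpbin.org/get", headers=headers)
    prepared = session.prepare_request(request)

# Header names are case-insensitive; the custom value replaces the default one.
print(prepared.headers["user-agent"])
# Defaults that were not overridden are still present.
print(prepared.headers["Accept-Encoding"])
```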

The Response object

kw = {"name":"zhangsan"}

headers={"User-Agent":"a niubility navigator"}

url = "http://httpbin.org/get"

response = requests.get(url, params=kw, headers=headers)

print("Status code: {}".format(response.status_code))
print("Response headers: {}".format(response.headers))
print("Encoding: {}".format(response.encoding))
# .text decodes the body; when the server declares no charset, Requests guesses the encoding
print("Decoded text: {}".format(response.text))
print("Raw bytes: {}".format(response.content))

print("Final URL: {}".format(response.url))

Output:

Status code: 200
Response headers: {'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Origin': '*', 'Content-Encoding': 'gzip', 'Content-Type': 'application/json', 'Date': 'Thu, 25 Apr 2019 14:38:53 GMT', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Server': 'nginx', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'DENY', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '197', 'Connection': 'keep-alive'}
Encoding: None
Decoded text: {
  "args": {
    "name": "zhangsan"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "a niubility navigator"
  }, 
  "origin": "111.47.249.9, 111.47.249.9", 
  "url": "https://httpbin.org/get?name=zhangsan"
}

Raw bytes: b'{\n  "args": {\n    "name": "zhangsan"\n  }, \n  "headers": {\n    "Accept": "*/*", \n    "Accept-Encoding": "gzip, deflate", \n    "Host": "httpbin.org", \n    "User-Agent": "a niubility navigator"\n  }, \n  "origin": "111.47.249.9, 111.47.249.9", \n  "url": "https://httpbin.org/get?name=zhangsan"\n}\n'
Final URL: http://httpbin.org/get?name=zhangsan
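
Beyond printing these attributes, three things tend to come up with a Response: parsing JSON, correcting a wrong encoding guess, and failing on error status codes. The sketch below runs offline; fake_response is a hypothetical helper that builds a Response by hand, which real code would never need because requests.get() returns one.

```python
import io

import requests


def fake_response(status_code, body, encoding=None):
    """Build a Response by hand (hypothetical helper, for offline demonstration only)."""
    response = requests.Response()
    response.status_code = status_code
    response.raw = io.BytesIO(body)  # stands in for the network stream
    response.encoding = encoding
    return response


# .json() parses a JSON body directly, so json.loads(response.text) is not needed.
ok = fake_response(200, b'{"args": {"name": "zhangsan"}}')
print(ok.json()["args"]["name"])

# If the guessed encoding is wrong, set .encoding before reading .text.
gbk = fake_response(200, "长城".encode("gbk"), encoding="gbk")
print(gbk.text)

# raise_for_status() turns 4xx/5xx responses into exceptions.
missing = fake_response(404, b"")
try:
    missing.raise_for_status()
except requests.HTTPError as error:
    print("request failed:", error)
```

With a real response the same calls apply unchanged: check raise_for_status() first, then read .json() or .text.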

POST requests

  • Without parameters
response = requests.post("http://www.baidu.com/")
  • With parameters
import time

import requests

data = "你好Python"  # the text to translate

headers = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Origin": "https://fanyi.qq.com",
    "Referer": "https://fanyi.qq.com/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
    # Content-Length is left out on purpose: Requests computes it from the encoded body
}

keyword = {
    "source": "auto",
    "target": "auto",
    "sourceText": data,
    # A millisecond timestamp makes the session id unique
    "sessionUuid": "translate_uuid" + str(int(time.time() * 1000)),
}

url = "https://fanyi.qq.com/api/translate"
# data= form-encodes the dict into the request body
result = requests.post(url, headers=headers, data=keyword).json()

# Assumes the response JSON still has this shape
print(result['translate']['records'][0]['targetText'])
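
What the data argument puts on the wire can be checked offline by preparing the request instead of sending it. In this sketch the httpbin.org/post URL is only a placeholder (nothing is sent), and the json= variant is shown for contrast as a general Requests feature, not as something the translation endpoint above is known to accept.

```python
from urllib.parse import urlencode

import requests

form = {"source": "auto", "target": "auto", "sourceText": "你好Python"}

# data= form-encodes the dict; prepare() lets us inspect the result offline.
prepared = requests.Request("POST", "http://httpbin.org/post", data=form).prepare()

print(prepared.body)
print(prepared.headers["Content-Type"])
# Requests fills in Content-Length from the encoded body.
print(prepared.headers["Content-Length"])

# json= serializes the same dict as a JSON document instead.
as_json = requests.Request("POST", "http://httpbin.org/post", json=form).prepare()
print(as_json.headers["Content-Type"])
```

Because the length is derived from the encoded body, a hand-written Content-Length header based on the raw text would not match what is actually sent.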

Cookies and Sessions

Cookies

response = requests.get("http://www.baidu.com/")
# Read the cookies the server set on the response
cookiejar = response.cookies
# Convert the CookieJar to a dict:
cookiedict = requests.utils.dict_from_cookiejar(cookiejar)
print(cookiedict)
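
Cookies can also travel the other way, from your code to the server. The following offline sketch uses made-up cookie names and values (token, theme) and the httpbin.org/cookies URL only as a placeholder; it shows the cookies argument and the helper that converts a dict back into a CookieJar.

```python
import requests

# cookies= accepts a plain dict; Requests turns it into a Cookie header.
prepared = requests.Request(
    "GET", "http://httpbin.org/cookies", cookies={"token": "abc123"}
).prepare()
print(prepared.headers["Cookie"])

# cookiejar_from_dict and dict_from_cookiejar convert in both directions.
jar = requests.utils.cookiejar_from_dict({"token": "abc123", "theme": "dark"})
print(requests.utils.dict_from_cookiejar(jar))
```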

Session

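In Requests, the Session object is used very often. It represents one user session: from the moment the client connects to the server until it disconnects.

A session lets you persist certain parameters across requests. For example, cookies are kept across all requests made from the same Session instance.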

session = requests.Session()

url = "http://httpbin.org/get"
session.get(url)
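
The persistence described above can be observed without a server by preparing two requests through the same Session. This is a sketch under stated assumptions: the sessionid cookie is set by hand to stand in for one a server would normally set, and the two httpbin.org URLs are placeholders that are never contacted.

```python
import requests

with requests.Session() as session:
    # Defaults set on the session apply to every request it sends.
    session.headers.update({"User-Agent": "a niubility navigator"})
    # Normally a server response sets this cookie; here it is set by hand.
    session.cookies.set("sessionid", "abc123", domain="httpbin.org")

    first = session.prepare_request(
        requests.Request("GET", "http://httpbin.org/get")
    )
    second = session.prepare_request(
        requests.Request("GET", "http://httpbin.org/headers")
    )

# Both requests carry the same header and cookie.
for prepared in (first, second):
    print(prepared.url, prepared.headers["User-Agent"], prepared.headers["Cookie"])
```

Using the session as a context manager also closes its pooled connections when the block ends.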