这篇文章上次修改于 803 天前,可能其部分内容已经发生变化,如有疑问可询问作者。

requests源码阅读笔记

requests的使用方面

1

可将Response实例直接置于条件判断中,转为bool为是否为200ok

源码:

def __bool__(self):
    """Returns True if :attr:`status_code` is less than 400.

    This attribute checks if the status code of the response is between
    400 and 600 to see if there was a client error or a server error. If
    the status code, is between 200 and 400, this will return True. This
    is **not** a check to see if the response code is ``200 OK``.
    """
    return self.ok

但我还是偏向于raise_for_status()方法

2

关于hook触发时机在

  def send(self, request, **kwargs):
      """Send a given PreparedRequest.

      :rtype: requests.Response
      """
...
      hooks = request.hooks

      # Get the appropriate adapter to use
      adapter = self.get_adapter(url=request.url)

      # Start time (approximately) of the request
      start = preferred_clock()

      # Send the request
      r = adapter.send(request, **kwargs)

      # Total elapsed time of the request (approximately)
      elapsed = preferred_clock() - start
      r.elapsed = timedelta(seconds=elapsed)

      # Response manipulation hooks
      r = dispatch_hook("response", hooks, r, **kwargs)

dispatch_hook函数内

def dispatch_hook(key, hooks, hook_data, **kwargs):
    """Dispatches a hook dictionary on a given piece of data."""
    hooks = hooks or {}
    hooks = hooks.get(key)
    if hooks:
        if hasattr(hooks, "__call__"):
            hooks = [hooks]
        for hook in hooks:
            _hook_data = hook(hook_data, **kwargs)
            if _hook_data is not None:
                hook_data = _hook_data
    return hook_data

并可以给hook附带参数

hook的使用参见资料

3

添加重试次数的方法

:param max_retries: The maximum number of retries each connection
    should attempt. Note, this applies only to failed DNS lookups, socket
    connections and connection timeouts, never to requests where data has
    made it to the server. By default, Requests does not retry failed
    connections. If you need granular control over the conditions under
    which we retry a request, import urllib3's ``Retry`` class and pass
    that instead.

Usage::

  >>> import requests
  >>> s = requests.Session()
  >>> a = requests.adapters.HTTPAdapter(max_retries=3)
  >>> s.mount('http://', a)

https请求请使用s.mount('https://', a)

关于重试:每个连接应该尝试的最大重试次数。注意,这只适用于DNS查找失败、套接字连接和连接超时,永远不会适用于数据已经到达服务器的请求。默认情况下,请求不会重试失败的连接。如果你需要精确控制重试请求的条件,导入urllib3的Retry类并传递它。

还有其他参见结构篇中写到。

4

在需要的数据远小于收到数据大小的时候可这样使用:

with requests.get("https://www.baidu.com") as response:
	data = response.status_code
    ...  # get the data you want
print(data)
...  # use the data

可以确保大的数据被释放,防止后面用不到反而占用空间。

其他方面

1

from urllib3.util import parse_url
url = "http://user:pass@www.baidu.com:77/path1?arg1=1&arg2=2#frag"
scheme, auth, host, port, path, query, fragment = parse_url(url)
print(scheme, auth, host, port, path, query, fragment)
>>> http user:pass www.baidu.com 77 /path1 arg1=1&arg2=2 frag

url分为

scheme://[<user>:<password>@]host[:port] / path /query#fragment

方括号代表可选

scheme:通信协议

fragment:锚点

2

def isinstance(__obj: object,
__class_or_tuple: type | Tuple[type | Tuple[Any, …], …])
-> bool

第二个参数可以使用元组包含类型

3

一对多的字典添加元素用register这个词 删除就用deregister

def register_hook(self, event, hook):
    """Properly register a hook."""

    if event not in self.hooks:
        raise ValueError(f'Unsupported event specified, with event name "{event}"')

    if isinstance(hook, Callable):
        self.hooks[event].append(hook)
    elif hasattr(hook, "__iter__"):
        self.hooks[event].extend(h for h in hook if isinstance(h, Callable))

4

IDNA规范(Internationalized Domain Names in Applications)

用于支持其他语言例如中文,日文域名的实现提出的规范

pypi

Basic functions are simply executed:

>>> import idna
>>> idna.encode('ドメイン.テスト')
b'xn--eckwd4c7c.xn--zckzah'
>>> print(idna.decode('xn--eckwd4c7c.xn--zckzah'))
ドメイン.テスト

5

上下文管理器在requests里面的使用

def __enter__(self):
    return self

def __exit__(self, *args):
    self.close()

使用例:

with Session() as session:
    rep = session.get(url)

6

__getstate____setstate__魔法方法

用于序列化

pickle.dumps(obj)序列化对象obj的属性,若该对象的类没有__getstate__ 方法, 则属性内容由__dict__ 提供,否则由__getstate__ 提供,使用__getstate__ 可自定义序列化对象的属性或其他任意内容;

注意:__getstate__ 必须返回一个字典,

pickle.loads(bytes)反序列化时,会调用__setstate__方法,该方法所需参数为序列化时__dict____setstate__提供的,前者提供为字典格式,后者提供格式具体看其返回值。

使用例:

   def __init__(self):
       self._content_consumed = False
       
def __getstate__(self):
       # Consume everything; accessing the content attribute makes
       # sure the content has been fully read.
       if not self._content_consumed:
           self.content

       return {attr: getattr(self, attr, None) for attr in self.__attrs__}

   def __setstate__(self, state):
       for name, value in state.items():
           setattr(self, name, value)

       # pickled objects do not have .raw
       setattr(self, "_content_consumed", True)
       setattr(self, "raw", None)

其中的__attrs__并不是魔法变量,作者定义的类静态变量

7

__nonzero__是在python2中判断语句中触发的,python3中被__bool__取代

python2相关文档

python3相关文档

class A:

    def __nonzero__(self):
        print("__nonzero__")
        return True

    def __bool__(self):
        print("__bool__")
        return True


if __name__ == '__main__':
    if A():
        ...

8

类内的is_xxx的表示状态的变量可以改为property属性值

可以减少一个成员变量,以防__init__函数内定义过乱

源码示例:

@property
def is_redirect(self):
    """True if this Response is a well-formed HTTP redirect that could have
    been processed automatically (by :meth:`Session.resolve_redirects`).
    """
    return "location" in self.headers and self.status_code in REDIRECT_STATI

@property
def is_permanent_redirect(self):
    """True if this Response one of the permanent versions of redirect."""
    return "location" in self.headers and self.status_code in (
        codes.moved_permanently,
        codes.permanent_redirect,
    )

9

charset_normalizerchardet

用于判断bytes对应的编码

try:
    import chardet
except ImportError:
    import charset_normalizer as chardet

text = b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(chardet.detect(text)["encoding"])

>>> utf-8

10

尽量使用 var is None或者var is not None意思明确

11

setdefault配合省略关键词参数**kwargs很方便

示例:

  def send(self, request, **kwargs):
      """Send a given PreparedRequest.

      :rtype: requests.Response
      """
      # Set defaults that the hooks can utilize to ensure they always have
      # the correct parameters to reproduce the previous request.
      kwargs.setdefault("stream", self.stream)
      kwargs.setdefault("verify", self.verify)
      kwargs.setdefault("cert", self.cert)
      if "proxies" not in kwargs:
          kwargs["proxies"] = resolve_proxies(request, self.proxies, self.trust_env)
...

12

连接池概念

13

CA以及SSL证书概念

验证证书流程

证书概念:证书用于验证网站合法性,和纸质证书概念相同,电子证书也用于查看网站是否安全。

如何验证证书:

参考资料1 参考资料2

14

threading.local()

获取ThreadLocal对象

不同的线程访问这个对象使用的空间都是线程自己的空间。

15

如果一个变量为空就设置为某默认值

有以下写法:

hooks = hooks or {}

16

判断变量是否为函数

if hasattr(hooks, "__call__"):
    ...

17

从kwargs中取值:

def custom_function(**kwargs):
    foo = kwargs.pop('foo', None)
    bar = kwargs.pop('bar', None)
    ...

def custom_function2(**kwargs):
    foo = kwargs.get('foo')
    bar = kwargs.get('bar')
    ...

使用哪个取决于代码需求,没有最佳实践

18

@contextlib.contextmanager上下文修饰器

类似于类的上下文修饰器,只不过此时上下文处于一个函数中

使用例如下

import contextlib
 
@contextlib.contextmanager
def test1():
    print("前面的部分")
    yield
    print("后面的部分")
 
if __name__ == '__main__':
    with test1():
        print("中间执行的代码块")

19

frozenset()内置函数

返回一个冻结的集合,冻结后集合不能再添加或删除任何元素。

语法:

class frozenset([iterable])

为什么需要冻结的集合(即不可变的集合)呢?因为在集合的关系中,有集合的中的元素是另一个集合的情况,但是普通集合(set)本身是可变的,那么它的实例就不能放在另一个集合中(set中的元素必须是不可变类型)。

所以,frozenset提供了不可变的集合的功能,当集合不可变时,它就满足了作为集合中的元素的要求,就可以放在另一个集合中了。

20

一个简单的猜测编码

# Null bytes; no need to recreate these on each call to guess_json_utf
_null = "\x00".encode("ascii")  # encoding to ASCII for Python 3
_null2 = _null * 2
_null3 = _null * 3


def guess_json_utf(data):
    """
    :rtype: str
    """
    # JSON always starts with two ASCII characters, so detection is as
    # easy as counting the nulls and from their location and count
    # determine the encoding. Also detect a BOM, if present.
    sample = data[:4]
    if sample in (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE):
        return "utf-32"  # BOM included
    if sample[:3] == codecs.BOM_UTF8:
        return "utf-8-sig"  # BOM included, MS style (discouraged)
    if sample[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        return "utf-16"  # BOM included
    nullcount = sample.count(_null)
    if nullcount == 0:
        return "utf-8"
    if nullcount == 2:
        if sample[::2] == _null2:  # 1st and 3rd are null
            return "utf-16-be"
        if sample[1::2] == _null2:  # 2nd and 4th are null
            return "utf-16-le"
        # Did not detect 2 valid UTF-16 ascii-range characters
    if nullcount == 3:
        if sample[:3] == _null3:
            return "utf-32-be"
        if sample[1:] == _null3:
            return "utf-32-le"
        # Did not detect a valid UTF-32 ascii-range character
    return None