Python Study Notes

2021-03-26

tec

python

More - /tags/python/

Installation

Linux

Install pip
- curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
- python3 get-pip.py

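Basics

Variables

namespace

Definition: the place where the variables of a scope are stored; equivalently, a dictionary that maps the names in a scope to their values

Purpose: look up the variable names and values that exist in a given scope

How: locals() for the current scope (globals() for the module scope)

Instance namespace: instance.__dict__; class namespace: ClassName.__dict__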
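
A minimal sketch of the three lookups above. The names demo, Config, mode and path are made up for illustration and do not come from the original notes.

```python
def demo():
    x = 1
    y = "two"
    # locals() returns a dict of the names defined in the current function scope
    return dict(locals())


class Config:
    mode = "dev"            # class attribute: stored in Config.__dict__

    def __init__(self):
        self.path = "/tmp"  # instance attribute: stored in the instance's __dict__


scope = demo()
cfg = Config()
print(scope)                      # {'x': 1, 'y': 'two'}
print(cfg.__dict__)               # {'path': '/tmp'}
print("mode" in Config.__dict__)  # True
print("mode" in cfg.__dict__)     # False
```

Class attributes and instance attributes live in separate dictionaries, which is why mode does not appear in cfg.__dict__ even though cfg.mode works.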

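Private variables

https://zhuanlan.zhihu.com/p/79280319

Definition: a name with at least two leading underscores and at most one trailing underscore. Inside a class body Python rewrites such a name to _ClassName__name (name mangling).

class A:
    def __init__(self):
        self.__b = 'c'

    # public method that reads the private variable
    def pm(self):
        return self.__b

A().__b  # AttributeError: 'A' object has no attribute '__b'
A()._A__b  # 'c'
A().pm()  # 'c'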

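Protected variables

Definition: a name with a single leading underscore. By convention it is meant to be used only inside the class and its subclasses; Python does not enforce this, so the attribute is still reachable from outside.

Built-in functions

https://docs.python.org/zh-cn/3/library/functions.html

repr(obj) - returns a string representation of the object aimed at the interpreter and at developers. str is for general display; repr carries the extra detail.
all(iterable) - accepts any iterable (a tuple, list, generator, and so on). Returns True if every element is truthy (not 0, "", False, None, ...) or if the iterable is empty; otherwise returns False.
any(iterable) - returns True if at least one element is truthy; returns False if none is, or if the iterable is empty.
isinstance(obj, class_or_tuple) - returns True if obj is an instance of the given class (or of a subclass of it), or of any class in the tuple; otherwise returns False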
```
isinstance('str', str) # True
isinstance(111, (str, int, float)) # True
```
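
The truthiness rules for all() and any() described above can be checked with a short sketch. The sample values are arbitrary.

```python
print(all([1, "a", True]))   # True: every element is truthy
print(all([1, 0, 2]))        # False: 0 is falsy
print(all([]))               # True: vacuously true for an empty iterable
print(any([0, "", None]))    # False: no truthy element
print(any([0, "x"]))         # True
print(any([]))               # False

# Generator expressions work too, so a condition can be checked lazily
nums = [2, 4, 6]
all_even = all(n % 2 == 0 for n in nums)
has_big = any(n > 5 for n in nums)
print(all_even, has_big)     # True True
```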
sorted(iterable, key=None, reverse=False) - returns a new sorted list and leaves the input unchanged

list.sort(key=None, reverse=False) - sorts the list in place and returns None

# reverse
lst = [2,4,2,7,5]
lst.sort(reverse=True)
print(lst) # [7, 5, 4, 2, 2]
# key
dct = {'apple':9,'banana':2,'orange':5}
lst1 = list(dct.keys()) # ['apple', 'banana', 'orange']
lst2 = list(dct.values()) # [9, 2, 5]
lst3 = list(dct.items()) # [('apple', 9), ('banana', 2), ('orange', 5)]

lst3.sort(key=lambda i:i[0]) # sort by key
lst3.sort(key=lambda i:i[1]) # sort by value

print(lst3) # [('banana', 2), ('orange', 5), ('apple', 9)]

str.join(iterable) -> str
str.split(sep) -> list
str.strip() -> str
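dir(obj) - returns a list of the names of all attributes and methods of obj
map()
ord()

ord() is the inverse of chr(): it takes a string containing a single Unicode character and returns that character's Unicode code point as an integer (for ASCII characters this equals the ASCII value).
```
ord('a')  # 97
ord('b')  # 98
```
exec - executes Python statements held in a string or a code object. Unlike eval, which only evaluates a single expression, exec can run arbitrary statements.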
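
A small sketch of the difference between eval and exec. The source strings and the namespace dict are invented for the example; neither function should be run on untrusted input.

```python
# eval evaluates a single expression and returns its value
result = eval("1 + 2 * 3")

# exec runs statements (assignments, loops, defs) and always returns None;
# results are read back from the namespace dict that is passed in
namespace = {}
source = """
total = 0
for i in range(5):
    total += i
"""
ret = exec(source, namespace)

print(result)              # 7
print(ret)                 # None
print(namespace["total"])  # 10
```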

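Special methods of classes

__init__
1. Initializes the attributes of a newly created instance, so it does not need to return a value
2. Called automatically when an instance of the class is created
3. If a class does not define it, the one inherited from the parent (ultimately object) is used. The same holds for subclasses: the subclass's own __init__ is called if it exists, otherwise the parent's
__new__
1. The first parameter of __init__ is self, the instance to be initialized. The interpreter passes it in automatically, and that instance is the one returned by __new__
2. __init__ then performs any further initialization on the instance that __new__ created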
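
The order described above can be observed with a sketch like the following. Point and the calls list are illustrative names only.

```python
calls = []


class Point:
    def __new__(cls, x, y):
        calls.append("__new__")
        # Create the bare instance; object.__new__ takes only the class
        return super().__new__(cls)

    def __init__(self, x, y):
        # self is the instance that __new__ just returned
        calls.append("__init__")
        self.x = x
        self.y = y


p = Point(1, 2)
print(calls)     # ['__new__', '__init__']
print(p.x, p.y)  # 1 2
```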

__class__

__class__ does the same job as type(): both give the class an object belongs to.
__class__ can be chained to reach class attributes:

class CONFIG:
  path = 'users/billie/desktop/Maersk'
  def __init__(self):
    self.path = self.__class__.path

__next__ & __iter__ - iterators
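
A minimal iterator sketch, assuming a made-up Countdown class: __iter__ returns the iterator object and __next__ produces the next value or raises StopIteration.

```python
class Countdown:
    """Iterate from `start` down to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # tells for-loops and list() to stop
        value = self.current
        self.current -= 1
        return value


print(list(Countdown(3)))  # [3, 2, 1]
```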

__doc__

class A:
    '''I am the docstring text'''

print(A.__doc__) # I am the docstring text

__dict__ - the namespace of a class or an instance

class GOT:
    ch = "权力的游戏"

    def __init__(self):
        self.en = "Game of Thrones"
        print(self.__dict__)  # {'en': 'Game of Thrones'}
        print(self.__class__.__dict__)  # {'__module__': '__main__', 'ch': '权力的游戏', '__init__': <function GOT.__init__ at 0x11046de50>, '__dict__': <att...

print(GOT().__dict__)  # {'en': 'Game of Thrones'}
print(dir(GOT()))  # a list of attribute names, including 'ch' and 'en'

__str__ - the string returned by str() and print()

class GOT:
    def __str__(self):
        return "I had a rushed ending"

print(GOT()) # I had a rushed ending

__setattr__ - attribute assignment

class BB:
    motto = ''
    def __init__(self):
        self.__setattr__('motto', 'Now,say my name.')  # same as self.motto = ...

class BB:
    def __setattr__(self, key, value):
        print(key, value)  # only prints; the attribute is not actually stored

white = BB()
white.motto = 'Now,say my name.' # prints: motto Now,say my name.

__getattr__

If attribute lookup fails on the instance and on its class (via __dict__), the class's __getattr__ is called; if that method is not defined, AttributeError is raised. __getattr__ is therefore always the last step of attribute lookup, the fallback.
```
class BB:
    def __init__(self):
        self.motto = 'I am the danger.'
    def __getattr__(self, item):
        print(f'can not find value: {item}')

white = BB()
white.motto  # 'I am the danger.'
white.pinkman  # prints: can not find value: pinkman (and evaluates to None)
```

__repr__ - the object's representation (print() falls back to it when __str__ is not defined)

class BB:
    def rr(self):
        ...
print(BB())  # <__main__.BB object at 0x102a93e80>

class BB:
    def __repr__(self):
        return "You're done!"
print(BB())  # You're done!

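Built-in decorators

@wraps(func) - from functools; keeps the decorated function's attributes (such as __name__ and __doc__) from being replaced by the wrapper's
@property - turns a method into a read-only attribute: accessing the name without parentheses runs the method and returns its result
@staticmethod - declares a function in a class as a static method, so it can be called without creating an instance. A static method knows nothing about the class; it only works with its arguments.

@classmethod - class method, the choice when you need to interact with the class but not with an instance. Its first argument is always the class itself.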

class Xin:
    motto = 'You will be fine'

    def __init__(self, motto):
        self.motto = motto

    @classmethod
    def jie(cls, _str):  # the first parameter is cls, the class the method is called on
        motto = cls.motto + _str
        return cls(motto)  # returns a new, initialized instance of the class

    def accept(self):
        print(self.motto)


r = Xin.jie('!')
r.accept()

c = Xin.jie('。')
c.accept()
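
For the other three decorators in the list above, here is a hedged sketch. The names logged, greet and Circle are hypothetical and only serve the illustration.

```python
from functools import wraps


def logged(func):
    @wraps(func)  # copy __name__, __doc__ and friends from func onto wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@logged
def greet(name):
    """Return a greeting."""
    return f"hello, {name}"


class Circle:
    def __init__(self, radius):
        self.radius = radius

    @property
    def diameter(self):
        # Read like an attribute: circle.diameter, no parentheses
        return self.radius * 2

    @staticmethod
    def describe():
        # No self or cls: callable on the class without an instance
        return "a round shape"


print(greet.__name__, "|", greet.__doc__)  # greet | Return a greeting.
print(Circle(2).diameter)                  # 4
print(Circle.describe())                   # a round shape
```

Without @wraps, greet.__name__ would report wrapper, which makes logs and debugging output harder to read.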

Calling a parent-class method from a subclass

Call it directly through the parent class name; self has to be passed explicitly

class Person:
    def __init__(self, name, age, sex):
        self.name = name
        self.age = age
        self.sex = sex

    def run(self):
        print("Winter is coming!")

    def say(self, word):
        print(word)


class Boy(Person):  # inherit from the parent class
    def __init__(self, name, age, sex, motto):
        Person.__init__(self, name, age, sex)  # call the parent's method
        self.motto = motto

    def run(self):
        Person.run(self)  # call the parent's method

    def say(self):
        Person.say(self, self.motto + "," + self.name)  # call the parent's method


Jon = Boy('Jon Snow', 24, 'man', 'you know nothing')
Jon.run()  # Winter is coming!
Jon.say()  # you know nothing,Jon Snow

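super()

Advantages:

No need to pass self when calling the method
When the parent class is renamed, only the base class named in the class statement has to change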

class Person:
    def __init__(self, name, age, sex):
        self.name = name
        self.age = age
        self.sex = sex

    def run(self):
        print("Winter is coming!")

    def say(self, word):
        print(word)


class Boy(Person):  # inherit from the parent class
    def __init__(self, name, age, sex, motto):
        super().__init__(name, age, sex)  # call the parent's __init__
        self.motto = motto

    def run(self):
        super().run()  # call the parent's method

    def say(self):
        super().say(self.motto + "," + self.name)  # call the parent's method


Jon = Boy('Jon Snow', 24, 'man', 'you know nothing')
Jon.run()  # Winter is coming!
Jon.say()  # you know nothing,Jon Snow

class ShangBao:
    def __init__(self):
        self.url = 'http://ga.fz.cn'

    def get_engine(self):
        return f'sdf'


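class DaoChu(ShangBao):
    def __init__(self):
        super().__init__()  # super(self) raises TypeError; the zero-argument form is enough
        # self.url is now set by the parent; super().url would raise AttributeError because url is an instance attribute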

    def engine(self):
        return self.get_engine()


DaoChu().engine()  # sdf

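Pitfalls I have run into

RuntimeError: dictionary changed size during iteration
- Error: RuntimeError: dictionary changed size during iteration
- Cause: keys cannot be added to or removed from a dict while iterating over it
- Fix: iterate over a list copy of the keys instead, e.g. list(d)
TypeError: exceptions must derive from BaseException
- Cause: whatever is raised must be an instance or a subclass of BaseException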
Handy tricks

Date and time

Date seven days ago

import datetime
(datetime.datetime.now()-datetime.timedelta(days=7)).strftime('%Y-%m-%d')

Formatted date to timestamp

import time
time.mktime(time.strptime('2021-02-11', '%Y-%m-%d'))

Timestamp to formatted date

time.strftime('%Y-%m-%d', time.localtime( time.time() ))

Date n days before or after

import datetime

def delta_time(mode: str, date: str, day: int) -> str:
    '''
    Return the date `day` days before or after a given date
    :param mode: ADD or SUB (add or subtract)
    :param date: formatted date, e.g. 2021-08-10
    :param day: number of days, e.g. 1
    :return: formatted date
    '''
    fmt = '%Y-%m-%d'
    dt = datetime.datetime.strptime(date, fmt)

    # timedelta avoids the off-by-one that adding 86400-second days to a timestamp can cause across DST changes
    if mode == 'ADD':
        dt += datetime.timedelta(days=day)
    elif mode == 'SUB':
        dt -= datetime.timedelta(days=day)

    return dt.strftime(fmt)

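Get all dates in a period

def get_date_list(start_date, end_date, format) -> list:
    """
    Return every day between the start date and the end date, inclusive
    :param start_date: start date (a datetime object, or a '%Y-%m-%d' string)
    :param end_date: end date (a datetime object, or a '%Y-%m-%d' string)
    :param format: output format string, e.g. '%Y-%m-%d'
    :return: list of formatted date strings
    """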
    date_list = []
    if isinstance(start_date, str) and isinstance(end_date, str):
        start_date = datetime.datetime.strptime(start_date, '%Y-%m-%d')
        end_date = datetime.datetime.strptime(end_date, '%Y-%m-%d')
    date_list.append(start_date.strftime(format))
    while start_date < end_date:
        start_date += datetime.timedelta(days=1)
        date_list.append(start_date.strftime(format))
    return date_list

Get the first day of each month in a period

import logging
log = logging.getLogger(__name__)

def split_dates(start_date_str: str, end_date_str: str):
    """
    Split a period into month-long intervals bounded by the first of each month, e.g. 2021-12-01
    :param start_date_str: e.g. 2021-10 or 2021-9 (a full date ending in -01 is also accepted)
    :param end_date_str: e.g. 2021-10 or 2021-9
    :return: list of tuples, e.g. [('2021-09-01', '2021-10-01'), ('2021-10-01', '2021-11-01')]
    """
    _format = '%Y-%m-%d'
    # Append the day only when it is missing; testing endswith('-01') would misread January ('2021-01')
    if start_date_str.count('-') == 1:
        start_date_str += '-01'
    if end_date_str.count('-') == 1:
        end_date_str += '-01'
    start_date = datetime.datetime.strptime(start_date_str, _format)
    end_date = datetime.datetime.strptime(end_date_str, _format)

    dates = [start_date.strftime(_format)]

    # Collect every date in the range
    while start_date <= end_date:
        start_date += datetime.timedelta(days=1)
        dates.append(start_date.strftime(_format))

    # Keep only the dates that fall on the 1st
    dates = [date for date in dates if date.endswith('-01')]

    # Pair consecutive dates
    dates_tuple = [(dates[i], dates[i + 1]) for i in range(len(dates) - 1)]

    log.info(dates)
    log.info(dates_tuple)
    return dates_tuple

Get the number of days in the current month

import calendar
import datetime

now = datetime.date.today()
day_num = calendar.monthrange(now.year, now.month)[1]

Get the last day of the current month

import calendar
import datetime

now = datetime.date.today()
day_num = calendar.monthrange(now.year, now.month)[1]  # number of days in the month
end_date = datetime.datetime(now.year, now.month, day_num).strftime("%Y-%m-%d")

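Sorting a dict

By key
- method: sorted(dict)
  - return: a list of the keys, sorted
By value
- method: sorted(dict.items(), key = lambda kv:(kv[1], kv[0]))
  - sorted takes two arguments here: the first is a sequence of (key, value) tuples, the second is the function that produces the sort key
  - return: a list of (key, value) tuples, sorted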
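
Both approaches in one runnable sketch, using an arbitrary prices dict. The last line assumes you want a dict back rather than a list of tuples.

```python
prices = {"orange": 5, "apple": 9, "banana": 2}

by_key = sorted(prices)
# Sort by value first, then by key to break ties
by_value = sorted(prices.items(), key=lambda kv: (kv[1], kv[0]))
# dicts keep insertion order (Python 3.7+), so this yields a value-sorted dict
sorted_dict = dict(by_value)

print(by_key)       # ['apple', 'banana', 'orange']
print(by_value)     # [('banana', 2), ('orange', 5), ('apple', 9)]
print(sorted_dict)  # {'banana': 2, 'orange': 5, 'apple': 9}
```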

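Removing duplicates from a list

set(), list()
```
ls = [3, 1, 3, 2, 1]
unique = list(set(ls))  # order is not guaranteed
unique.sort(key=ls.index)  # sort by first position in the original list to restore the order
```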

dict(), list()

ls = [3, 1, 3, 2, 1]
unique = list(dict.fromkeys(ls))  # [3, 1, 2]; dict keys keep insertion order (Python 3.7+), so no extra sort is needed

Removing duplicate dicts from a list

reduce

from functools import reduce
data_list = [{'a':"123"}, {'a':"123"}, {'a':"123"}]
func = lambda x, y:x if y in x else x + [y]
a = reduce(func, [[],] + data_list)

Merging dicts

a = {}
b = {}
c = {**a, **b}

If a key appears in both, the value comes from the later dict in {**a, **b}, that is, from b.
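
A quick check of the "later dict wins" rule, with sample dicts chosen for the example:

```python
a = {"x": 1, "y": 2}
b = {"y": 20, "z": 30}

c = {**a, **b}  # b's value wins for the shared key "y"
d = {**b, **a}  # reversed order: a's value wins

print(c)  # {'x': 1, 'y': 20, 'z': 30}
print(d)  # {'y': 2, 'z': 30, 'x': 1}
```

On Python 3.9 and later, a | b gives the same result as {**a, **b}.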

Quickly convert request headers to JSON

feapder create -j

List / dict comprehensions

# list, single level
[data for data in datas]
# list, two levels (flattening)
[data for datas in datas_list for data in datas]
# dict, single level
{data[0]: {'fund code': data[0], 'fund name': data[1], 'historical NAV': []} for data in datas}

Colored string output

Codes starting with \033[3 set the text color; 1m is bolder and brighter than 0m
Codes starting with \033[4 set the background color
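
A sketch of how these escape codes can be put together, assuming a terminal that understands ANSI sequences. The colorize helper and its parameters are hypothetical, not part of any library.

```python
RESET = "\033[0m"  # restore the terminal's default style


def colorize(text, fg=31, bold=False):
    """Wrap text in an ANSI sequence; fg 30-37 are the standard text colors."""
    style = 1 if bold else 0
    return f"\033[{style};{fg}m{text}{RESET}"


print(colorize("error", fg=31, bold=True))  # bold red
print(colorize("ok", fg=32))                # green
print(f"\033[0;30;43mwarning{RESET}")       # black text on a yellow background
```

Ending every string with the reset code keeps the color from leaking into later output.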

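Deleting files and folders

os.remove(path) deletes a file
os.removedirs(path) deletes an empty folder, then any parent folders that become empty
os.rmdir(path) deletes a single empty folder
shutil.rmtree(path) deletes a folder and everything in it, recursively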
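
The calls above can be tried safely inside a temporary directory. This is a sketch: the file and folder names are invented, and everything it creates is removed again.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()  # scratch directory for the demo

# os.remove: delete a single file
file_path = os.path.join(base, "a.txt")
with open(file_path, "w") as f:
    f.write("data")
os.remove(file_path)
file_gone = not os.path.exists(file_path)

# os.rmdir: delete one empty folder
empty_dir = os.path.join(base, "empty")
os.mkdir(empty_dir)
os.rmdir(empty_dir)
dir_gone = not os.path.exists(empty_dir)

# shutil.rmtree: delete a folder together with its contents
nested = os.path.join(base, "x", "y")
os.makedirs(nested)
with open(os.path.join(nested, "f.txt"), "w") as f:
    f.write("data")
shutil.rmtree(base)
base_gone = not os.path.exists(base)

print(file_gone, dir_gone, base_gone)  # True True True
```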

Exceptions

https://www.cnblogs.com/mingmingming/p/11254596.html

import traceback
try:
    a = input('Enter a number: ')
    if not a.isdigit():
        raise ValueError(f'invalid input ({a} is not a number)')
except ValueError as e:
    print('Something went wrong: ', repr(e), '\n', traceback.format_exc())

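Generating QR codes

Install: pip install myqr
Usage: myqr URL
Options:
- v (size)
- n (file name of the generated QR code)
- d (directory to save the QR code in)
- p (path of a picture to embed)
- c (colorized; takes no value)
- con (contrast, e.g. 1.0)
- bri (brightness, e.g. 1.0)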

Zipping two lists into a dict

dict(zip(list1,list2))

Decoding HTML character references

import html
html.unescape('&#39575;')

&name;
&#dddd;
&#xhhhh;

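Strings of these forms are escape sequences in SGML-family languages such as HTML and XML. They are not an "encoding".

Taking HTML as the example, all three kinds of escape sequence are called character references:

The first is a character entity reference: it is followed by the name of a predefined entity, and the entity declares the character it stands for.
The other two are numeric character references (NCR), whose number is the Unicode code point of the target character; "&#" is followed by decimal digits and "&#x" by hexadecimal digits.

Since HTML 4, NCRs are defined in terms of Unicode, independent of the document encoding.

The two characters of 「中国」 are the Unicode characters U+4E2D and U+56FD; the hexadecimal code points 4E2D and 56FD are 20013 and 22269 in decimal. So: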

&#x4e2d;&#x56fd;
&#20013;&#22269;

Both NCR spellings are rendered as 「中国」.
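
The same conversion can be checked in Python with the standard html module. The third and fourth lines use named entities, added here only for comparison.

```python
import html

hex_form = html.unescape("&#x4e2d;&#x56fd;")  # hexadecimal NCR
dec_form = html.unescape("&#20013;&#22269;")  # decimal NCR
named = html.unescape("&lt;b&gt; &amp;")      # character entity references
escaped = html.escape("<b> &")                # the reverse direction

print(hex_form)  # 中国
print(dec_form)  # 中国
print(named)     # <b> &
print(escaped)   # &lt;b&gt; &amp;
```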

An NCR can escape any Unicode character, whereas character entity references are far more limited; see the lists of characters defined in HTML 4 and HTML5:

https://www.zhihu.com/question/21390312

https://www.cnblogs.com/liuhaidon/p/12060184.html

Getting the name of the logged-in user

import getpass
getpass.getuser()

PDF to PNG

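import fitz  # PyMuPDF

# pdf_path and img_path are placeholders for the input PDF and the output PNG
pdf = fitz.open(pdf_path)
page = pdf[0]
trans = fitz.Matrix(2, 2).prerotate(0)  # 2x zoom, no rotation
# Recent PyMuPDF releases use snake_case names; older ones spelled these preRotate / getPixmap / writePNG
pm = page.get_pixmap(matrix=trans, alpha=False)
pm.save(img_path)
pdf.close()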

Stitching images together

from PIL import Image

def image_compose(img_paths: list, img_save_path: str):
    '''
    Stitch images vertically; every image is assumed to have the size of the first one
    :param img_paths: paths of all the images
    :param img_save_path: path to save the stitched image to
    '''
    img_count = len(img_paths)
    img_width, img_height = Image.open(img_paths[0]).size
    joint = Image.new('RGB', (img_width, img_height * img_count))

    for ind, img_path in enumerate(img_paths):
        img = Image.open(img_path)
        loc = (0, img_height * ind)  # top-left corner of this image in the result
        joint.paste(img, loc)

    joint.save(img_save_path)
    return img_save_path

Compressing an image's file size

from PIL import Image
import os

def get_outfile(infile, outfile):
    if outfile:
        return outfile
    dir, suffix = os.path.splitext(infile)
    outfile = '{}-out{}'.format(dir, suffix)
    return outfile
  
def image_compress(infile, outfile='', mb=800, step=10, quality=80):
    """
    Compress an image to a target file size without changing its dimensions
    :param infile: source file
    :param outfile: where to save the compressed file
    :param mb: target size in KB (despite the parameter name)
    :param step: how much to lower the quality on each pass
    :param quality: initial quality
    :return: path of the compressed file, its size in KB
    """
    o_size = os.path.getsize(infile) / 1024
    if o_size <= mb:
        return infile, o_size
    outfile = get_outfile(infile, outfile)
    while o_size > mb:
        im = Image.open(infile)
        im.save(outfile, quality=quality)
        if quality - step < 0:
            break
        quality -= step
        o_size = os.path.getsize(outfile) / 1024  # keep the unit in KB
    return outfile, os.path.getsize(outfile) / 1024

Extracting a zip archive

import zipfile
import os

def un_zip(file_name):  
    """unzip zip file"""  
    zip_file = zipfile.ZipFile(file_name)
    '''
    if os.path.isdir(file_name.split(".")[0]):  
        pass  
    else:  
        os.mkdir(file_name.split(".")[0])
    '''
    for names in zip_file.namelist():  
        zip_file.extract(names)  # to extract into a folder instead: zip_file.extract(names, file_name.split(".")[0])
    zip_file.close()

un_zip("test.zip")

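Finding the path of the Python executable

import sys
sys.path  # list of directories searched for modules
sys.executable  # absolute path of the running Python interpreter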

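== vs is

== compares the contents of two objects, i.e. whether their values are equal, regardless of whether they live at the same address in memory
is checks whether two names refer to the very same object, i.e. whether they share one memory address. It checks identity only: equal contents do not make two objects identical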
https://blog.csdn.net/qq_26442553/article/details/82195061
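
A minimal sketch of the distinction, using lists because each list literal creates a fresh object:

```python
a = [1, 2, 3]
b = [1, 2, 3]  # equal contents, but a separate object
c = a          # a second name for the same object

print(a == b)          # True: same contents
print(a is b)          # False: two different objects
print(a is c)          # True: one object, two names
print(id(a) == id(c))  # True: `is` amounts to comparing id()

value = None
print(value is None)   # True: identity is the idiomatic test for None
```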

URL-encoding Chinese text

import urllib.parse
print(urllib.parse.quote('中文'))  # %E4%B8%AD%E6%96%87
print(urllib.parse.unquote('%E4%B8%AD%E6%96%87'))  # 中文

token

https://blog.csdn.net/weixin_30394633/article/details/95011702

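The translate method

intab = "abcde"
outtab = "12345"
trantab = str.maketrans(intab, outtab)   # build the translation table

s = "this is string example....wow!!!"  # named s so the built-in str is not shadowed
print(s.translate(trantab))  # this is string 5x1mpl5....wow!!!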

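A path containing spaces breaks a cmd command

import os

# Raw strings keep the backslashes literal; without the r prefix, "\U" in "C:\Users" is a syntax error
chrome_path = r"C:\Program Files\Google\Chrome\Application\chrome.exe"
user_data_dir = r"C:\Users\police\AppData\Local\Google\Chrome\User Data"
os.system(rf'start "" "{chrome_path}" --remote-debugging-port=9222 --user-data-dir="{user_data_dir}"')

Fix 1

Wrap the path in double quotes, e.g. "{user_data_dir}"
Fix 2

For start, wrap the path in double quotes and put an empty pair of double quotes before it, e.g. start "" "{chrome_path}". start treats the first quoted argument as the window title, so the empty pair keeps the path from being taken as the title.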

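selenium cannot attach to the debugging browser

Possible cause: conflicting chromedriver.exe processes. Check Task Manager for multiple chromedriver.exe processes, end all of them, and retry; the attach should then succeed.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
driver = webdriver.Chrome(options=options)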

Working with shadow DOM nodes

https://www.jianshu.com/p/4d00259f6441

Fun open-source projects

A slick command-line system monitor

Install: pip install tiptop
Run: tiptop

A Pokémon-style game

pip install scrap_engine
git clone https://github.com/lxgr-linux/pokete.git
./pokete/pokete.py

KFCError

class KFCError(BaseException):
    def __str__(self):
        return 'KFC Crazy Thursday v me ¥50.'


raise KFCError

Decoding Unicode escape sequences

t = r"\u6211\u672c\u662f\u663e\u8d6b\u4e16\u5bb6\u7684\u5965\u7279\u66fc\uff0c\u5374\u88ab\u8be1\u8ba1\u591a\u7aef\u7684\u602a\u517d\u6240\u5bb3\uff01\u5965\u7279\u66fc\u5bb6\u65cf\u5f03\u6211\uff01\u5965\u7279\u4e4b\u7236\u9010\u6211\uff01\u751a\u81f3\u65ad\u6211\u4f3d\u9a6c\u5c04\u7ebf\uff01\u91cd\u751f\u4e00\u4e16\uff0c\u4eca\u5929\u80af\u5fb7\u57fa\u75af\u72c2\u661f\u671f\u56db\uff01\u8c01\u8bf7\u6211\u5403\uff1f"
t.encode("utf-8").decode("unicode_escape")

Intersection of lists

list(set([1,2,112,11])&set([11,1,233]))
# [1, 11]

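Importing a module from a sibling folder

For example, file4.py wants to import file3.py, which lives in dir3.

The idea is to add the parent directory to sys.path and then import the module the same way as one in a subdirectory.

dir3, the folder being imported from, also needs an (empty) __init__.py file.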

-- dir
　　| file1.py
　　| file2.py
　　| dir3
　　　| __init__.py
　　　| file3.py
　　| dir4
　　　| file4.py

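The parent directory also has to be added to sys.path (".." is resolved against the current working directory, so this works when the script is run from inside dir4):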

import sys
sys.path.append("..")
from dir3 import file3
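
A self-contained sketch of the same idea that does not depend on the working directory. It rebuilds the dir3/dir4 layout in a temporary folder purely for demonstration; the hello function inside file3.py is an assumption, since the notes do not show that file's contents.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway copy of the layout above: root/dir3/file3.py and root/dir4/
root = Path(tempfile.mkdtemp())
(root / "dir3").mkdir()
(root / "dir4").mkdir()
(root / "dir3" / "__init__.py").write_text("")
(root / "dir3" / "file3.py").write_text("def hello():\n    return 'hello from file3'\n")

# Inside a real file4.py this would be: Path(__file__).resolve().parent.parent
parent_dir = root
if str(parent_dir) not in sys.path:
    sys.path.append(str(parent_dir))

# Same effect as: from dir3 import file3
file3 = importlib.import_module("dir3.file3")
print(file3.hello())  # hello from file3
```

Deriving the parent directory from __file__ rather than from ".." keeps the import working no matter where the script is launched from.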

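Common error and how import works. When importing a module directly from the parent directory with:

from .. import xxx

this error often appears:

ValueError: attempted relative import beyond top-level package

The reason is that in a relative import the folder acts as a package (numpy and pandas, for instance, are packages). If the Python interpreter does not treat the folder as a package, it is just an ordinary folder and relative imports cannot work.

For a folder to count as a package, two conditions must hold:

The folder must contain an __init__.py file, which may be empty.
A .py file inside the folder must not be executed as the top-level module.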
