Python Interview Prep 01
潘忠显 / 2020-07-22
import signal
import threading
import time

run_flag = threading.Event()
run_flag.set()


def signal_handler(signum, frame):
    print("got signal!")
    run_flag.clear()


if __name__ == '__main__':
    signal.signal(signal.SIGINT, signal_handler)
    while run_flag.is_set():
        time.sleep(1)
import logging
from logging.handlers import TimedRotatingFileHandler


def create_logger(area, index, logname):
    # area, index and logname were free variables in the original snippet;
    # they are taken as parameters here so the function is self-contained
    logger = logging.getLogger("Rotating Log_" + area + '_' + str(index))
    logger.setLevel(logging.DEBUG)
    # rotate every 3 hours and keep the 24 most recent files
    handler = TimedRotatingFileHandler(logname,
                                       when="H",
                                       interval=3,
                                       backupCount=24)
    formatter = logging.Formatter(
        '[%(asctime)s][%(process)d][%(levelname)s] %(message)s')
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    return logger
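Installing pip for different Python versions
This comes down to fetching the get-pip script that matches the Python version:
wget https://bootstrap.pypa.io/{version}/get-pip.py
For example, to install pip for Python 2.6:
curl --silent --show-error --retry 5 https://bootstrap.pypa.io/2.6/get-pip.py | sudo python
Installing modules for a specific Python version
python2 -m pip install redis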
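Global Interpreter Lock (GIL)
- The GIL is not a feature of the Python language
- The GIL exists because CPython's memory management is not thread-safe
- CPython has a GIL; Jython does not
- The GIL can cause I/O-bound threads to be scheduled ahead of CPU-bound threads
- Multithreading still helps I/O-bound work, but for CPU-bound multithreaded work efficiency drops sharply because of the GIL
- Many features, libraries, and packages already depend on the GIL, so it is hard to remove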
The GIL was invented because CPython’s memory management is not thread-safe. With only one thread running at a time, CPython can rest assured there will never be race conditions.
The GIL can cause I/O-bound threads to be scheduled ahead of CPU-bound threads, and it prevents signals from being delivered.
References
- http://www.dabeaz.com/GIL/
- https://www.jianshu.com/p/9eb586b64bdb
- https://wiki.python.org/moin/GlobalInterpreterLock
- https://python.land/python-concurrency/the-python-gil
- https://zhuanlan.zhihu.com/p/75780308
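Dynamically adding a method to an existing class
Background: with Protobuf, adding a message object directly to a set() raises an "unhashable" error, so a __hash__() function has to be added to the class.
def task_result_hash(self):
    return hash(self.task_id + "#" + str(self.exec_time))


task_pb2.TaskResult.__hash__ = task_result_hash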
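The same trick can be tried without Protobuf installed. The sketch below uses a plain class, TaskResult, as a hypothetical stand-in for the generated message class (it is not the real task_pb2 type), and hashes a tuple of the fields rather than a concatenated string.

```python
class TaskResult:
    """Hypothetical stand-in for a generated Protobuf message class."""

    def __init__(self, task_id, exec_time):
        self.task_id = task_id
        self.exec_time = exec_time

    def __eq__(self, other):
        # Defining __eq__ without __hash__ makes instances unhashable
        return (self.task_id, self.exec_time) == (other.task_id, other.exec_time)


def task_result_hash(self):
    return hash((self.task_id, self.exec_time))


# Attach the method to the class after the fact
TaskResult.__hash__ = task_result_hash

results = {TaskResult("t1", 10), TaskResult("t1", 10), TaskResult("t2", 20)}
print(len(results))
```

Equal objects must hash equally, so the hash is built from the same fields that __eq__ compares.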
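Converting a number to a binary string
Use format to convert it to a string:
"{0:b}".format(37)  # '100101'
Or use the explicit conversions:
bin(10)  # '0b1010'
hex(10)  # '0xa'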
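Iterating over the elements of a Queue()
Iteration that takes every element out. It blocks; it is effectively a while loop that keeps calling queue.get until the sentinel None comes back:
for job in iter(queue.get, None):
    print(job)
Iteration that leaves the elements in place:
for job in queue.queue:
    print(job)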
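A minimal sketch of both styles on one queue. The name jobs and the use of None as the stop sentinel are illustrative choices; without a sentinel the destructive loop would block forever on the final get().

```python
import queue

jobs = queue.Queue()
for name in ("a", "b", "c"):
    jobs.put(name)
jobs.put(None)  # sentinel: tells the consuming loop to stop

# Non-destructive: peek at the underlying deque (not thread-safe)
peeked = list(jobs.queue)

# Destructive: keep calling jobs.get() until it returns the sentinel
consumed = [job for job in iter(jobs.get, None)]

print(peeked)
print(consumed)
print(jobs.empty())
```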
Protobuf cannot CopyFrom a message of the same type
TypeError: Parameter to MergeFrom() must be instance of same class: expected Hello_msg got Hello_msg.
The likely cause is that the pb2.py file was imported from several different paths.
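An elegant switch replacement
def f(x):
    return {
        'a': 1,
        'b': 2
    }.get(x, 9)  # 9 is default if x not found
Many of the examples in this link are problematic: some are not thread-safe, some are inefficient, and some raise exceptions.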
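Converting between bytes and strings in Python 3
In Python 3, b'abc' is a bytes object and a Unicode string is a str. They behave as follows:
- bytes is an array of bytes
- Indexing it yields an integer
- Calling decode() on a bytes object returns a str
- Calling encode() on a str returns bytes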
>>> type ('abc')
<class 'str'>
>>> type (b'abc')
<class 'bytes'>
>>> type (b'abc'.decode())
<class 'str'>
>>> type(b'1'[0])
<class 'int'>
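What do you get when you read a file?
- Opening a file with the 'b' option yields bytes; opening it with the 't' option yields str
- If neither 't' nor 'b' is given, open() defaults to 't'
- If the file contains non-ASCII characters, 't' decodes with the platform's default encoding unless encoding= is given (UTF-8 on most Linux systems), so opening a GBK file raises an error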
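The points above can be checked with a throwaway file. This is a sketch only: the sample text and the temporary path are made up, and the encoding is passed explicitly so the result does not depend on the platform default.

```python
import os
import tempfile

# Write a short GBK-encoded file to a temporary path
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("中文".encode("gbk"))

# Binary mode returns bytes
with open(path, "rb") as f:
    raw = f.read()

# Text mode with the wrong encoding fails to decode
try:
    with open(path, "rt", encoding="utf-8") as f:
        f.read()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Text mode with the right encoding returns str
with open(path, "rt", encoding="gbk") as f:
    text = f.read()

os.remove(path)
print(type(raw), decode_failed, text)
```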
Other things to note
- Indexing bytes already yields an int, so there is no need to convert with ord()
- hashlib.md5() takes bytes
- Values read from redis are bytes
Converting between bytes and str
b"abcde".decode()  # decode() defaults to UTF-8 in Python 3
u"abcde".encode()
Converting an unencoded "byte array" (not bytes, but a list of int) to a string:
bytes_data = [112, 52, 52]
"".join(map(chr, bytes_data))  # 'p44'
Enum types
import enum


@enum.unique
class ChannelConnectivity(enum.Enum):
    IDLE = (_cygrpc.ConnectivityState.idle, 'idle')
    CONNECTING = (_cygrpc.ConnectivityState.connecting, 'connecting')
    READY = (_cygrpc.ConnectivityState.ready, 'ready')
    TRANSIENT_FAILURE = (_cygrpc.ConnectivityState.transient_failure,
                         'transient failure')
    SHUTDOWN = (_cygrpc.ConnectivityState.shutdown, 'shutdown')
Source: gRPC source code
Python 3 removed iteritems()
  File "/data/common_detector/agent/main.py", line 25, in main
    for k, v in proto_supported.PROTO_DETECTOR_MAP.iteritems():
AttributeError: 'dict' object has no attribute 'iteritems'
In Python 3.x, both iteritems() and viewitems() have been removed.
In Python 3.x, replace iteritems() with items(), which can be used directly in a for loop.
Building a list with a nested for loop
[(d, p) for d in dests for p in ports]
[(x, y) for x in [1, 2] for y in [3,4]]
print("List comprehension:")
for x, y in [(x,y) for x in a for y in b]:
    print(x, y)
https://www.oreilly.com/library/view/python-cookbook/0596001673/ch01s15.html
Class variables and class methods
class MyClass:
    i = 3
https://stackoverflow.com/questions/68645/are-static-class-variables-possible-in-python
https://zhuanlan.zhihu.com/p/28010894
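A short sketch of how class variables, class methods, and static methods behave. The Counter class and its members are invented for illustration.

```python
class Counter:
    total = 0  # class variable: shared by every instance

    def __init__(self, name):
        self.name = name  # instance variable: one per object
        Counter.total += 1

    @classmethod
    def created(cls):
        # A class method receives the class itself, not an instance
        return cls.total

    @staticmethod
    def describe():
        # A static method receives neither the class nor an instance
        return "counts how many instances were created"


a = Counter("a")
b = Counter("b")
print(Counter.created())

# Assigning through an instance creates an instance attribute
# that shadows the class variable instead of changing it
a.total = 100
print(Counter.total, a.total, b.total)
```

The last line is the usual pitfall: writing through an instance does not update the shared value.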
Checking port connectivity asynchronously
https://gunhanoral.com/python/async-port-check/
Python 3 socket documentation
https://docs.python.org/3/library/socket.html#socket.socket.connect
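Type checking
isinstance(q, Queue)
Using type() instead of isinstance() for a typecheck.
type(q) is Queue
The difference between isinstance and type:
- type returns the exact type, for example the concrete subclass
- isinstance can also match a parent type
For example, suppose NewQueue inherits from queue.Queue.
from queue import Queue

class NewQueue(Queue):
    pass

q = NewQueue()
print(type(q) is Queue)  # False
print(isinstance(q, Queue))  # True
This is easy to understand: type() can return only one value, so it has to return the concrete type, whereas isinstance() is given a candidate type and can walk the object's parent classes to check against it.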
Differences between Python coroutines and Go goroutines
https://segmentfault.com/a/1190000021250088
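Comparison of asynchronous event libraries
Problems with Python 3 dependencies
ModuleNotFoundError: No module named '_ctypes'
Install the libffi development package:
yum install libffi-devel -y
Then rebuild and reinstall Python:
make && make install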
Problems caused by not calling join() on threads
RuntimeError: can't register atexit after shutdown
The main thread did not call join() to wait for its child threads to exit.
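A minimal sketch of the fix: start the workers, then join each one before the main thread moves on. The worker function and the squared results are placeholders for real work.

```python
import threading
import time

results = []


def worker(n):
    time.sleep(0.05)  # stand-in for real work
    results.append(n * n)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()

# Wait for every worker before the main thread carries on to exit
for t in threads:
    t.join()

print(sorted(results))
```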
Installing and uninstalling with pip
pip uninstall sacc_tool_pkg
Building a custom installable package
https://packaging.python.org/tutorials/packaging-projects/
python3 setup.py sdist bdist_wheel
Official documentation
https://docs.python.org/zh-cn/3/tutorial/index.html
Implementing a singleton
https://www.cnblogs.com/huchong/p/8244279.html
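Coding standards
python3 -m pylint
The Google style guide asks you to avoid global variables.
Module-level constants are allowed and encouraged; name them in ALL_CAPS, with a leading _ when they are internal to the module.
Wrapping a long string:
pic_url = ("http://graph.facebook.com/{0}/"
           "picture?width=100&height=100".format(me['id']))
The parentheses are required; otherwise the two lines are parsed as two separate statements.
Pylint's "Too few public methods" message means that a class should not serve only as a data structure that stores variables; it should also provide some public methods.
Running Pylint on the project root folder checks the whole project.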
Python bindings: running C/C++ code
https://realpython.com/python-bindings-overview/
A library for parsing pcap files
import dpkt
https://github.com/kbandla/dpkt
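Running a task periodically
Use the sched module to schedule timed tasks:
- Create a scheduler object, sched.scheduler. It takes two arguments: the first returns the current time, the second delays for the given amount of time
- The scheduled function should call the scheduler's enter function first to schedule the next run; otherwise the running time of the function body would have to be subtracted from the sleep time
- Finally, call the scheduler's run function. It runs the scheduled events, waiting for each one in turn, until no scheduled events remain
Besides the delay and the priority, enter also accepts the arguments to pass to the function.
import sched
import time


def _period_func_wrapper(func, scheduler, delay, priority, *args):
    # schedule the next run first so the period does not drift by func's run time
    scheduler.enter(delay, priority, _period_func_wrapper,
                    (func, scheduler, delay, priority, *args))
    func(*args)


def run_process_per_minute(func, *args):
    s = sched.scheduler(time.time, time.sleep)
    _period_func_wrapper(func, s, 60, 1, *args)
    s.run()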
Pylint message letters:
- [R]efactor - Refactoring Suggestion
- [C]onvention - Coding Convention
- [W]arning - Style Warnings or other
- [E]rror - Import Error, or other detected bug
- [F]atal - Fatal Error. Processing has terminated
[x for x in L if x is not None]
https://stackoverflow.com/questions/34619790/pylint-message-logging-format-interpolation
W: 50,15: Catching too general exception Exception (broad-except)
W: 56,12: return statement in finally block may swallow exception (lost-exception)
W:227,15: Catching too general exception Exception (broad-except)
W: 10, 0: Unused import ConfigParser (unused-import)
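Creating a temporary file:
import os
import tempfile

# mkstemp() returns an open file descriptor together with the file's path
fd, temp_file_path = tempfile.mkstemp()
os.close(fd)  # close the descriptor so it is not leaked
print("File path: " + temp_file_path)
os.remove(temp_file_path)  # delete the file; os.rmdir() removes an empty directory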
CodeCC fix log
Pylint
Final newline missing
Used when the last line in a file is missing a newline. tencent standards/python 1.6.3
No exception type(s) specified
Used when an except clause doesn't specify exceptions type to catch. tencent standards/python 2.4.3
try:
    return read_config()['global']['appid']
except KeyError:
    return ""
Module 'sys' has no '_MEIPASS' member
Used when a variable is accessed for an unexistent member.
Before:
if getattr(sys, 'frozen', False):
    base_dir = sys._MEIPASS
After:
if getattr(sys, 'frozen', False) and hasattr(sys, '_MEIPASS'):
    base_dir = sys._MEIPASS
Wildcard import sacc.core
Used when from module import * is detected.
When an import statement in the pattern of from MODULE import * is used, it may become difficult for a Python validator to detect undefined names in the program that imported the module. Furthermore, as a general best practice, import statements should be as specific as possible and should only import what they need.
Constant name “base_dir” doesn’t conform to UPPER_CASE naming style
Used when the name doesn't match the regular expression associated to its type (constant, variable, class...). tencent standards/python 1.18.1
Running system commands
import os

stream = os.popen('free -b -t -w')
output = stream.read()
# print(output)
python2 -m pylint --msg-template='{msg_id}:{line:3d},{column}: {obj}: {msg}' express_check.py > pylint.out
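Building dicts from a CSV using the first row as keys
Use next to take the first item of the iteration as the keys, then keep iterating to build the dicts.
with open("titanic.txt") as csv_file:
    keys = next(csv_file).strip().split(",")
    print([dict(zip(keys, row.strip().split(","))) for row in csv_file])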
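For comparison, the standard library's csv.DictReader does the same header-as-keys step and also copes with quoted fields that contain commas, which a plain split(",") does not. The sample data below is made up and read from an in-memory buffer rather than from titanic.txt.

```python
import csv
import io

# Inline sample standing in for a file such as titanic.txt
sample = io.StringIO("name,age,survived\nAlice,29,1\nBob,41,0\n")

# DictReader takes the keys from the first row and also
# handles quoted fields that contain commas
rows = [dict(row) for row in csv.DictReader(sample)]
print(rows)
```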
https://stackoverflow.com/questions/14503973/python-global-keyword-vs-pylint-w0603
REPL
https://stackoverflow.com/a/5599313/3801587
When Python detects the “exec statement”, it will force Python to switch local storage from array to dictionary. However since “exec” is a function in Python 3.x, the compiler cannot make this distinction since the user could have done something like “exec = 123”.
In Python 3, local variables cannot be assigned dynamically through exec.
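One common workaround, sketched here under the assumption that the dynamically named values only need to be looked up later, is to run exec against an explicit namespace dict instead of expecting new local variables to appear. build_settings and the generated names are hypothetical.

```python
def build_settings(names):
    # Run the generated assignments in an explicit namespace dict
    # instead of trying to create local variables on the fly
    namespace = {}
    for i, name in enumerate(names):
        exec("{} = {}".format(name, i * 10), {}, namespace)
    return namespace


settings = build_settings(["alpha", "beta"])
print(settings["alpha"], settings["beta"])
```

When the names are known up front, a plain dict without exec is simpler still.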
# Install pip
wget https://bootstrap.pypa.io/pip/2.7/get-pip.py
python get-pip.py --user
# Upgrade setuptools to fix the problem described in the link below
# > 'install_requires' must be a string or list ...
# https://github.com/sdispater/pendulum/issues/187#issuecomment-375820769
python -m pip install setuptools -U --user
# Install requests (GitHub no longer serves the unauthenticated git:// protocol)
git clone https://github.com/psf/requests.git
cd requests/
python setup.py install --user
print("Zip:")
for x, y in zip(a, b):
    print(x, y)
print("List comprehension:")
for x, y in [(x,y) for x in a for y in b]:
    print(x, y)
python2 -m pip install --upgrade pip
pylint --disable=R,C x.py
[sum(x) for x in zip(*blocked_lines)]
https://stackoverflow.com/questions/3279560/reverse-colormap-in-matplotlib
A bug that is easy to miss
queue_name += '_QUEUE_NORMAL',  # the trailing comma turns the right-hand side into a tuple
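A small sketch of what the stray comma does. The prefix value is made up; the point is that a plain assignment silently produces a tuple, while += on a str raises TypeError.

```python
prefix = "JOB"

# Intended: a string
good_name = prefix + "_QUEUE_NORMAL"

# Stray trailing comma: the whole right-hand side becomes a 1-tuple
bad_name = prefix + "_QUEUE_NORMAL",

print(type(good_name).__name__, type(bad_name).__name__)

# With += on a str the same typo fails loudly instead
queue_name = prefix
try:
    queue_name += "_QUEUE_NORMAL",
    error = None
except TypeError as exc:
    error = exc

print(error)
```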
Common directory functions
os.listdir("/usr")
pyenv
https://github.com/pyenv/pyenv
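Getting the absolute path of the current file
Getting the caller's file name and line number
Getting the line number of the current statement
Getting the name of the current function
Getting the caller's function name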
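The five lookups above can be sketched with frame objects from the inspect module. This assumes CPython, where inspect.currentframe() is available; the helper names are invented. In an ordinary module, os.path.abspath(__file__) is the usual way to get the current file's path.

```python
import inspect
import os


def current_file():
    # Absolute path of the file this function was compiled from
    return os.path.abspath(inspect.currentframe().f_code.co_filename)


def current_location():
    frame = inspect.currentframe()
    # (name of this function, line number of this statement)
    return frame.f_code.co_name, frame.f_lineno


def caller_location():
    # f_back is the frame of whoever called this function
    caller = inspect.currentframe().f_back
    return caller.f_code.co_filename, caller.f_lineno, caller.f_code.co_name


def some_caller():
    return caller_location()


print(current_file())
print(current_location())
print(some_caller())
```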
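Notes on issues found in code review
- Unnecessary class wrapper
Judging from the context, this class has only one function and one hard-coded URL; there is no reason for it to exist.
- URLs and passwords hard-coded in source files
- try block too large
The exception can only be raised by the first line in the try block; the logic after it should be moved out.
- A pointless raise
If finally is present, it specifies a "cleanup" handler. The try clause is executed, including any except and else clauses. If an exception occurs in any of these clauses and is not handled, the exception is temporarily saved. If the finally clause executes a return, break or continue statement, the saved exception is discarded.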
try:
    ...
except requests.exceptions.RequestException as e:
    logger.error(traceback.format_exc())
    raise RequestDataError('API Request failed {}'.format(str(e)))
finally:
    return user_label
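The effect is easy to reproduce in isolation. In this sketch fetch_label and the exception types are stand-ins for the reviewed code; the raise in the except clause never reaches the caller because finally returns.

```python
def fetch_label():
    user_label = "default"
    try:
        raise ValueError("request failed")
    except ValueError as e:
        # This raise never reaches the caller ...
        raise RuntimeError("API request failed: {}".format(e))
    finally:
        # ... because returning from finally discards the pending exception
        return user_label


print(fetch_label())
```

Moving the return out of the finally block lets the exception propagate as intended.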
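- Unused member variables
- Function and variable names that do not convey their meaning
def check_redis(self, redis_key, input_text):
    """Check whether the cache contains a template"""
    ...


def check_once(self, username, input_text):
    """Check whether to unsubscribe and whether this is a help request"""
    ...
- Coding conventions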
if tmp["by_relation"] in [u'业务属于', u'维护人']:
- Overly verbose statements
if xx is not None:
    pass
answer = answer + data["opinion_type"]
if nick["Workspace"].get("status") is not None and nick["Workspace"]["status"] == "normal":
for i in range(len(t)):
    t[i].start()
record_num, record_data, user_list = t.record_false_job_result(user_id, biz_id)
return record_num, record_data, user_list
if 'start_timestamp' not in req.keys() or 'end_timestamp' not in req.keys():
- Insensitive to backend latency
The loop obtains a Redis connection and checks whether the key exists on every iteration, which adds considerable response latency.
for ccid in ccid_list:
    is_exist, cash_dict = GetRedis().get_reids_key_exist(redis_key, warn_text
- A branch that is always True
answer = self.no_match.get_text_answer(input_text)
ans_list = answer.split(u".")
if answer != []:
    ...
- Poor class design
The base class RecordItem abstracts only the update time, while its derived classes share a large amount of common information.
For example, self.game_id = user_account.game_id if user_account else None and three similar statements appear in each of the six derived classes that follow, and every __init__() takes a user_account parameter. This duplication can be removed by inheriting from an intermediate class.
class RecordItem(object):
    def __init__(self):
        self.update_time = 0


class JudgeUserCoreRecord(RecordItem):
    def __init__(self, user_account=None):
        super(JudgeUserCoreRecord, self).__init__()
        # core user information
        self.game_id = user_account.game_id if user_account else None
        self.user_id = user_account.user_id if user_account else None
        self.account_type = user_account.account_type if user_account else None
        self.account_info = user_account.account_info if user_account else None
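One possible shape for that intermediate class is sketched below. UserAccountRecord, its _ACCOUNT_FIELDS tuple, and the UserAccount namedtuple are all hypothetical names introduced for illustration; only RecordItem and JudgeUserCoreRecord come from the reviewed code.

```python
from collections import namedtuple

# Hypothetical stand-in for the real user_account object
UserAccount = namedtuple(
    "UserAccount", "game_id user_id account_type account_info")


class RecordItem(object):
    def __init__(self):
        self.update_time = 0


class UserAccountRecord(RecordItem):
    """Hypothetical intermediate class holding the shared account fields."""

    _ACCOUNT_FIELDS = ("game_id", "user_id", "account_type", "account_info")

    def __init__(self, user_account=None):
        super(UserAccountRecord, self).__init__()
        for field in self._ACCOUNT_FIELDS:
            setattr(self, field, getattr(user_account, field, None))


class JudgeUserCoreRecord(UserAccountRecord):
    def __init__(self, user_account=None):
        super(JudgeUserCoreRecord, self).__init__(user_account)
        # Only the fields specific to this record type remain here


record = JudgeUserCoreRecord(UserAccount(1029, "u1", "qq", "info"))
empty = JudgeUserCoreRecord()
print(record.game_id, record.user_id, empty.game_id, record.update_time)
```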
- A more elegant way to write if xx is None: xx = "xxx" (note that or also replaces other falsy values such as 0 and '')
xx = xx or "xxx"
evidence_type = evidence_type or ''  # handle None
- Class encapsulation lacks hierarchy
class CaseCoreRecord(CaseRecordItem):
    def __init__(self):
        super(CaseCoreRecord, self).__init__()
        self.demandant_id = None
        self.defendant_id = None
        self.demandant_role = None
        self.defendant_role = None
        self.demandant_uid = None
        self.defendant_uid = None
        self.demandant_account_type = None
        self.defendant_account_type = None
        self.demandant_account_info = None
        self.defendant_account_info = None
- Conditionals
return False if resp == True else True
- If resp is guaranteed to be a bool, this can be return not resp
- If you need to check strictly whether resp is True (rather than a string or some other type), this can be return resp is not True
records = filter(lambda x: not xxx(x), records)
can be simplified to
records = [x for x in records if not xxx(x)]
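- Repeated expensive expressions
The if condition uses a relatively expensive expression whose result is needed again afterwards, so it is computed a second time once the condition holds. This hurts both performance and readability.
if old_type_patt.match(line):
    old_type = int(old_type_patt.match(line).group(1).strip())
if is_free_punish_patt.match(line):
if not limit.get('user_msg') or limit['user_msg'] == 'null':
can be simplified to (equivalent as long as a present user_msg is never empty or None)
if limit.get('user_msg', 'null') == 'null':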
md5 = evidence[evidence.find('md5'):].strip().split('&')[0].split('=')[1]
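This code is really trying to match something like r".\bmd5=(xxx)[&$]."; it is very unclear and can actually match the wrong field.
Other code in this function has the same problem, and it overuses try.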
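A sketch of the mismatch and of a regex that avoids it. The evidence string is invented, and the pattern is one possible reading of the intent (an md5 key at the start of the string or right after an "&"), not the reviewed project's actual fix. The match object is also stored once and reused rather than recomputed.

```python
import re

# Hypothetical evidence string with a look-alike field before the real one
evidence = "file_md5=aaa111&md5=bbb222&size=10"

# The find/split chain latches onto the first "md5" substring it sees
by_split = evidence[evidence.find('md5'):].strip().split('&')[0].split('=')[1]

# Anchor the key to the start of the string or to a preceding "&"
MD5_PATTERN = re.compile(r"(?:^|&)md5=([^&]*)")
match = MD5_PATTERN.search(evidence)
by_regex = match.group(1) if match else None

print(by_split, by_regex)
```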
Learned: the count() method counts how many times an element occurs in a list.
@property
Turns a method of a class into a read-only attribute
class User:
    @property
    def password(self):
        raise AttributeError('password is not a readable attribute')

    @password.setter
    def password(self, password):
        self.password_hash = generate_password_hash(password)

    def verify_password(self, password):
        return check_password_hash(self.password_hash, password)
Use case: a value that can be set but not read
super(JudgeUserCoreInfo, self)
Designing classes tied to their data:
class JudgeUserSeasonRecord(RecordItem):
    def __init__(self):
        super(JudgeUserSeasonRecord, self).__init__()
        self.season_right_punish_case_num = 0
        self.season_total_case_num = 0
        self.season_right_case_num = 0
        self.season_wrong_case_num = 0
        self.season_accuracy = 0

    @staticmethod
    def init_value(record):
        record.season_accuracy = -1
        return record


record = get_cache_record(cache_record_cls=JudgeUserSeasonRecord)
if record is None:
    record = JudgeUserSeasonRecord.init_value(JudgeUserSeasonRecord())
class Cache(kv_client.KVJSONObject):
    """Cache built on top of KV storage.

    Hides the JSON format of values in the KV store. Accepts a class instance
    as input and stores the instance's attributes in the cache.
    On read, the cache contents are stored as attributes of a class instance.
    """
    _KV_PREFIX = "gs:1029:court:cache:"
    _CACHE_FIELD = 'cache_info'

    def __init__(self, game_id, user_id, extra_prefix=None):
        """Initialize the cache.

        :param game_id: game id
        :param user_id: user id
        :param extra_prefix: key prefix specified by the business layer
        """
        self.user_id = user_id
        self.game_id = game_id
        key_prefix = self._KV_PREFIX + get_cache_env_prefix()
        if extra_prefix is not None:
            key_prefix += extra_prefix
        super(Cache, self).__init__("court", str(game_id) + '_' + str(user_id), key_prefix)

    @property
    def cache_info(self):
        """Return the cache info."""
        if not self.value or not self.value.get(self._CACHE_FIELD):
            return None
        return self.value[self._CACHE_FIELD]

    def update_cache_info(self, cache_record, expire_sec):
        """Update the cache contents.

        :param cache_record: record to write to the cache; it must have a __dict__ attribute. Fields whose value is None are not updated
        :param expire_sec: expiry time in seconds
        :return: None
        """
        if not self.value:
            self.value = {}
        if self._CACHE_FIELD not in self.value:
            self.value[self._CACHE_FIELD] = {}
        cache_info_dict = self.value[self._CACHE_FIELD]
        for key, val in cache_record.__dict__.items():
            if val is None:
                continue
            cache_info_dict[key] = val
        self.save_value(expire=expire_sec)

    def get_cache_record(self, cache_record_cls):
        """Return the record stored in the cache.

        Reads the cache contents and stores them in an instance of cache_record_cls.
        The stored fields are limited to the attributes that cache_record_cls defines;
        if a field required by cache_record_cls is missing from the cache, it is filled with None.

        :param cache_record_cls: record class whose instance receives the cache contents
        :return: an instance of cache_record_cls
        """
        cache_info_dict = self.cache_info
        if cache_info_dict is None:
            return None
        cache_record = cache_record_cls()
        for key in cache_record.__dict__:
            setattr(cache_record, key, cache_info_dict.get(key, None))
        return cache_record
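Benefits of using super()
- With single inheritance, super() makes the code easier to maintain: when the base class is renamed, only the base class on the first line of the class definition has to change, and the parts that already use super() need no change.
- With several base classes and multiple levels of inheritance, the given method of every base class can be called in one go
- With diamond inheritance, it guarantees that each class's method is called only once
- Inside class C, super() is equivalent to super(C, self); to start the lookup after a specific class in the MRO (that is, to call the method on that class's parents), use super(ParentType, self)
- What happens if the given method takes different parameters in different base classes?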
class A(object):
    def __init__(self):
        super().__init__()
        print('A')


class B(A):
    def __init__(self):
        super().__init__()
        print('B')


class C(A):
    def __init__(self):
        super().__init__()
        print('C')


class D(C, B):
    def __init__(self):
        super().__init__()
        print('D')


class E(D):
    def __init__(self):
        super(C, self).__init__()
        print('E')


d = D()  # prints A, B, C, D
print('###')
e = E()  # prints A, B, E (C and D are skipped)
@PiyushKansal Inside a class C, super() does the same thing as super(C, self). Python 3 introduced this shortcut so that you don't need to specify the parameters for the default cases. In most cases (around 99%) you want to use just super(). By passing a different type to it, e.g. super(ParentType, self), you would be skipping types in the MRO, so that's probably not what you want to do.
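On the open question of base classes whose method takes different parameters: one widely used convention, sketched here with invented class names, is for every class in the chain to accept **kwargs, take the keywords it needs, and forward the rest through super().

```python
class Base(object):
    def __init__(self, **kwargs):
        # End of the chain: by now every keyword should have been consumed
        super().__init__(**kwargs)


class Named(Base):
    def __init__(self, name, **kwargs):
        super().__init__(**kwargs)
        self.name = name


class Sized(Base):
    def __init__(self, size, **kwargs):
        super().__init__(**kwargs)
        self.size = size


class Box(Named, Sized):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)


box = Box(name="crate", size=3)
print(box.name, box.size)
print([cls.__name__ for cls in Box.__mro__])
```

If a keyword is left over when the chain reaches object, object.__init__() raises TypeError, which surfaces misspelled or unsupported arguments.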
The pipeline in redis-py submits multiple commands in one go, and you can choose whether to run them as a transaction.
kv_ins = kv_client.RedisClient.get_instance(name)
# batch without transactions (the 互娱 distributed Redis does not currently support transactions)
pipe = kv_ins.client.pipeline(False)
for incr_item in incr_list:
    inc_key = incr_item['inc_key']
    delta_num = incr_item['delta_num']
    expire = incr_item['expire']
    pipe_key = inc_key
    app_log.info('incr key:%s|%s|%s', pipe_key, delta_num, expire)
    pipe.incrby(name=pipe_key, amount=delta_num)
    expire = None if int(expire) < 0 else expire
    if expire is not None:
        pipe.expire(name=pipe_key, time=expire)
pipe.execute()
Reference: https://github.com/redis/redis-py#pipelines
Generating unique IDs with the uuid library
https://docs.python.org/3/library/uuid.html
uuid.uuid4() Generate a random UUID.
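Re-raising the caught exception inside an exception handler
try:
    ...
except Exception:
    ...  # processing
    raise  # re-raise the exception that was just caught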
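A runnable sketch of the pattern: handle the error locally (here, by logging it), then use a bare raise so the caller still sees the original exception. parse_port is a made-up example function.

```python
import logging

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger(__name__)


def parse_port(text):
    try:
        return int(text)
    except ValueError:
        # Do the local handling, then let the same exception propagate
        logger.error("bad port value: %r", text)
        raise


try:
    parse_port("http")
    caught = None
except ValueError as exc:
    caught = exc

print(type(caught).__name__)
```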
Coroutines
https://www.aeracode.org/2018/02/19/python-async-simplified/
https://docs.python.org/zh-cn/3/library/asyncio-task.html
https://docs.python.org/zh-cn/3/library/asyncio-eventloop.html#creating-futures-and-tasks
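Simply calling a coroutine does not schedule it for execution. For example:
async def main(): ...
main()  # does not run it; this only creates a coroutine object
It only runs through something like asyncio.run(main()).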
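A minimal sketch of the difference, assuming Python 3.7+ for asyncio.run. The fetch coroutine and its sleep are placeholders for real asynchronous I/O.

```python
import asyncio


async def fetch(n):
    await asyncio.sleep(0.01)  # stand-in for real asynchronous I/O
    return n * 2


async def main():
    # gather() schedules the coroutines concurrently and collects their results
    return await asyncio.gather(fetch(1), fetch(2), fetch(3))


coro = main()  # only creates a coroutine object; nothing runs yet
print(type(coro).__name__)
results = asyncio.run(coro)  # the event loop actually executes it
print(results)
```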
for i in res_list:
    rule = i['name']
    for res in i["result"]:
        metric = res["metric"]
        for value in res["values"]:
            timestamp = value[0]
            val = value[1]
            utc_date = self.pktime_to_utc(timestamp)
            point_list.append(Point(activity_name) \
                              .tag("metric", metric) \
                              .field(rule, val) \
                              .time(utc_date, WritePrecision.S))
How can the code above be rewritten more elegantly?
def __init__(self, config_file, filter_chain=[]):
    ...
Why does it need to be changed to the following?
def __init__(self, config_file, filter_chain=None):
    # TODO validate config file format
    if filter_chain is None:
        filter_chain = []
    ...
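The reason is that a default value is evaluated once, when the function is defined, so a mutable default is shared by every call that relies on it. A small sketch with invented function names:

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    if bucket is None:
        bucket = []  # a new list on every call
    bucket.append(item)
    return bucket


first = append_shared(1)
second = append_shared(2)
print(first, second, first is second)

print(append_fresh(1), append_fresh(2))
```

Both calls to append_shared return the same list object, so state leaks between calls; the None default avoids that.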
Modules and packages
Pass by value vs. pass by reference
As we know, in Python, "Object references are passed by value".
listA = [0]
listB = listA
listB.append(1)
print(listA)  # [0, 1]
https://robertheaton.com/2014/02/09/pythons-pass-by-object-reference-as-explained-by-philip-k-dick/
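The same rule applies to function arguments. In this sketch (function names invented), mutating the object is visible to the caller, while rebinding the parameter name is not.

```python
def mutate(items):
    # Same object as the caller's list: the change is visible outside
    items.append(1)


def rebind(items):
    # Rebinding the local name does not affect the caller's variable
    items = [99]
    return items


list_a = [0]
mutate(list_a)
print(list_a)

rebind(list_a)
print(list_a)
```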
Building a custom tool package
Python Package Index (PyPI).
https://packaging.python.org/en/latest/tutorials/packaging-projects/
https://github.com/panzhongxian/markdown_image_replacer
https://python-packaging-tutorial.readthedocs.io/en/latest/setup_py.html
https://www.freecodecamp.org/news/how-to-create-and-upload-your-first-python-package-to-pypi/
https://towardsdatascience.com/how-to-upload-your-python-package-to-pypi-de1b363a1b3
python3 -m twine upload --repository testpypi dist/*