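Some notes on using memoization correctly in Python

Memoization

In computer science, memoization is an optimization technique that speeds up programs by caching the results of expensive function calls and reusing the stored result when the same inputs occur again.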
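To make the definition concrete, here is a minimal hand-written sketch of the idea. The names `memoize`, `slow_square` and `calls` are invented for illustration, and the sketch only handles hashable positional arguments; in practice the standard library decorators discussed next are the better choice.

```python
from functools import wraps

def memoize(func):
    """Cache results of func keyed by its positional arguments (illustrative helper)."""
    results = {}

    @wraps(func)
    def wrapper(*args):
        if args not in results:          # first time we see these arguments
            results[args] = func(*args)  # compute once and store the result
        return results[args]             # later calls are a plain dict lookup

    return wrapper

calls = []

@memoize
def slow_square(n):
    calls.append(n)  # record every real computation
    return n * n

slow_square(4)
slow_square(4)
print(calls)  # [4]
```

The second call never reaches the function body, which is why only one computation is recorded.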

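Python's functools.lru_cache and functools.cache

Simple lightweight unbounded function cache. Sometimes called "memoize".

Returns the same as lru_cache(maxsize=None), creating a thin wrapper around a dictionary lookup for the function arguments. Because it never needs to evict old values, it is smaller and faster than lru_cache() with a size limit.

For example: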

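from functools import cache

@cache
def factorial(n):
    return n * factorial(n-1) if n else 1

>>> factorial(10)      # no previously cached result, makes 11 recursive calls
3628800
>>> factorial(5)       # just looks up the cached value
120
>>> factorial(12)      # makes two new recursive calls, the other 10 are cached
479001600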

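The cache is thread-safe, so the wrapped function can be used in multiple threads. This means that the underlying data structure will remain coherent during concurrent updates.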
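The call counts claimed in the factorial example can be checked with cache_info(), which both functools.cache and functools.lru_cache expose on the wrapped function. This is only a small verification sketch, not part of the original example.

```python
from functools import cache

@cache
def factorial(n):
    return n * factorial(n - 1) if n else 1

factorial(10)
after_first = factorial.cache_info()   # 11 misses (n = 10 .. 0), no hits yet

factorial(5)
after_second = factorial.cache_info()  # one hit, nothing new computed

factorial(12)
after_third = factorial.cache_info()   # two new misses (12 and 11), one more hit (10)

print(after_third)  # CacheInfo(hits=2, misses=13, maxsize=None, currsize=13)
```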

Where have I seen it used again and again over the last few years?

Avoiding repeated loading of an expensive YAML file

@cache
def get_glossary_data():
    return load_yaml_file("nomenclature/glossary.yml")

Implementing a kind of Multiton pattern on things like localized i18n catalogs

@cache
def get(locale: str, *, domain: str) -> Translations:
    ... # lots of heavy lifting

    return Translations(...)

Compiling a slow jmespath expression only once

@cache
def compile_query(expression: str) -> ParsedResult:
    return jmespath.compile(expression)

Expanding objects from another domain

@cache
def load_settings(entity_id: str) -> Settings:
    return SettingsFactory(entity_id).load_settings()

Maybe there are other uses elsewhere in the codebase?

Things to watch out for, things to avoid, and things to think about

🌞 When it is a pure function

@cache
def get_heavy_stuff():
    # No arguments, no outside state: the result is always the same.
    for _ in range(1_000**2):
        ...
    return "funny"

Nothing much to say here. Just use it 😎

🚨 When the implementation uses contextual state

For example

from lib.i18n import get_locale
@cache
def get_that_thing(text):
    locale = get_locale()  # ⛔️🚨🚔👮🏽📢
    return TheHeavyThing(text, locale=locale)

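In this example, locale is a hidden parameter: it affects the result but is not part of the cache key. Refactor so that it becomes an explicit argument: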

from lib.i18n import DEFAULT_LOCALE
@cache
def get_that_thing(text, locale: str = DEFAULT_LOCALE): # 👼🌴🌞
    return TheHeavyThing(text, locale=locale)
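The difference between the two versions can be reproduced with a small self-contained sketch. Everything here is a hypothetical stand-in: `get_locale` and `set_locale` simulate the contextual state with a module-level variable, and the greeting functions replace TheHeavyThing.

```python
from functools import cache

_current_locale = "en"  # hypothetical stand-in for request-scoped state

def get_locale():
    return _current_locale

def set_locale(locale):
    global _current_locale
    _current_locale = locale

@cache
def greeting_hidden(name):
    # The locale is read from context, so it is NOT part of the cache key.
    return f"{'Bonjour' if get_locale() == 'fr' else 'Hello'} {name}"

@cache
def greeting_explicit(name, locale):
    # The locale is an argument, so every locale gets its own cache entry.
    return f"{'Bonjour' if locale == 'fr' else 'Hello'} {name}"

set_locale("en")
first = greeting_hidden("Ada")                  # computed with the English locale
set_locale("fr")
stale = greeting_hidden("Ada")                  # cache hit: still the English text
fresh = greeting_explicit("Ada", get_locale())  # keyed on ("Ada", "fr")

print(stale, "|", fresh)  # Hello Ada | Bonjour Ada
```

The caller of the explicit version decides which locale to pass, so the contextual lookup happens outside the cached function.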

🐒 Monkeypatching constants

🐒🌴🥥

# lib.some.thing
MAGIC_VALUE = 42

@cache
def get_that_power():
    return MAGIC_VALUE ** 2

def am_i_strong_enough(me):
    return get_that_power() < me

# tests.lib.some.thing_test
from lib.some.thing import am_i_strong_enough, get_that_power

def test_get_that_number_1(monkeypatch):
    monkeypatch.setattr("lib.some.thing.MAGIC_VALUE", 1)  # 🌴🥥
    assert get_that_power() == 1  # the patched result is now cached

...

def test_am_i_strong_enough():
    # Fails if it runs after the test above: the cached 1 is still returned.
    assert am_i_strong_enough(10) is False  # 🫨😨🫨😨

Try to avoid patching constants; it will get you into trouble. 🫨

Refactoring the code should let you do without it.
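One possible refactoring, sketched under the assumption that the constant may become a default argument: the value then becomes part of the cache key, and tests can pass their own value instead of patching the module. The `magic_value` parameter is an addition for illustration, not something from the original code.

```python
from functools import cache

MAGIC_VALUE = 42

@cache
def get_that_power(magic_value=MAGIC_VALUE):
    # The value is an argument, so each value gets its own cache entry.
    return magic_value ** 2

def am_i_strong_enough(me, magic_value=MAGIC_VALUE):
    return get_that_power(magic_value) < me

# A test can use another value without touching the module constant.
assert get_that_power(1) == 1

# Other callers are unaffected by whatever the test cached.
assert am_i_strong_enough(10) is False   # 42 ** 2 = 1764 is not below 10
```

If patching really cannot be avoided, calling get_that_power.cache_clear() before and after the patched test (for instance from a pytest fixture) is another option.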

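😰 Decorating methods with @cache

class DumbestCalculatorEver:
    @cache
    def double(self, x): return x * 2

# self is part of the cache key, so the cache keeps every instance alive.
DumbestCalculatorEver().double(1)  # never garbage collected
DumbestCalculatorEver().double(1)  # never garbage collected
DumbestCalculatorEver().double(1)  # never garbage collected

Don't do it, or do it wisely, for example on a singleton. The cache holds a strong reference to every instance the method was called on, so those objects become immortal, which can lead to memory leaks.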
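The effect can be observed with a weak reference. This is a sketch for CPython; the `Calculator` class is a hypothetical stand-in for the one above.

```python
import gc
import weakref
from functools import cache

class Calculator:  # hypothetical stand-in for the class above
    @cache
    def double(self, x):
        return x * 2

calc = Calculator()
calc.double(2)            # the cache key now contains `calc` itself
ref = weakref.ref(calc)   # observes the instance without keeping it alive

del calc
gc.collect()
alive_after_del = ref() is not None     # the cache still references the instance

Calculator.double.cache_clear()         # drops every cached (self, x) entry
gc.collect()
alive_after_clear = ref() is not None

print(alive_after_del, alive_after_clear)  # True False
```

Note that cache_clear() empties the cache for all instances at once, since the cache lives on the function, not on each object.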

🤔 Naively caching entity attributes

@cache
def is_settings_metasyntactical(settings: Settings) -> bool:
    return settings.search("foo || bar || baz || `false`")

settings1 = Settings({"id": "xxx", "foo": True})
settings2 = Settings({"id": "xxx", "bar": False})
settings3 = Settings({"id": "xxx", "baz": True})

assert is_settings_metasyntactical(settings1) is True  # ???
assert is_settings_metasyntactical(settings2) is False  # ???
assert is_settings_metasyntactical(settings3) is True  # ???

assert hash(settings1) == hash(settings2) == hash(settings3)  # what actually happens

How should we deal with this? I don't think there is a ready-made mechanism that handles this case properly, nor a single way to handle it.

Instances that hash the same and compare equal share one cache entry, so the result depends on which instance was seen first. This makes tests flaky and can also cause bugs in production.
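The collision can be reproduced with a minimal sketch. The `Settings` class below is hypothetical: it assumes that both equality and hash depend only on the id, which is one way to end up with the situation described above. `has_foo` is a simplified stand-in for the search.

```python
from functools import cache

class Settings:
    """Hypothetical entity: equality and hash depend on the id only."""

    def __init__(self, data):
        self.data = data

    def __eq__(self, other):
        return isinstance(other, Settings) and self.data["id"] == other.data["id"]

    def __hash__(self):
        return hash(self.data["id"])

@cache
def has_foo(settings):
    return bool(settings.data.get("foo"))

a = Settings({"id": "xxx", "foo": True})
b = Settings({"id": "xxx", "foo": False})

result_a = has_foo(a)   # computed from a
result_b = has_foo(b)   # b equals a, so this is a cache hit on a's entry

print(result_a, result_b)  # True True
```

Called in the opposite order, both results would be False, which is exactly what makes such tests order-dependent.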

Some ideas for handling the entity attribute case

Given this function

@cache
def is_settings_metasyntactical(settings: Settings) -> bool:
    return settings.search("foo || bar || baz || `false`")

We expect this function to behave like a property of the settings, which would be equivalent to:

class Settings:
    @property
    def is_metasyntactical(self):
        return self.search("foo || bar || baz || `false`")

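As it happens, settings are unlikely to change while a script is being resolved. Moreover, the cached settings are always cleared at the end of the script/call.

There are several ways to handle this case:

Idea 1: always clear this cache at the end of the script/query

try:
    return resolve_the_query()
finally:
    # cache_clear() is the method functools.cache provides; it must be called.
    is_settings_metasyntactical.cache_clear()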
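If several cached functions need the same treatment, the try/finally can be wrapped in a small context manager. This is only a sketch: `caches_cleared_afterwards` and `lookup` are invented names, and the helper assumes every function passed in was decorated with functools.cache or functools.lru_cache.

```python
from contextlib import contextmanager
from functools import cache

@cache
def lookup(key):  # hypothetical cached function
    return key.upper()

@contextmanager
def caches_cleared_afterwards(*cached_functions):
    """Clear the given functools caches when the block exits, even on error."""
    try:
        yield
    finally:
        for func in cached_functions:
            func.cache_clear()

with caches_cleared_afterwards(lookup):
    lookup("foo")
    size_inside = lookup.cache_info().currsize   # one entry while the block runs

size_after = lookup.cache_info().currsize        # emptied on exit
print(size_inside, size_after)  # 1 0
```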

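Idea 2: let the entity register external attributes? For example:

class Settings:
    def __init__(self, data):
        ...  # existing initialisation
        self.cache = {}  # per instance; a class-level dict would be shared by all instances

def is_settings_metasyntactical(settings: Settings) -> bool:
    cache_key = is_settings_metasyntactical
    try:
        result = settings.cache[cache_key]
    except KeyError:
        result = settings.cache[cache_key] = settings.search("foo || bar || baz || `false`")
    return result

Because we always clear the settings at the end of the script/call, the cached data is destroyed together with the entity instance.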

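Idea 3: use an external cache with weakref

The weakref module allows the Python programmer to create weak references to objects.

A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else.

from weakref import WeakKeyDictionary

_cache = WeakKeyDictionary()

def is_settings_metasyntactical(settings: Settings) -> bool:
    try:
        result = _cache[settings]
    except KeyError:
        result = _cache[settings] = settings.search("foo || bar || baz || `false`")
    return result

We rely on Python to clean up the cache during garbage collection.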
