Calculating the base32-encoded SHA-1 digest that is commonly used in WARC files and CDX indexes.
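The definition of `sha1_digest` isn't shown here. A minimal sketch, assuming it returns the bare base32 string as it appears in a CDX `digest` field: a 20-byte SHA-1 digest is 160 bits, which base32 encodes into exactly 32 characters with no padding.

```python
import base64
import hashlib


def sha1_digest(content: bytes) -> str:
    """Base32-encoded SHA-1 digest of `content` (sketch; CDX-style, no prefix)."""
    # 20-byte SHA-1 digest -> 160 bits -> exactly 32 base32 characters, no '=' padding
    return base64.b32encode(hashlib.sha1(content).digest()).decode("ascii")
```

In a WARC record the same value usually appears with a prefix, e.g. `'sha1:' + sha1_digest(payload)` in the `WARC-Payload-Digest` header.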
sha1_digest(b'12345')
Making URLs Pretty
Sometimes I want to return something that renders as a clickable link in Jupyter but still behaves like a plain URL string in other environments. Adapted from here.
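The `URL` class itself isn't shown here. One way to get the behavior described is a small wrapper that implements Jupyter's `_repr_html_` display hook for the rich view and `__str__` for the bare string. This is a sketch, not the original code; using a frozen dataclass is my assumption, chosen because its generated repr can be evaluated back into an equal object.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class URL:
    """A URL that renders as a clickable link in Jupyter (sketch)."""

    url: str

    def __str__(self) -> str:
        # Outside Jupyter, str() gives the bare URL
        return self.url

    def _repr_html_(self) -> str:
        # Jupyter calls this hook to display the object as HTML;
        # escape the URL so it is safe inside the attribute and the link text
        safe = html.escape(self.url, quote=True)
        return f'<a href="{safe}">{safe}</a>'
```

The dataclass supplies `repr(url)` as `URL(url='...')`, while `url.url` gives direct access to the underlying string.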
It displays nicely
url = URL('https://commoncrawl.org/')
url
The repr is usable
repr(url)
The string form is what we need
str(url)
Or we can extract it
url.url
Make a session that can run multiple concurrent requests and retry on intermittent failures.
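The session code isn't shown here. A minimal sketch, assuming the `requests` library: an `HTTPAdapter` with a larger connection pool lets several threads issue requests at once, and a urllib3 `Retry` policy retries transient errors with exponential backoff. The function name `make_session`, the retried status codes, and the backoff values are illustrative assumptions, not the original settings.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(max_connections: int = 10, retries: int = 5,
                 backoff_factor: float = 0.5) -> requests.Session:
    """Hypothetical helper: pooled session with retries for transient failures."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff_factor,  # sleep grows exponentially between attempts
        status_forcelist=(429, 500, 502, 503, 504),  # assumed "intermittent" statuses
        allowed_methods=frozenset({"GET", "HEAD"}),  # only retry idempotent requests
    )
    # Pool size bounds how many connections per host can be open concurrently
    adapter = HTTPAdapter(pool_connections=max_connections,
                          pool_maxsize=max_connections,
                          max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

Concurrency would then come from the caller, e.g. mapping `session.get` over a list of URLs with a `concurrent.futures.ThreadPoolExecutor` whose `max_workers` matches `max_connections`. Sharing one `Session` across threads is common practice, though `requests` does not formally document it as thread-safe.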
Forcing a function with joblib.Memory
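def _forced(f, force):
    """Forced version of memoized function with Memory.

    `f` must be a joblib MemorizedFunc (the result of `memory.cache`).
    If `force` is true, return a wrapper that always recomputes and re-caches.
    """
    assert hasattr(f, 'call')
    if not force:
        return f
    def result(*args, **kwargs):
        # MemorizedFunc.call always executes f and returns a tuple of (result, metadata)
        return f.call(*args, **kwargs)[0]
    return result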