Putting this here in case anyone needs to scrape the Pipermail web archive of a Mailman mailing list. This bit of Python 3 is based on a bit of Python 2 I found at Scraping GNU Mailman Pipermail Email List Archives. The only changes I made to the original were the updates needed to make it run under Python 3. It works well for my purposes, generating a single text file of the teknoids list archive from 2005 to today.
#!/usr/bin/env python3

import gzip
from io import BytesIO

import requests
from lxml import html

listname = 'teknoids'
url = 'https://lists.teknoids.net/pipermail/' + listname + '/'

# Fetch the archive index page and collect the download links
# from the third column of the table.
response = requests.get(url)
tree = html.fromstring(response.text)
filenames = tree.xpath('//table/tr/td[3]/a/@href')


def emails_from_filename(filename):
    """Download one archive file and return its contents as bytes."""
    print(filename)
    response = requests.get(url + filename)
    if filename[-3:] == '.gz':
        # Gzipped archives are decompressed in memory.
        contents = gzip.GzipFile(fileobj=BytesIO(response.content)).read()
    else:
        contents = response.content
    return contents


# Download every archive file and join them with blank lines in between.
contents = [emails_from_filename(filename) for filename in filenames]
contents = b"\n\n\n\n".join(contents)

with open(listname + '.txt', 'wb') as filehandle:
    filehandle.write(contents)
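If you want to check the result, here is a rough sketch of how the combined file could be read back with the standard library's mailbox module. It assumes the downloaded archives are mbox-style text, where each message starts with a line beginning with "From ". The function name summarize_archive is made up for this example and is not part of the original script.

```python
import mailbox


def summarize_archive(path):
    """Return (message_count, subjects) for an mbox-style text file.

    Hypothetical helper: assumes every message in the file starts
    with a line beginning with "From ", as mbox files do.
    """
    # create=False raises an error instead of silently creating
    # an empty file when the path does not exist.
    box = mailbox.mbox(path, create=False)
    try:
        # A missing Subject header is reported as an empty string.
        subjects = [str(message.get('Subject', '')) for message in box]
    finally:
        box.close()
    return len(subjects), subjects
```

Calling summarize_archive('teknoids.txt') after the scraper has run should give a message count and the list of subject lines. One caveat: the mbox parser treats any line that begins with "From " as the start of a new message, so the count can be off if a message body contains such a line unescaped.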