AutoScrape: Robust Web Scraping in the Public Interest
Presented to Computation + Journalism 2019 and detailed in Artificial Informer
In an effort to reduce the time and energy spent developing and maintaining web scrapers for journalism projects, I developed AutoScrape, an experimental tool that serves as a test-bed for new ideas about scraping websites. I presented these ideas in a research paper at Computation + Journalism 2019.
You can download the paper here.
Additionally, I wrote an article that explores the pitfalls of current web scraping technologies and some new methods that can improve the reliability of scrapers. Unlike the research paper, this article is aimed at a less technical audience: people who may be frustrated with their current web scrapers or who want to reduce the time they spend maintaining them.