Abstract: Web scraping is a foundational task in journalism and tends to be performed using custom, one-off tools. Traditional methods involve constructing HTTP requests and extracting data using XPath. As websites become more interactive, these methods require an increasing amount of manual effort to develop and maintain. This paper builds on previous work in text-based extraction techniques, adapts them to navigating a real browser, and proposes using Hext, a novel domain-specific language for extracting structured data from HTML. We introduce AutoScrape, an investigation-focused web scraping tool that implements this framework. AutoScrape can simplify many common journalistic data-gathering tasks and reduce maintenance costs. We also present case studies, developed in partnership with several non-profit media organizations, that describe common investigative tasks and illustrate how the framework solves each one.