A workflow for refining web pages into useful datasets.