Technology practitioners working on the Internet must constantly work out what research has already been done and what the best practices are. This lucid text covers the emerging technologies of document retrieval, information extraction, and text categorization.