Paperback Data Crunching: Solve Everyday Problems Using Java, Python, and More Book

ISBN: 0974514071

ISBN13: 9780974514079

Data Crunching: Solve Everyday Problems Using Java, Python, and More

Every day, all around the world, programmers have to recycle legacy data, translate from one vendor's proprietary format into another's, check that configuration files are internally consistent, and... This description may be from another edition of this product.


Format: Paperback


Customer Reviews

5 ratings

Short, Informative, Useful and Clear

Some of the best technical books are short, clear, easy to understand, and practical. Greg's book fits this description. This is a great book for exploring algorithms in the Python language. The book assumes the reader has at least a basic understanding of the Python programming language or some programming experience. I was delighted that topics were presented in a concise and unambiguous way and that the book was short. There should be more short books published!

It's about using the right tool for the right job

Gregory Wilson likes Python and bash but doesn't particularly care for XSLT (or Perl, and possibly Java, either), doesn't express a preference in the great Emacs vs. Vi(m) holy war, and divides programming languages into two camps - agile, like Python and Ruby, and "sturdy", like Java. He's an adjunct CS professor at the University of Toronto, a contributing editor with Dr. Dobb's Journal, and is developing "Software Carpentry", which is either a basic course on software development aimed at scientists and engineers for the Python Software Foundation or a project to develop a newer, easier-to-use set of software development tools.

In the book, "Data Crunching: Solve Everyday Problems Using Java, Python, and More", data crunching is explored through a series of examples. The closest that Wilson comes to giving a definition is when, at the start of the first chapter, he refers to data crunching/munging as the "other 10%" of a programming task that takes up the "other 90% of the time". The first example that he gives is his experience helping a high school science teacher convert PDB (Protein Data Bank) files containing the coordinates of atoms in various molecules into a format that a Fortran sphere-drawing program could process.

From the introduction, he moves on to the manipulation of text and text files using Unix command-line tools and Python, with Java work-alikes following most of the Python scripts. Although the book's subtitle, "Solve Everyday Problems Using Java, Python, and More", gives Java first billing (possibly for marketing reasons?), Wilson's preference for Python over Java is never in doubt. After presenting the Java equivalent of a Python script that counts the number of times every email address appears in a list of email addresses, he writes: "All right. It's two-and-a-half times longer than the equivalent Python program, it isn't as fast on small files, and we have to compile it before we can run it, but other than that, it's almost as easy..."

With a table of useful commands, an explanation of redirection and piping, and some guidelines on how to make sure that your command-line tools follow convention, the text chapter could actually be viewed as a pretty passable introduction to the philosophy of Unix.

The chapter on Regular Expressions is great. So good, in fact, that I wish I could go back in time and give myself a photocopy of those thirty-odd pages at the point that I was struggling to get a handle on RE's some years back. Also included in this chapter is a brief, but very lucid, discussion of character encoding and a bit on using grep.

Although the Text and RE chapters were my favorite, Wilson's clear and concise writing style makes the entire book, including the coverage of XML, binary data processing, and relational databases, a joy to read. With segues like "But wait a second. Wait just one pattern-matching second.", lists of email addresses to munge that include entries for Alan Turning, John vo
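
For readers curious about the email-counting task mentioned in this review, here is a minimal Python sketch of the idea. It is not the book's code: the function name `count_addresses`, the sample addresses, and the choice to ignore case and surrounding whitespace are all assumptions made for illustration.

```python
from collections import Counter


def count_addresses(lines):
    """Count how often each address appears, ignoring case and blank lines.

    Normalizing case and whitespace is an assumption of this sketch,
    not something the review says the book's script does.
    """
    cleaned = (line.strip().lower() for line in lines)
    return Counter(addr for addr in cleaned if addr)


# Hypothetical sample input; these addresses are made up.
sample = [
    "alice@example.com",
    "bob@example.com",
    "Alice@Example.com ",
    "",
    "bob@example.com",
    "carol@example.com",
]

counts = count_addresses(sample)
for address, n in counts.most_common():
    print(n, address)
```

In practice the list would more likely be read from a file or standard input, so that the script can sit in a Unix pipeline like the command-line tools the review describes.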

Just the information you need to know to get rolling

There exists a set of tasks common to every software developer independent of the type of application developed and the language used. Presenting these tasks concisely to the new developer, without burying the hapless soul under a pile of thick texts, has always been a problem. The Pragmatic Bookshelf attempts to remedy this situation by giving the developer the knowledge they need to get the job done in a concise and, well, pragmatic format. One of the latest offerings in this outstanding series is "Data Crunching: Solve Everyday Problems Using Java, Python, and More" by Greg Wilson.

The core of programming comes down to data manipulation. This may be parsing XML, reformatting text data, searching a database, or any of a host of other tasks. Typically, figuring out how to do each of these would require digesting several books just to get to the nuts and bolts of simple operations. "Data Crunching" fills this hole by concisely presenting the minimum amount of information required to get the job done. Just the information you need to know to get rolling, without all the fluff.

There are chapters on manipulating text files, XML documents, binary data, and relational databases. Included is a nice chapter on regular expressions, as well as a chapter on various "glue" topics relevant to solving data manipulation problems. Each chapter examines the tools and methods used to successfully manipulate the format of data being discussed. The examples used, and the book is chock full of them, are practical and relevant to the problems most often faced by developers. The examples are clearly illustrated and easy to follow. Wilson does a fine job of presenting things in the "pragmatic" style that readers familiar with other books in the series have come to know. Each chapter stands well on its own, so the book may be used as a reference, although it's concise and a pleasant enough read that it's also worth reading through once.
Great for the new developer who hasn't yet gotten his feet wet with data manipulation, yet also a nice reference for those who have been around the block a bit more, "Data Crunching" makes a fine addition to the Pragmatic series and is definitely worth having on the bookshelf.

A gold mine for the software developer...

If you're reading this, you probably spend some quality time developing software. If you're developing software, chances are that you have to move data around on a daily basis (lucky you, if you don't): getting data from one text format to another, moving data from a legacy system to a newer project's database, transforming XML into some more readable format for your boss, or trying to get some useful data out of a former colleague's own binary format. Whatever you do in that manner, you're crunching data. Greg Wilson seems to have spent a lot of time crunching data and wants to share his wisdom with the world of pragmatic programmers.

The book's coding focus is on working with Python and Java. I, for my part, haven't worked with Python yet, but being familiar with Ruby and Groovy, it wasn't actually that hard to get an idea of what the Python code does (and I'm starting to like Python). So you've been warned about that. Being a big fan of The Pragmatic Programmers' bookshelf, I didn't hesitate to buy a copy of "Data Crunching" as well. Since I spend a lot of time doing stuff with some more or less usable data, I thought it might be a good read to get some fresh ideas. And as it turns out, that was a good choice.

Let's dive into the world of crunching data. Greg takes it easy on the reader in the introduction. He starts off with short examples from his professional career. This helps a lot to get an idea of what data crunching actually is. If you didn't already know it, reading the first chapter will give you some hints. The book is split up in a simple way. The next chapters take you through most of the data sources and formats you'll most likely get in touch with. Mainly, that's text, regular expressions, XML, binary data, and relational databases. The book ends with a short chapter about the so-called horseshoe nails, that being things that didn't fit anywhere else. But we'll get to that later. Not surprisingly, every chapter ends with a short summary.
The (more or less) simplest data you can work with is text. While some genius programmer in some company whose products you use/once used can always come up with a great new text format that nobody will ever understand, there's a good chance that you'll at least get an idea of its meaning by looking at a text file. Greg takes an example from the introduction some steps further to show the basics of working with text files, and also how to work with and around the common pitfalls. Being a pragmatic book, you also get the idea of how to keep your data crunching code nice and clean, and how to deal with normalising, collision detection and, of course, the basics of working with the UNIX shell (the tool of my choice for dealing with most "normal" text). After reading this chapter you have a very good idea about dealing with text. Compressing more information about dealing with text should be almost impossible. Ah, regular expressions. The sheer joy of getting to know all the differences

Just what the newbie or occasional programmer needs

Data Crunching is a short book with great how-to-like code examples of very common data parsing and manipulation techniques. The examples are easy to follow and clearly demonstrate the author's point. None of the topics are covered in great depth but each contains enough to whet the reader's appetite for more. The text and examples are thought provoking, leading the reader to ask the right kind of questions when detailed information is needed. The book covers the most common aspects of data crunching, including text files, regular expressions, XML, binary files, relational databases and unit testing. The book dedicates a chapter to each of these topics. Each chapter has one or more sample problems to solve. I found the sample problems to be well thought out. They were, if not exactly the same as real-life data crunching problems I've had to solve in the past, sufficiently close that I could easily apply the principles (and sample code) to my problem. I thought the regular expressions section was an excellent, succinct (re)introduction to regular expressions. Wilson starts with basic patterns, quickly and clearly working up to common complex patterns. The regular expressions chapter also includes a nice bit of Python code that generates a table of patterns, test strings and those patterns that match them. I liked the chapter on XML but noticed that there was no code example on performing an XSLT transformation. There is, however, a good example of an XSLT template, but no code on how to process it. The chapter on relational databases covers all the most common SQL needed for daily use (think 10% of the SQL that works on 90% of the problems). This includes sub-selects, negation, aggregation and views. The last chapter, "Horseshoe Nails", covers miscellaneous topics including testing. The author of course covers unit testing but also simple ways of testing when full-blown unit testing is overkill.
The last chapter also has sections on encoding, dealing with floating point numbers, dates and times and how to format them with strftime. I was impressed by the author's ability to cull such important techniques and idioms and organize them into a small, yet incredibly useful text. Data Crunching covers real-life data parsing and manipulation concepts. It does so without tangential journeys into other areas of programming. Each of the five main topics includes simple code examples, usually in Python, Java or both, that clearly demonstrate the topic. The author does an impressive job of squeezing in almost all the issues in the daily work of data crunching. The reader can expect to come away with something of value on each topic covered, especially the newbie or occasional script writer.
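
The table of patterns and test strings that this review mentions can be sketched roughly as follows. This is only an illustration of the idea, not the book's code: the helper name `match_table`, the sample patterns, and the sample strings are assumptions.

```python
import re


def match_table(patterns, strings):
    """Return one row per test string: (string, [True/False per pattern]).

    Uses re.search, so a pattern matches if it occurs anywhere in the string.
    """
    compiled = [re.compile(p) for p in patterns]
    return [(s, [bool(c.search(s)) for c in compiled]) for s in strings]


# Hypothetical patterns and test strings, chosen only for illustration.
patterns = [r"^a", r"b+", r"\d$"]
strings = ["abc", "xbb", "a1", "none"]

table = match_table(patterns, strings)

# Print a simple grid: test strings down the side, patterns across the top.
width = max(len(s) for s in strings)
print(" " * width, *patterns)
for s, hits in table:
    marks = ["yes" if hit else "-" for hit in hits]
    print(s.ljust(width), *marks)
```

A grid like this makes it easy to see at a glance which pattern is too greedy or too strict while experimenting with regular expressions.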