McWalter.org :: Java Hexdump

home -> technology -> java -> hexdump

Writing a hexdump program seems to be just about the first serious program lots of us do when learning a new language. As an experienced programmer in that language, revisiting the problem can bring many of the language features (and misfeatures, often) into focus. Regardless of the language, high or low level, there's always plenty of room for optimisation.

Presented on this page is my own fairly elementary effort at hexdump. The program itself isn't very interesting - a competent java programmer should be able to throw something better together in half an hour. What is interesting, however, is the degree of improvement possible when one really tries. The first version of the program seems fine, and there's no obvious problems in the code - but see the benchmarks section below to discover how much faster it could be made.

Java `hexdump` software

v1	Jun 13th 1999	source	Initial version
v2	Feb 15th 2000	source	Cleanup, optimised
v3	Mar 20th 2003	source	Dramatic speedup - make each line a single println call rather than lots of single char prints - now nine times faster than v2.
v4	Mar 23rd 2003	source	Added printing of 8 digit offset. Changed output code to store all output in a giant StringBuffer and then print it all in one go. This makes things a little faster.
v5	Mar 28th 2003	source	Changed read code to read in a larger block at one time - a buffer of 1K seems to be the sweet spot. Also made output StringBuffer bounded (periodically flushed) so we can handle huge files (which crashed v4), and also makes us another 10% faster. Compared with v3, this version is 20% faster for small files and twice the speed for very large files. v5 is also speed compatable with od -c for small files, and twice its speed for very large files.
v5multi	Apr 9th 2003	source	This is a multithreaded version of v5, with seperate threads for reading data from the source stream and for writing out to the output stream. A queue is used to transmit data from one to the other. This wasn't successful - there wasn't any speed improvement.

Benchmarks

Benchmarks for each version of the java Hexdump program are shown, dumping both a reasonably small file, where startup time should predominate, and for a pretty large one, where only raw IO performance should be important. For comparison, the results of the same benchmarks run by the cygwin version of "od -c" are shown. This is no slight on od, which is a more general purpose program with lots of features that the java hexdump lacks, and which hasn't been so monomanically engineered to a single benchmark suite. The comparison is intended, however, to show how differing versions of the java hexdump compare with a well written, sensible C program running on a fairly efficient platform.

The two benchmarks are:

benchmark1 Dump a 500Kbyte file to a file on the same disk
benchmark2 Dump a 5Mbyte file to /dev/null

The results of running these benchmarks against each revision of java hexdump (and the reference cygwin od) are shown in the following table - times are shown in the following format minutes:seconds.tenths.


  program       benchmark1          benchmark2

  od -c         0:00.5              0:48

  hexdump v1    0:06.4              7:20
  hexdump v2    0:06.4              7:20
  hexdump v3    0:00.9              0:42
  hexdump v4    0:00.7              [failed]
  hexdump v5    0:00.5              0:21

Design notes

v4

V4 was something of a gamble - it creates a single string buffer for the entire output file in one in-memory buffer. This is great for small files, but suicidal for large ones (in my defense, the original testfile I used for benchmark 2 was a lot smaller). V4 fails on benchmark2 as it can't allocate the RAM necessary for that large file. Sure, I could run the JVM with a bigger heap, but that's only postponing the problem.

Multithreading

After v5, I produced a multithreaded version of hexdump, hoping that additional speed improvements could be realised by avoiding circumstances where the read and write paths block one another, particularly when reading from one disk and writing to another. At least on Windows XP and Linux 2.4, this wasn't the case.

Possible improvements

Here's some possible improvements, some of which I'll probably do and most of which I won't:

The code as written uses standard java.io classes for input and output. It may be possible (particularly in the multithreaded case) to rewrite the code using bulk java.nio code instead. I don't know if this will produce any actual performance improvement.
The current code doesn't handle piped input and output. Doing this shouldn't be very difficult; I wouldn't expect it to have any impact on performance.
It should be possible to elide .append() calls to oneanother (perhaps with some loop unrolling)
or just to chuck the whole thing in and use only a single char[], which we can just System.out.println(), ideally once it's got a bunch of lines encoded in it.

Conclusions

On older Java platforms, JVMs typically featured a sizeable delay between being started and actually starting to run the target program. This initialisation gap was mostly due to classloading and quickening of dozens or hundreds of individual classes needed to get even the most elementary Java environment running. This was one of the major reasons why writing unix-style single-function command line utilities in java was a problem (where each invocation would need around ten seconds to get going). Later JDK versions, particularly JDK1.4, start up very quickly.

This also shows that non-synchronised StringBuffers, and economical use of the synchronised in and out objects, can make for major improvements in I/O performance. Lastly, it shows that, with modern JVM software, there's no reason why Java software has to be slow.