Binary diff or rsync in Java?

Try as I might, I can’t find any decent binary diff tool written in Java. My needs are fairly simple: I want to reduce the content sent across the wire, and in most cases a diff would be 100X smaller. The continuous integration support provided by HostedQA already supports this and I have verified it is a huge savings. But currently it uses a native program called bsdiff.
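For context, shelling out to bsdiff from Java might look roughly like the sketch below. It is an illustration only, not the actual HostedQA code: the class and method names (BsdiffRunner, diffCommand, createPatch) are made up, and it assumes a bsdiff binary is installed and on the PATH. The argument order follows the bsdiff command line: old file, new file, patch file.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: names are illustrative, not taken from any real project.
class BsdiffRunner {

    /** Builds the argument list: bsdiff <oldfile> <newfile> <patchfile>. */
    static List<String> diffCommand(Path oldFile, Path newFile, Path patchFile) {
        return Arrays.asList(
                "bsdiff", oldFile.toString(), newFile.toString(), patchFile.toString());
    }

    /** Runs bsdiff (assumed to be on the PATH) and fails loudly on a non-zero exit. */
    static void createPatch(Path oldFile, Path newFile, Path patchFile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(diffCommand(oldFile, newFile, patchFile))
                .inheritIO() // let bsdiff's own error output reach the console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("bsdiff exited with status " + exitCode);
        }
    }
}
```

The receiving side would apply the patch with the companion bspatch tool. Both are extra native binaries that have to be installed on every machine involved.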

Ideally I’d like to use a pure Java program so that I don’t have even more infrastructure to set up in what is already a very complicated deployment (Software as a Service usually is complex, but I’m quite certain that, for better or for worse, HostedQA takes the cake).

I did find one program, javaxdelta, that does what I want. But, as I previously pointed out, SourceForge CVS is pretty much broken and I have no way to get the code. So back to the drawing board until SF gets its act together.

The bsdiff approach works well enough for now and it is very cool to see a Continuous Integration process update the server with a diff for a 30MB war file and then kick off tests. But even this approach isn’t as ideal as I’d like. One word: rsync.

Rsync is great because it doesn’t require that both the new and old files are on the local machine to diff. Rsync can send only the diffs while doing the comparisons over the wire. I don’t quite understand how it works (it may very well be magic), but that is even better. Currently, my solution will have to send the whole file if someone has updated the files on the server after the most recent upload, since the diff will no longer be valid.

If finding a Java binary diff program was hard, then finding a Java rsync program is even tougher. I did find Jarsync, which has two problems right away: 1) it is hosted on SourceForge (grrr), and 2) the home page says “Note: Jarsync is no longer being actively developed. It may be picked up again, but not now.”

So, does anyone know of some hidden project that Google hasn’t found yet? I imagine at some point we’ll have to write one of these two programs in Java, but in the meantime the bsdiff approach isn’t too bad for an initial release.

5 Comments on “Binary diff or rsync in Java?”

  1. #1 Software Developer
    on Dec 6th, 2006 at 3:59 am

    Hi,

    I’ve been developing a directory diff GUI program and have found lots of little projects here and there for ‘diffing’. But, alas, the only useful diff I found comes with the NetBeans IDE, which has a Diff package (diff package here). Personally I don’t want NetBeans, so my search continues…

    If you have any luck sourcing a decent java diff, drop me a mail too!

    Paul.

  2. #2 Software Developer
    on Dec 6th, 2006 at 6:00 am

    Hi,

    I found jrcs which contains a diff class for doing rcs type diffs in Java.

    It can be found here:

    jrcs homepage

    Paul.

  3. #3 Software Developer
    on Dec 7th, 2006 at 2:30 am

    this one’s quite promising:

    java-diff

    It basically diffs an object array (or collection) and gives you a useful output.

    For example, if you’re diffing a text file, simply split your file into lines (using a BufferedReader) and send this to the incava diff engine. It works quite well for files with limited diffs. If a file is very different it may fall over, as it doesn’t go into character-level detail.

    As it works on arrays of objects, you can adapt it yourself to work with binary files.

    Paul.

  4. #4 josvazg
    on Nov 12th, 2008 at 4:07 am

    I haven’t found a Java lib for rsyncing (apart from jarsync, which is probably unmaintained and untrustworthy).

    As for the rsync magic, here is the explanation:
    http://www.samba.org/rsync/tech_report/node2.html

    Basically, the receiving side splits its copy of the file into fixed-size blocks and sends a checksum for each block to the other side. The sender then looks for those blocks in its own file (at any offset, using a rolling checksum) and sends only the data that does NOT match any of them, plus references to the blocks that do match. That is, you probably send more data than what actually changed, but I guess the block size has been selected as a reasonable compromise.

    I am considering invoking the rsync tool FROM Java, instead of implementing it within Java.

  5. #5 anonymous
    on Dec 18th, 2008 at 11:49 am

    I know that git also does binary diffs; perhaps one can find an implementation of it in jgit at
    http://repo.or.cz/w/egit.git
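
To make the rsync explanation from comment #4 a bit more concrete, here is a minimal in-memory sketch of the block-matching idea in Java. It is not rsync itself and not any existing library: the class name RsyncSketch, the tiny 4-byte block size, and the use of MD5 as the strong hash are all assumptions made for illustration (the tech report describes MD4), and there is no networking at all. The weak checksum and its rolling update follow the formulas in the tech report linked in that comment.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: everything here is hypothetical and held in memory.
class RsyncSketch {
    static final int BLOCK = 4; // unrealistically small, to keep the demo readable

    /** One delta instruction: copy a block the receiver already has, or insert literal bytes. */
    static final class Op {
        final int blockIndex; // >= 0: index of a block in the old file
        final byte[] literal; // non-null: raw bytes to insert
        Op(int blockIndex, byte[] literal) { this.blockIndex = blockIndex; this.literal = literal; }
    }

    /** Weak checksum: a = sum of bytes, b = position-weighted sum, both mod 2^16. */
    static int weak(byte[] d, int off, int len) {
        int a = 0, b = 0;
        for (int i = 0; i < len; i++) {
            int x = d[off + i] & 0xff;
            a = (a + x) & 0xffff;
            b = (b + (len - i) * x) & 0xffff;
        }
        return a | (b << 16);
    }

    /** Slides the window one byte to the right without rescanning it. */
    static int roll(int sum, int out, int in, int len) {
        int a = ((sum & 0xffff) - out + in) & 0xffff;
        int b = ((sum >>> 16) - len * out + a) & 0xffff;
        return a | (b << 16);
    }

    /** Strong hash used to confirm a weak-checksum hit (MD5 here, an assumption). */
    static byte[] strong(byte[] d, int off, int len) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(d, off, len);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Computes the instructions that turn oldData into newData. */
    static List<Op> delta(byte[] oldData, byte[] newData) {
        // "Receiver" side: checksums for every full block of the old file.
        int blocks = oldData.length / BLOCK;
        Map<Integer, List<Integer>> byWeak = new HashMap<>();
        byte[][] strongSums = new byte[blocks][];
        for (int i = 0; i < blocks; i++) {
            byWeak.computeIfAbsent(weak(oldData, i * BLOCK, BLOCK), k -> new ArrayList<>()).add(i);
            strongSums[i] = strong(oldData, i * BLOCK, BLOCK);
        }
        // "Sender" side: look for those blocks at every byte offset of the new file.
        List<Op> ops = new ArrayList<>();
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int pos = 0;
        int sum = newData.length >= BLOCK ? weak(newData, 0, BLOCK) : 0;
        while (pos + BLOCK <= newData.length) {
            int match = findBlock(byWeak.get(sum), strongSums, newData, pos);
            if (match >= 0) {
                flush(pending, ops);
                ops.add(new Op(match, null));
                pos += BLOCK;
                if (pos + BLOCK <= newData.length) sum = weak(newData, pos, BLOCK);
            } else {
                int out = newData[pos] & 0xff;
                pending.write(out); // unmatched byte becomes literal data
                if (pos + BLOCK < newData.length) {
                    sum = roll(sum, out, newData[pos + BLOCK] & 0xff, BLOCK);
                }
                pos++;
            }
        }
        pending.write(newData, pos, newData.length - pos); // tail shorter than one block
        flush(pending, ops);
        return ops;
    }

    static int findBlock(List<Integer> candidates, byte[][] strongSums, byte[] data, int pos) {
        if (candidates == null) return -1;
        byte[] s = strong(data, pos, BLOCK);
        for (int idx : candidates) {
            if (Arrays.equals(strongSums[idx], s)) return idx;
        }
        return -1;
    }

    static void flush(ByteArrayOutputStream pending, List<Op> ops) {
        if (pending.size() > 0) {
            ops.add(new Op(-1, pending.toByteArray()));
            pending.reset();
        }
    }

    /** "Receiver" side again: rebuilds the new file from its old copy plus the delta. */
    static byte[] apply(byte[] oldData, List<Op> ops) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Op op : ops) {
            if (op.literal != null) out.write(op.literal, 0, op.literal.length);
            else out.write(oldData, op.blockIndex * BLOCK, BLOCK);
        }
        return out.toByteArray();
    }
}
```

Calling RsyncSketch.delta(oldBytes, newBytes) and then RsyncSketch.apply(oldBytes, ops) should reproduce newBytes. In a real setup only the block checksums would travel in one direction and the ops in the other, which is why neither side needs both versions of the file.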
