The joy of Subversion and XmlStarlet

The problem: given a set of identifiers (Jira tickets, as a matter of fact) and a Subversion repository, find all files touched by any commit whose comment includes at least one of those identifiers. svnlook was not an option because it only works on a local repository, and this one is remote.

After exploring some alternatives for processing the usual svn log output with Unix text tools, I noticed that svn log has an “--xml” flag. Naturally, having the log as XML was desirable because it would remove the dependency on a particular text format.

Now the problem was of another sort: how to process the potentially very large log effectively from the command line? Right from the start, I knew I didn’t want to write a bloated XML parser in Java, Scala or Ruby. Nor did I want an additional XSLT file for Xerces to do the job. I wanted to keep dependencies and configuration to a minimum, basically because this process is to be executed from a Jenkins task and, as everybody knows, less is more.

This is when I found a marvelous tool called XmlStarlet. Its mission? Simple: to process XML without leaving the command line. Under the hood, the tool generates an XSLT stylesheet on the fly from the parameters it is given and uses it to process the actual input stream. With this tool, I can find what I want for a given ticket with this line:

svn log "${URL}" --xml --verbose \
 | xml select --text \
 --template \
 --match "/log/logentry" --if "contains(msg, '[#${T}]')" \
 --match "paths" --value-of "path" --nl

Here $URL is the repository I’m querying and $T is a ticket id. I’ve put the call inside a bash script and pipe the result through sort (with “-u”, clearly). Voilà!
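The wrapper script itself is not shown here, so what follows is only a minimal sketch of how it might look. The function name files_for_ticket and the two positional arguments are assumptions made for illustration, and on some systems the xml binary is installed as xmlstarlet instead.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the one-liner above (names are illustrative).
set -o pipefail   # make a failing "svn log" fail the whole pipeline

# Print the paths touched by commits whose message mentions ticket $2
# in the repository at URL $1, one per line, without duplicates.
files_for_ticket() {
  local url=$1 ticket=$2
  svn log "${url}" --xml --verbose \
    | xml select --text \
        --template \
        --match "/log/logentry" --if "contains(msg, '[#${ticket}]')" \
        --match "paths" --value-of "path" --nl \
    | sort -u
}

# Run only when called with both arguments, e.g.
#   ./files-for-ticket.sh "$URL" "$T"
if [[ $# -eq 2 ]]; then
  files_for_ticket "$1" "$2"
fi
```

Keeping the pipeline in a function makes it easy to call once per ticket, which is exactly where the cost of re-reading the log becomes visible.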

There is a pitfall with this invocation in its current form, and I’m aware of it: performance. If I have a set of tickets (which is my case), I’ll end up requesting the whole log once per ticket in the set. It would be much nicer to fold the multiple comparisons into a single XPath expression, but that will be a matter for another work session.

Update:

Thanks to Soraya, the optimisation is done. It is now the XML processor that compares each log message against the whole set of ticket ids, and directory entries are filtered out along the way:

SEPARATOR=" or "
CONDITION=$(printf "${SEPARATOR}contains(msg, '[#%s]')" "${TICKETS[@]}")
CONDITION=${CONDITION:${#SEPARATOR}} # remove leading separator
svn log "${URL}" --xml --verbose \
 | xml select --text \
 --template \
 --match "/log/logentry" --if "${CONDITION}" \
 --match "paths" \
 --match "path" --if "@kind != 'dir'" --value-of . --nl >> "${TMPFILE}"
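The condition-building step relies on printf reusing its format string once per argument, so each ticket contributes one " or contains(...)" clause, and the substring expansion then trims the separator in front of the first clause. The sketch below isolates just that step so it can be tried without a repository; the function name build_condition and the ticket ids are made up for illustration.

```shell
#!/usr/bin/env bash
# Build an XPath "or" chain from a list of ticket ids (illustrative helper).
build_condition() {
  local separator=" or "
  local condition
  # printf repeats the format once for every argument it receives.
  condition=$(printf "${separator}contains(msg, '[#%s]')" "$@")
  # Drop the separator that precedes the first clause.
  printf '%s\n' "${condition:${#separator}}"
}

TICKETS=(PROJ-1 PROJ-42)   # hypothetical ticket ids
build_condition "${TICKETS[@]}"
```

For the two sample ids this prints contains(msg, '[#PROJ-1]') or contains(msg, '[#PROJ-42]'). A ticket id containing a single quote would break the XPath string literal, so the approach assumes ids made of plain letters, digits and dashes.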
