Friday, April 10, 2009

Maven2 plugin to generate shell scripts

Here is a silly but potentially useful little Maven2 plugin that I wrote recently. It uses the application module's POM to build a bash script that runs a Java "executable" program (i.e., a class with a main() method).

For a given module, a bash script to run one class versus another is almost identical - the only difference is in the class name being passed to java. The rest of it is all boilerplate, and the largest portion is the CLASSPATH declaration. Since Java modules typically have tons of dependencies, it can be quite tedious to build the bash script by hand - in fact, it would probably qualify as my least favorite (coding) activity.

Why do it then, you ask? Well, I typically run small datasets (up to a couple of thousand records) through my program by writing a JUnit test and calling it using "mvn test", but I find that the JVM runs out of memory for large datasets, especially with a lot of logging. I suspect that this is because Maven2 buffers the logs in memory to write the results of the test to an XML file, but I could be wrong. The other reason is that if your program is going to production, then a shell script to run the code is one of the deliverables.

Plugin setup

The plugin is built inside an existing plugin project that I described in a previous post. For this plugin, I added dependencies on JDOM and Velocity. The relevant snippet from my pom.xml is shown below:

<project>
  ...
  <dependencies>
    ...
    <dependency>
      <groupId>velocity</groupId>
      <artifactId>velocity</artifactId>
      <version>1.4</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>jdom</groupId>
      <artifactId>jdom</artifactId>
      <version>1.0</version>
      <scope>compile</scope>
    </dependency>
  </dependencies>
  ...
</project>

Plugin code

Here is the code for the plugin. It is set to be called in the "deploy" phase, so it does not interfere with the normal development (clean compile test-compile test) life cycle.

// Source: src/main/java/com/mycompany/plugin/BashScriptMojo.java
package com.mycompany.plugin;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.util.Date;
import java.util.List;
import java.util.Properties;

import org.apache.maven.plugin.AbstractMojo;
import org.apache.maven.plugin.MojoExecutionException;
import org.apache.velocity.VelocityContext;
import org.apache.velocity.app.Velocity;
import org.codehaus.plexus.util.StringUtils;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.Namespace;
import org.jdom.input.SAXBuilder;

/**
 * Builds a script to execute a given class in a project.
 * @goal script
 * @phase deploy
 */
public class BashScriptMojo extends AbstractMojo {
  
  /**
   * Location of the file.
   * @parameter expression="${project.build.directory}"
   * @required
   * @readonly
   */
  private File outputDir;
  
  /**
   * The project directory where the pom.xml file is located.
   * @parameter expression="${basedir}"
   * @required
   * @readonly
   */
  private File projectDir;
  
  /**
   * Full class name to build execution script for.
   * @parameter expression="${className}"
   * @required
   */
  private String className;

  @SuppressWarnings("unchecked")
  public void execute() throws MojoExecutionException {
    try {
      // parse the pom.xml to find the list of dependency and expand
      // them out to the correct path in the M2 repository to build
      // the classpath
      SAXBuilder parser = new SAXBuilder();
      Document doc = parser.build(new File(projectDir, "pom.xml"));
      Element root = doc.getRootElement();
      Namespace defaultNamespace = root.getNamespace();
      Element dependenciesElement = 
        root.getChild("dependencies", defaultNamespace);
      StringBuilder buf = new StringBuilder();
      List<Element> dependencyElements =
        dependenciesElement.getChildren("dependency", defaultNamespace);
      for (Element dependencyElement : dependencyElements) {
        String groupId = dependencyElement.getChildTextTrim(
          "groupId", defaultNamespace);
        String artifactId = dependencyElement.getChildTextTrim(
          "artifactId", defaultNamespace);
        String version = dependencyElement.getChildTextTrim(
          "version", defaultNamespace);
        String path = StringUtils.join(new String[] {
          "$M2_REPO",
          StringUtils.replace(groupId, ".", File.separator),
          artifactId,
          version,
          StringUtils.join(new String[] {artifactId, version}, "-") + ".jar"
        }, File.separator);
        buf.append(path).append(File.pathSeparator).append("\\\n");
      }
      // finally append the target/classes dir
      buf.append(projectDir.getAbsolutePath()).append("/target/classes");
      // calculate the class name only for script file and log file
      String shortClassName = 
        className.substring(className.lastIndexOf('.') + 1);
      // stick them into the context
      VelocityContext context = new VelocityContext();
      context.put("__classpath__", buf.toString());
      context.put("__date__", new Date());
      context.put("__classname__", className);
      context.put("__logfile__", shortClassName + ".log");
      // we want to load the .vm file from the classpath, so we configure
      // the ClasspathResourceLoader to get the vm file.
      Properties props = new Properties();
      props.setProperty("resource.loader", "classpath");
      props.setProperty(
        "classpath.resource.loader.class",
        "org.apache.velocity.runtime.resource.loader.ClasspathResourceLoader");
      Velocity.init(props);
      File scriptFile = new File(outputDir, "run" + shortClassName + ".sh");
      BufferedWriter writer = new BufferedWriter(new FileWriter(scriptFile));
      try {
        Velocity.mergeTemplate("bash_script.vm", "UTF-8", context, writer);
        writer.flush();
      } finally {
        // close the file even if the template merge fails
        writer.close();
      }
      getLog().info("Script " + scriptFile.getName() + 
        " written to " + scriptFile.getPath());
    } catch (Exception e) {
      getLog().error("Error executing BashScriptMojo", e);
      throw new MojoExecutionException(e.getMessage(), e);
    }
  }
}
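
The heart of the plugin is the mapping from a dependency's coordinates to a jar path under the local repository. As a rough standalone sketch of just that step (no Maven, JDOM or Velocity involved), something like the following should produce the same kind of classpath string. The class and method names are made up for illustration, the second set of coordinates is only an example, and the sketch assumes the default repository layout with plain release versions (no classifiers, no timestamped snapshots).

```java
import java.util.ArrayList;
import java.util.List;

public class RepoClasspathSketch {

  /** Maps Maven coordinates to a jar path in the default repository layout. */
  public static String jarPath(String groupId, String artifactId, String version) {
    // dots in the groupId become directories; the output is a bash script,
    // so "/" is used regardless of the platform the code runs on
    return "$M2_REPO/" + groupId.replace('.', '/') + "/" + artifactId
        + "/" + version + "/" + artifactId + "-" + version + ".jar";
  }

  /** Joins the jar paths and the classes directory with ':' and bash line continuations. */
  public static String classpath(List<String> jarPaths, String classesDir) {
    StringBuilder buf = new StringBuilder();
    for (String jarPath : jarPaths) {
      buf.append(jarPath).append(":\\\n");
    }
    return buf.append(classesDir).toString();
  }

  public static void main(String[] args) {
    List<String> jars = new ArrayList<String>();
    jars.add(jarPath("commons-io", "commons-io", "1.2"));
    jars.add(jarPath("org.apache.lucene", "lucene-core", "2.4.0"));
    System.out.println("CLASSPATH=" + classpath(jars, "target/classes"));
  }
}
```

Note that a dependency whose version is inherited from a parent POM has no version element of its own, so a real implementation would need to handle that case as well.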

Finally, here is the Velocity template file. As you can see above, I had to use Velocity's ClasspathResourceLoader to load it from the classpath (src/main/resources) of the plugin project.

#!/bin/bash
# Generated by mvn mycompany:script on ${__date__}
M2_REPO=$HOME/.m2/repository
CLASSPATH=${__classpath__}
java -cp $CLASSPATH -Xmx2048m ${__classname__} $* 2>&1 | tee ${__logfile__}
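
To get a feel for what the merge step produces without setting up Velocity, the substitution can be imitated with plain string replacement. This is only a sketch: it is not how Velocity works internally, the class and method names are hypothetical, and the date comment line is left out of the template text to keep it short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TemplateSketch {

  // roughly the same text as bash_script.vm, minus the date comment
  static final String TEMPLATE =
      "#!/bin/bash\n"
      + "M2_REPO=$HOME/.m2/repository\n"
      + "CLASSPATH=${__classpath__}\n"
      + "java -cp $CLASSPATH -Xmx2048m ${__classname__} $* 2>&1 | tee ${__logfile__}\n";

  /** Replaces each ${key} placeholder with its value; unknown references are left alone. */
  public static String render(String template, Map<String, String> values) {
    String result = template;
    for (Map.Entry<String, String> entry : values.entrySet()) {
      result = result.replace("${" + entry.getKey() + "}", entry.getValue());
    }
    return result;
  }

  /** Strips the package prefix, as the plugin does for the script and log file names. */
  public static String shortName(String className) {
    return className.substring(className.lastIndexOf('.') + 1);
  }

  public static void main(String[] args) {
    String className = "com.mycompany.foo.bar.Baz";
    Map<String, String> values = new LinkedHashMap<String, String>();
    values.put("__classpath__", "target/classes");
    values.put("__classname__", className);
    values.put("__logfile__", shortName(className) + ".log");
    System.out.print(render(TEMPLATE, values));
  }
}
```

References such as $HOME and $CLASSPATH are not in the map, so they pass through untouched and are left for bash to expand when the script runs.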

Plugin configuration

To install this into your local repository, run "mvn install" in the plugin project. On the target module, where you actually want to use this plugin, you need to configure it in the module's POM as shown below:

<project>
...
  <build>
  ...
    <plugins>
      ...
      <plugin>
        <groupId>com.mycompany.plugin</groupId>
        <artifactId>mycompany-maven-plugin</artifactId>
        <version>1.0-SNAPSHOT</version>
        <executions>
          <execution>
            <phase>deploy</phase>
            <goals>
              <goal>script</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

To call this plugin from the target application, run the following command:

prompt$ mvn -o mycompany:script -DclassName=com.mycompany.foo.bar.Baz
[INFO] Scanning for projects...
[INFO] Searching repository for plugin with prefix: 'mycompany'.
[INFO] -------------------------------------------------------------------
[INFO] Building MyCompany Plugin Module
[INFO]    task-segment: [mycompany:script]
[INFO] -------------------------------------------------------------------
[INFO] [mycompany:script]
[INFO] Script runBaz.sh written to /home/.../target/runBaz.sh
[INFO] -------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] -------------------------------------------------------------------
[INFO] Total time: 2 seconds
[INFO] Finished at: Fri Apr 10 16:24:33 GMT-08:00 2009
[INFO] Final Memory: 5M/9M
[INFO] -------------------------------------------------------------------

Which results in a shell script that looks something like this (edited for brevity):

#!/bin/bash
# Generated by mvn mycompany:script on Fri Apr 10 16:24:33 GMT-08:00 2009
M2_REPO=$HOME/.m2/repository
CLASSPATH=\
$M2_REPO/commons-cli/commons-cli/1.0/commons-cli-1.0.jar:\
$M2_REPO/commons-codec/commons-codec/1.3/commons-codec-1.3.jar:\
$M2_REPO/commons-io/commons-io/1.2/commons-io-1.2.jar:\
...\
/home/.../target/classes
java -cp $CLASSPATH -Xmx2048m com.mycompany.foo.bar.Baz $* \
  2>&1 | tee Baz.log

Conclusion

Of course, the bash scripts that we send to production are not quite this simple - there is additional validation to make sure another process is not already running, and hooks to email completion status, etc, but they too are boilerplate. I have intentionally kept the script simple, but it is easy to update the template file to produce something that is more robust and production-ready.

The plugin does not handle transitive dependencies - which kind of defeats the purpose of using Maven2, I know... But since we have parallel Maven2 and Ant descriptors (some of our developers haven't gotten around to becoming comfortable with Maven2 yet), this is not an issue in my case, since we explicitly list all dependencies in both the Maven2 POM and Ant's build.xml files. However, I hope to update the plugin at some point to include Maven2's transitive dependency detection. If you have already done the research on how to do this, I would appreciate pointers.
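
For anyone unsure what "transitive" means in practice, here is a toy sketch of the idea: start from the directly declared dependencies and keep following each artifact's own dependencies until nothing new turns up. The graph and artifact names are invented, and this is not Maven's resolver; it ignores versions, scopes, exclusions and conflict mediation entirely.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TransitiveClosureSketch {

  /** Returns every artifact reachable from the direct dependencies, breadth-first. */
  public static Set<String> resolve(List<String> direct,
                                    Map<String, List<String>> graph) {
    Set<String> seen = new LinkedHashSet<String>();
    Deque<String> queue = new ArrayDeque<String>(direct);
    while (!queue.isEmpty()) {
      String artifact = queue.removeFirst();
      if (!seen.add(artifact)) {
        continue; // already visited; this also guards against cycles
      }
      List<String> children = graph.get(artifact);
      if (children != null) {
        queue.addAll(children);
      }
    }
    return seen;
  }

  public static void main(String[] args) {
    // hypothetical graph: app-lib needs util-lib, which in turn needs log-lib
    Map<String, List<String>> graph = new HashMap<String, List<String>>();
    graph.put("app-lib", Arrays.asList("util-lib"));
    graph.put("util-lib", Arrays.asList("log-lib"));
    System.out.println(resolve(Arrays.asList("app-lib"), graph));
  }
}
```

With only app-lib listed in the POM, the current plugin would put just that one jar on the classpath, while the closure also contains util-lib and log-lib.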

On a completely unrelated note...

On a completely unrelated note, I found out yesterday that Stacey's, my favorite bookstore (for around the last 7 years), has gone out of business. They were selling off the bookcases when I arrived yesterday. Up until 3 years ago, my last job was a couple of blocks from the store, so I was a pretty frequent visitor, and would average about one computer book a month. I have since moved to another location that is a good half hour walk (or a 10 minute bus ride) away, so I haven't been going as often. In spite of the higher prices compared to Amazon's discounted prices, I liked going to the bookshop and buying from there, since (a) I did not have to wait for the book to be shipped and (b) I could compare different books before making a purchase. Not sure about you, but Amazon's look-inside feature just doesn't compare.

There is a Borders across the street from where I work, but most of the time they don't have what I want, and even if they do, the books are so disorganized that it's like finding a needle in a haystack. The Barnes and Noble where I live is more organized, but their focus (probably rightly so, given the demographics) is on children's books.

If anybody knows of a good bookshop that sells computer books in or around the San Francisco Market Street area, I would appreciate hearing from you. Otherwise, I guess I will just have to get used to buying books online.

4 comments (moderated to prevent spam):

Satish said...

Take a look at http://mojo.codehaus.org/appassembler/appassembler-maven-plugin/, it provides for all your needs and more.

Sujit Pal said...

Thank you, Satish, this is awesome. It is not a complete solution to my needs, but dependency resolution is built in, and I can work around its flaws. Thanks again!

Anonymous said...

And you shouldn't need to parse any XML files manually in Maven, as it is all available anyway.

Let your MOJO have this annotation:
@requiresDependencyResolution runtime

Inject this resource in your MOJO:
/**
* @readonly
* @parameter expression="${project}"
*/
private org.apache.maven.project.MavenProject project;

And use this to get access to the transitive resolved runtime artifacts:
List runtimeArtifacts = project.getRuntimeArtifacts();

And each artifact has an attached file you can look at.

But with that said, the app-assembler is probably useful as well.

Sujit Pal said...

Cool, thanks! This tells me how to get the transitive dependencies, didn't know how to before.