<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Nicholas Hrycan&#039;s Blog</title>
	<atom:link href="http://hrycan.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://hrycan.com</link>
	<description>Software Engineering for Software Engineers by a Software Engineer</description>
	<lastBuildDate>Fri, 02 Dec 2011 22:50:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='hrycan.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Nicholas Hrycan&#039;s Blog</title>
		<link>http://hrycan.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://hrycan.com/osd.xml" title="Nicholas Hrycan&#039;s Blog" />
	<atom:link rel='hub' href='http://hrycan.com/?pushpress=hub'/>
		<item>
		<title>Shoppingcart supercharged by JBoss Drools Rule Engine</title>
		<link>http://hrycan.com/2011/11/28/shoppingcart-supercharged-by-jboss-drools-rule-engine/</link>
		<comments>http://hrycan.com/2011/11/28/shoppingcart-supercharged-by-jboss-drools-rule-engine/#comments</comments>
		<pubDate>Tue, 29 Nov 2011 06:48:56 +0000</pubDate>
		<dc:creator>Nick Hrycan</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[drools]]></category>
		<category><![CDATA[jboss rules]]></category>
		<category><![CDATA[jpetstore]]></category>
		<category><![CDATA[rule engine]]></category>

		<guid isPermaLink="false">http://hrycan.com/?p=170</guid>
		<description><![CDATA[The previous post on the JBoss Drools Rule Engine gave a high level overview of the benefits of using it in your web application. This post explores a concrete example. JPetstore is a simple shoppingcart application whose products are pets. This online petstore is included as a sample application with the Spring Framework bundle to [...]]]></description>
			<content:encoded><![CDATA[<p>The previous post on the JBoss Drools Rule Engine gave a high level overview of the benefits of using it in your web application. This post explores a concrete example. JPetstore is a simple shoppingcart application whose products are pets. This online petstore is included as a sample application with the Spring Framework bundle to highlight some of Spring&#8217;s capabilities. What follows is how I modified JPetstore to leverage the JBoss Drools Rule Engine 5.2 (the commercial version offered by Red Hat is called JBoss Rules) so JPetstore could have some of the features we commonly see offered by online retailers.</p>
<p>It is common to see online retailers give customers who enter a coupon code during checkout a special offer. Some examples are:</p>
<ul>
<li>Buy 3 goldfish, get the 4th free.</li>
<li>Buy &#8220;camera model A&#8221; and get &#8220;tripod model B&#8221; for free</li>
<li>10% off your order cost when you enter offer code X</li>
<li>All batteries 20% off when using promo code X</li>
<li>$10 off any order totaling over $100</li>
<li>Buy any 3 car care products, get a free microfiber towel.</li>
<li>Free shipping using code X</li>
</ul>
<p>Many offers of this type are valid for only a short period of time – a week, a weekend, or even a single day (Black Friday offers).</p>
<p>This functionality calls for using a Rule Engine.</p>
<p>A rule engine could also serve as a &#8220;recommendation engine&#8221; &#8211; it can be used to recommend or suggest additional products to the customer based on their past orders, their user profile, etc. For instance, if you purchased car wax, the system might recommend their bestselling microfiber towel or something specific to your car&#8217;s year/make/model.<br />
<span id="more-170"></span><br />
<strong><span style="text-decoration:underline;">Modifications to JPetstore</span></strong></p>
<p><strong>Domain Model</strong><br />
Below is the modified domain model from JPetstore.<br />
<a href="http://hrycan.files.wordpress.com/2011/11/classdiagram.gif"><img class="aligncenter size-full wp-image-149" title="JpetstoreClassDiagram" src="http://hrycan.files.wordpress.com/2011/11/classdiagram.gif?w=600" alt="Jpetstore model class diagram"   /></a><br />
I added fields to Cart and CartItem to receive the changes from the business rules. For example, CartItem&#8217;s calculatedPrice is the modified list price of Item based on the rules. Cart&#8217;s createDate is when the shoppingcart was created. It is used in the rules to verify the given coupon has not expired.</p>
<p>It is important to have a rich domain model containing any field you think you&#8217;ll use. If it is not in the model, it can&#8217;t be used in the rules. Here, Product does not have a weight field. Having a weight field would have been useful for calculating a shipping cost. Product length/width/height could be used to determine if an additional shipping fee is required.</p>
<p><strong>Screen Flow</strong><br />
Previously, the application flow went from viewing the shoppingcart directly to collecting the payment information. I added an intermediary step. Now clicking &#8220;Proceed to checkout&#8221; on the &#8220;view cart&#8221; page runs the rule service then displays the modified cart. At that point, the old functionality to collect the payment information is a click away.<br />
<a href="http://hrycan.files.wordpress.com/2011/11/cart-450.jpg"><img class="aligncenter size-full wp-image-151" title="shoppingcart" src="http://hrycan.files.wordpress.com/2011/11/cart-450.jpg?w=600" alt="jpetstore shoppingcart before"   /></a></p>
<p style="text-align:center;"><em>View Cart</em></p>
<p><a href="http://hrycan.files.wordpress.com/2011/11/cart-result-450.jpg"><img class="aligncenter size-full wp-image-152" title="jpetstore modified cart" src="http://hrycan.files.wordpress.com/2011/11/cart-result-450.jpg?w=600" alt="jpetstore shoppingcart after rules"   /></a></p>
<p style="text-align:center;"><em>View Cart after running rules</em></p>
<p>Moving to the Spring MVC controllers, I created CheckoutController to perform the checkout step instead of having ViewCartController perform both actions. CheckoutController was configured to require authentication in petstore-servlet.xml to ensure an Account object (containing city, state, zip, etc) would be available to the rules.</p>
<p>The noteworthy thing about CheckoutController is that it calls the injected RuleService with the customer&#8217;s account and shoppingcart.</p>
<p><strong>JPetstore Rules</strong><br />
A few examples of the rules I wrote for JPetstore are shown below.</p>
<p><pre class="brush: java;">
rule &quot;10 dollars off any order over 100 with 10off100 code&quot;
	when
		$cart : Cart(coupon == &quot;10off100&quot;,
			subTotal &gt; 100,
			now &gt;= &quot;25-Nov-2011&quot;,
			now &lt;= &quot;28-Nov-2011&quot;)
	then
		$cart.setDiscount(10.00);
end

rule &quot;buy 2 iguana, get 5 dollars off&quot;
	when
		$cart : Cart(now &lt;= &quot;27-Nov-2011&quot;)
		$ci : CartItem(item.product.productId == &quot;RP-LI-02&quot;, 
			quantity &gt;= 2)        
	then
		$cart.setDiscount(5.00);
end

rule &quot;20% off price of any bird&quot;
dialect &quot;mvel&quot;
    when
        #conditions
        $ci : CartItem($item : item,
        	item.product.categoryId == &quot;BIRDS&quot;)
    then
        #actions
        $ci.setCalculatedPrice($ci.item.listPrice * 0.8);
end
</pre></p>
<p>As you can see with the three rules shown, it is basically pattern matching on object types and field values. You can store matched field values in variables that can later be used in either the LHS or RHS.</p>
<p><b>Rule Service</b><br />
RuleServiceImpl is where the rule engine is invoked with the user&#8217;s account and shoppingcart objects.  The rules operate on objects inserted into the KnowledgeSession.  Here you see the Cart, CartItems, and Account inserted into the session so the rules can operate on them.  Changes to the objects made by the triggered rules are visible when fireAllRules returns.</p>
<p><pre class="brush: java;">
public class RuleServiceImpl implements RuleService {
    final Logger logger = LoggerFactory.getLogger(RuleServiceImpl.class);
    private ItemDao itemDao;
    private KnowledgeAgent kagent;
	
    public RuleServiceImpl() {
        kagent = KnowledgeAgentFactory.newKnowledgeAgent( &quot;MyAgent&quot; );
        kagent.applyChangeSet(ResourceFactory.newClassPathResource(
                &quot;droolsChangeSet.xml&quot;));

        ResourceFactory.getResourceChangeNotifierService().start();
        ResourceFactory.getResourceChangeScannerService().start();
		
        KnowledgeBase kbase = kagent.getKnowledgeBase();
    }
	
    public void execute(Cart cart, UserSession user) {
        KnowledgeBase kbase = kagent.getKnowledgeBase();
        StatefulKnowledgeSession knowledgeSession = 
                                    kbase.newStatefulKnowledgeSession();
        JPetRuleLogger myLogger = new JPetRuleLoggerImpl(logger);
		
        knowledgeSession.setGlobal(&quot;myLogger&quot;, myLogger);
        knowledgeSession.setGlobal(&quot;itemDao&quot;, itemDao);
		
        knowledgeSession.insert(cart);
        for (Iterator i = cart.getAllCartItems(); i.hasNext(); ) {
            CartItem ci = (CartItem) i.next();
            knowledgeSession.insert(ci);
        }
		
        Account acct = user.getAccount();
        knowledgeSession.insert(acct);
		
        //run the rules
        knowledgeSession.fireAllRules();
		
        knowledgeSession.dispose();
    }
...
</pre></p>
<p><b>Separating the rule lifecycle from the application lifecycle</b><br />
One of the advantages of using a rules engine is you can separate the business rule lifecycle from the application lifecycle.  An easy way to achieve this separation with Drools is to use the combination of a KnowledgeAgent and a changeset.xml file as shown above in RuleServiceImpl.<br />
<pre class="brush: xml;">
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;change-set xmlns='http://drools.org/drools-5.0/change-set'
	xmlns:xs='http://www.w3.org/2001/XMLSchema-instance'
	xs:schemaLocation='http://drools.org/drools-5.0/change-set 
	http://anonsvn.jboss.org/repos/labs/labs/jbossrules/trunk/drools-api/src/main/resources/change-set-1.0.0.xsd'&gt;
	&lt;add&gt;
	&lt;resource source='http://127.0.0.1:9090/jpetstoreApplication.drl'
		type='DRL' /&gt;
	&lt;resource source='http://127.0.0.1:9090/jpetstoreDSL.dsl'
		type='DSL' /&gt;
	&lt;resource source='http://127.0.0.1:9090/jpetstoreDSLR.dslr'
		type='DSLR' /&gt;  
	&lt;/add&gt;
&lt;/change-set&gt;
</pre><br />
This file allows you to externalize the rule resource names from your code.  Here I am using an external server URL to get the rule resource completely off the appserver running the JPetstore application.  A more heavy duty approach is to use Guvnor BRMS (Business Rule Management System) and specify your Guvnor package url in changeset.xml instead of the individual resource URLs.</p>
<p>Another benefit of KnowledgeAgent is that it caches the KnowledgeBase for you.  Creating a KnowledgeBase is an expensive operation in Drools, so you do not want to create one with every request.  KnowledgeSessions, which are obtained from the KnowledgeBase, are much cheaper to create, so each user request gets its own.</p>
<p>Being able to change the rule resources outside a full application deployment is not helpful unless the application can detect the rule resource was changed and thereafter use the updated resource.<br />
That is the purpose of these 2 lines in RuleServiceImpl:<br />
<pre class="brush: java;">
		ResourceFactory.getResourceChangeNotifierService().start();
		ResourceFactory.getResourceChangeScannerService().start();
</pre><br />
The default is to check for changes every 60 seconds.  Once a change is detected, the next call to kagent.getKnowledgeBase(); returns the new KnowledgeBase.</p>
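<p>If 60 seconds does not suit you, the scanner interval can be configured before the services are started. The snippet below is a minimal sketch that assumes the Drools 5.x jars are on the classpath; the class name ScannerSetup and the 30 second value are only illustrative and are not part of the JPetstore changes described here.</p>
<p><pre class="brush: java;">
import org.drools.io.ResourceChangeScannerConfiguration;
import org.drools.io.ResourceFactory;

public class ScannerSetup {

    /** Sets the polling interval in seconds, then starts the notifier and scanner services. */
    public static void start(int intervalSeconds) {
        ResourceChangeScannerConfiguration sconf =
                ResourceFactory.getResourceChangeScannerService()
                        .newResourceChangeScannerConfiguration();
        // The interval is given in seconds; the default is 60.
        sconf.setProperty(&quot;drools.resource.scanner.interval&quot;,
                String.valueOf(intervalSeconds));
        ResourceFactory.getResourceChangeScannerService().configure(sconf);

        ResourceFactory.getResourceChangeNotifierService().start();
        ResourceFactory.getResourceChangeScannerService().start();
    }
}
</pre></p>
<p>Calling ScannerSetup.start(30) in place of the two start() lines would poll the rule resources every 30 seconds.</p>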
<p><u><b>Conclusion</b></u><br />
As you can see, the Drools rule engine makes offering this type of functionality very easy.  A rule engine is a great tool to have in your toolset.</p>
<p>If you would like to get this running on your machine, you can use Eclipse with the SVN plugin and the m2eclipse Maven plugin to create the jpetstore project from <a href="https://src.springframework.org/svn/spring-samples/jpetstore/trunk/org.springframework.samples.jpetstore/" target="new">Spring&#8217;s SVN repository</a>.  JPetstore is a Maven project so you&#8217;ll just need to add the Drools dependencies to the pom.</p>
]]></content:encoded>
			<wfw:commentRss>http://hrycan.com/2011/11/28/shoppingcart-supercharged-by-jboss-drools-rule-engine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/af541810432de77586f96fedb1f2c168?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hrycan</media:title>
		</media:content>

		<media:content url="http://hrycan.files.wordpress.com/2011/11/classdiagram.gif" medium="image">
			<media:title type="html">JpetstoreClassDiagram</media:title>
		</media:content>

		<media:content url="http://hrycan.files.wordpress.com/2011/11/cart-450.jpg" medium="image">
			<media:title type="html">shoppingcart</media:title>
		</media:content>

		<media:content url="http://hrycan.files.wordpress.com/2011/11/cart-result-450.jpg" medium="image">
			<media:title type="html">jpetstore modified cart</media:title>
		</media:content>
	</item>
		<item>
		<title>Leveraging a Business Rule Engine</title>
		<link>http://hrycan.com/2011/11/28/leveraging-a-business-rule-engine/</link>
		<comments>http://hrycan.com/2011/11/28/leveraging-a-business-rule-engine/#comments</comments>
		<pubDate>Tue, 29 Nov 2011 04:11:17 +0000</pubDate>
		<dc:creator>Nick Hrycan</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[drools]]></category>
		<category><![CDATA[jboss rules]]></category>
		<category><![CDATA[rule engine]]></category>

		<guid isPermaLink="false">http://hrycan.com/?p=143</guid>
		<description><![CDATA[JBoss Drools Expert (or JBoss Rules for the enterprise version of Drools supported by Red Hat) is a business rule engine.  You may have heard of a Rule Engine in the past but were unsure of how it can benefit you and your application.  In the following paragraphs, I will list some of the benefits [...]]]></description>
			<content:encoded><![CDATA[<p>JBoss Drools Expert (or JBoss Rules for the enterprise version of Drools supported by Red Hat) is a business rule engine.  You may have heard of a Rule Engine in the past but were unsure of how it can benefit you and your application.  In the following paragraphs, I will list some of the benefits of using a Rule Engine in a traditional web application, the different formats used to express rules in Drools, and a general overview of how Drools is used.<br />
<span id="more-143"></span></p>
<p><b><u>Benefits of using a Rule Engine:</u></b><br />
<b>Separates the Business Rule lifecycle from the Application lifecycle</b><br />
In many applications, the business rules change more often than the application code.  Moving your rules to a Rule Engine allows them to be updated independently of the application source code.  This would allow rule changes to occur much more rapidly than the alternative of a full production release of the application.</p>
<p><b>Simplifies creating/modifying business rules</b><br />
Instead of having nested &#8220;if..else&#8221; statements littered throughout your codebase, you have a known set of rule files.  Rules expressed in the Drools Rule Language (DRL), a Domain Specific Language, or a Decision Table are easier to understand because the format is clearer compared to nested &#8220;if..else&#8221; statements in the source code.  It is possible for business analysts to create and modify rules without needing a developer&#8217;s assistance depending on the tools available to the business analyst.</p>
<p><b>Fosters reuse and creates a central point of knowledge</b><br />
The benefit of business rules being easier to modify/update is multiplied if you use the same rules in multiple applications.  The centralization of rules in a Rule Engine forms a central point of knowledge.  For example, if 5 applications use many of the same rules, then a single change is needed to update all 5 applications.  Contrast this with updating a different implementation of the rules in each of the 5 applications.  This approach to updating the rules is less prone to errors and is much quicker to complete.</p>
<p><b>Increases the accessibility of rules so they can be verified and audited by experts</b><br />
When not using a Rule Engine, developers implement the rules expressed in a requirements document into source code.  This translation step can introduce errors.  Further, requirements documents can become outdated.  Externalizing rules allows them to be quickly retrieved and reviewed by the business experts.  The technical skill level required to review them is reduced if the rules are expressed in the form of a Decision Table or Domain Specific Language.</p>
<p><b><u>Rule Formats using Drools 5.2</u></b></p>
<p><b>Drools Rule Language</b><br />
Here is an example rule written in the Drools Rule Language.  The file extension is .drl.</p>
<p><pre class="brush: java;">
rule &quot;10 dollars off any order over 100 with 10off100 code&quot;
   when
      $cart : Cart(coupon == &quot;10off100&quot;,
              subTotal &gt; 100,
              now &gt;= &quot;25-Nov-2011&quot;,
              now &lt;= &quot;28-Nov-2011&quot; )
   then
      $cart.setDiscount(10.00);
end
</pre></p>
<p>Each rule has a name.  There is a left hand side (when) and a RHS (then).  When the LHS matches what you inserted into working memory, the RHS is executed.  As you can see, the LHS is basically pattern matching on object types and on the value of their attributes.  You can capture a match and store it in a variable (shown here as $cart).  This variable can be used to restrict the later patterns in the LHS or used as needed in the RHS.</p>
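<p>For readers who think in Java, the when/then idea can be mimicked with a toy loop. This is only a sketch under loud assumptions: ToyRuleEngine and its Cart stand-in are invented for illustration, the date conditions are left out, and the naive loop is not how Drools actually evaluates rules.</p>
<p><pre class="brush: java;">
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Toy illustration of when/then matching; a naive loop, not how Drools evaluates rules. */
final class ToyRuleEngine {

    /** Minimal stand-in for the Cart fact used by the rule above. */
    static final class Cart {
        final String coupon;
        final double subTotal;
        double discount;

        Cart(String coupon, double subTotal) {
            this.coupon = coupon;
            this.subTotal = subTotal;
        }
    }

    /** A rule returns true when its LHS matched the fact and its RHS ran. */
    private interface Rule {
        boolean fire(Object fact);
    }

    private final List&lt;Rule&gt; rules = new ArrayList&lt;Rule&gt;();
    private final List&lt;Object&gt; workingMemory = new ArrayList&lt;Object&gt;();

    /** Registers a rule: match on fact type and condition (LHS), then run an action (RHS). */
    &lt;T&gt; void addRule(Class&lt;T&gt; type, Predicate&lt;T&gt; when, Consumer&lt;T&gt; then) {
        rules.add(fact -&gt; {
            if (!type.isInstance(fact)) {
                return false;
            }
            T typed = type.cast(fact);
            if (!when.test(typed)) {
                return false;
            }
            then.accept(typed);
            return true;
        });
    }

    /** Adds a fact to working memory. */
    void insert(Object fact) {
        workingMemory.add(fact);
    }

    /** Tries every rule against every fact and returns how many times a rule fired. */
    int fireAllRules() {
        int fired = 0;
        for (Rule rule : rules) {
            for (Object fact : workingMemory) {
                if (rule.fire(fact)) {
                    fired++;
                }
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        ToyRuleEngine engine = new ToyRuleEngine();
        engine.addRule(Cart.class,
                c -&gt; &quot;10off100&quot;.equals(c.coupon) &amp;&amp; c.subTotal &gt; 100,
                c -&gt; c.discount = 10.00);

        Cart cart = new Cart(&quot;10off100&quot;, 120.00);
        engine.insert(cart);
        engine.fireAllRules();
        System.out.println(&quot;discount = &quot; + cart.discount);
    }
}
</pre></p>
<p>The point of the sketch is the shape: facts go into working memory, each rule pairs a condition with an action, and the caller reads the modified facts afterwards.</p>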
<p><b>Decision Table</b><br />
A decision table is a specially formatted spreadsheet that is translated by Drools into the drools rule language.  Expressing rules in this format makes it easier to see missed scenarios.  Below is an example of a drools decision table.<br />
<a href="http://hrycan.files.wordpress.com/2011/11/dec-table.jpg" target="new"><img class="aligncenter size-medium wp-image-150" title="dec-table" src="http://hrycan.files.wordpress.com/2011/11/dec-table.jpg?w=300&#038;h=117" alt="drools decision table" width="300" height="117" /></a><br />
In this example, Drools creates a rule in the drools rule language behind the scenes for you for each row of the table starting from row 9.  The columns are either conditions of the rule (LHS) or the action to take (RHS) when there is a match.  The colors shown in the spreadsheet are cosmetic.</p>
<p>The orange cells of the spreadsheet represent the rule domain objects and their attributes used in the rules.  The lower cells in the white area are the data values for those attributes.</p>
<p>The rule for Row 9 says “if there is a driver having an age 18 to 24 years old with 0 prior claims and is applying for comprehensive insurance then give the driver a 1% discount on their policy.”</p>
<p>Instead of using the drools rule engine and the populated decision table, you could implement this table in Java using normal &#8220;if..else&#8221; statements.  Let&#8217;s look at one such implementation.</p>
<p><pre class="brush: java;">
if (age &gt;= 18 &amp;&amp; age &lt;= 24 &amp;&amp; priorClaims == 0) {
    if (type.equals(&quot;COMPREHENSIVE&quot;)) {
        policy.applyDiscount(1);
    }
    else if (type.equals(&quot;FIRE_THEFT&quot;)) {
        policy.applyDiscount(2);
    }
} else if (age &gt;= 25 &amp;&amp; age &lt;= 30 &amp;&amp; priorClaims == 1 
        &amp;&amp; type.equals(&quot;COMPREHENSIVE&quot;)) {
    policy.applyDiscount(5);
}
</pre></p>
<p>The small rule table with 3 conditions is starting to look very busy when translated to Java &#8220;if..else&#8221; statements with only a few rows implemented.  Imagine if each rule had 10 conditions.  The decision table would still be readable, however the corresponding Java code would be menacing.</p>
<p>Implementing the rules in this manner would no doubt involve much copy and paste.   This may lead to shortcuts to reduce the amount of code by nesting some of the &#8220;if..else&#8221; statements.  It may work fine, but maintainability would suffer.  What would happen if a condition changed and required some of the nesting to be eliminated?  For example, imagine a new rule was added for 18-24 wanting comprehensive with 1 prior claim.  There would be a ripple effect.  Imagine if a new condition needed to be added to the rules.  Using a decision table, this would be an extra column with the associated values.  Changing Java &#8220;if..else&#8221; statements would be a nightmare.</p>
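<p>To make the maintenance argument concrete, here is a minimal sketch in plain Java (not Drools) that holds the same three rows as data instead of nested &#8220;if..else&#8221; statements. The names DiscountTable, Row and discountFor are made up for illustration, and the sketch assumes the first matching row wins, whereas Drools would fire every rule that matches.</p>
<p><pre class="brush: java;">
import java.util.Arrays;
import java.util.List;

/** Toy illustration only: a decision table held as data rather than nested if..else. */
final class DiscountTable {

    /** One row of the table: three conditions and the resulting discount. */
    static final class Row {
        final int minAge;
        final int maxAge;
        final int priorClaims;
        final String policyType;
        final int discountPercent;

        Row(int minAge, int maxAge, int priorClaims, String policyType, int discountPercent) {
            this.minAge = minAge;
            this.maxAge = maxAge;
            this.priorClaims = priorClaims;
            this.policyType = policyType;
            this.discountPercent = discountPercent;
        }

        /** True when all three conditions of this row hold. */
        boolean matches(int age, int claims, String type) {
            return age &gt;= minAge &amp;&amp; age &lt;= maxAge
                    &amp;&amp; claims == priorClaims
                    &amp;&amp; policyType.equals(type);
        }
    }

    // The same three cases as the if..else example above.
    static final List&lt;Row&gt; ROWS = Arrays.asList(
            new Row(18, 24, 0, &quot;COMPREHENSIVE&quot;, 1),
            new Row(18, 24, 0, &quot;FIRE_THEFT&quot;, 2),
            new Row(25, 30, 1, &quot;COMPREHENSIVE&quot;, 5));

    /** Returns the discount of the first matching row, or 0 when no row matches. */
    static int discountFor(int age, int priorClaims, String policyType) {
        for (Row row : ROWS) {
            if (row.matches(age, priorClaims, policyType)) {
                return row.discountPercent;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(discountFor(20, 0, &quot;COMPREHENSIVE&quot;));
        System.out.println(discountFor(27, 1, &quot;COMPREHENSIVE&quot;));
    }
}
</pre></p>
<p>With this shape, a rule for 18-24 with 1 prior claim is one more Row and a new condition is one more field, which is roughly the property a decision table gives you without writing any Java at all.</p>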
<p><b>Domain Specific Language (DSL)</b><br />
Another means available in Drools for expressing rules in a format friendly to non-technical users is to use a DSL.  A DSL empowers business experts to write rules using a set of phrases you developed collaboratively with them.  This enables them to express the rules in the language natural to their business domain in the form of a sentence.  Instead of cryptic code, it looks like a requirements document.  Here is an example DSL and rules written using it.</p>
<p><pre class="brush: java;">

expander jpetstore.dsl
rule &quot;10off50 coupon between nov 23 to 29 2011&quot;
when
A customer prepares to place an order
- using coupon code &quot;10off50&quot;
- totaling at least 50 dollars
- starting on &quot;23-Nov-2011&quot;
- ending on &quot;29-Nov-2011&quot;
then
discount the order total by 10 dollars
end
</pre></p>
<p><i>DSL rule (jpetstore.dslr)</i></p>
<p>This format is easy to understand and allows the creation of multiple rules of this type quickly using GUI tools.  Further, not all of the restrictions on the order are required.  For example, the line specifying the coupon code could be eliminated which would then allow any order over $50 to obtain the $10 discount.</p>
<p>Now let&#8217;s take a look at the technical side, the DSL definition, which would be created by a developer.</p>
<p><pre class="brush: java;">
[when]A customer prepares to place an order=cart:Cart(subTotal &gt; 0)
[when]- using coupon code &quot;{coupon}&quot;=coupon == &quot;{coupon}&quot;
[when]- totaling at least {value} dollars=subTotal &gt;= {value}
[when]- starting on &quot;{start}&quot;=now &gt;= &quot;{start}&quot;
[when]- ending on &quot;{end}&quot;=now &lt;= &quot;{end}&quot;
[then]discount the order total by {amount} dollars=cart.setDiscount({amount});
</pre></p>
<p><i>DSL definition (jpetstore.dsl)</i></p>
<p>Basically, the phrase on the left is converted to the code on the right with the appropriate values used in the placeholders.  Using the Drools IDE in Eclipse, you can see how rules written using a DSL (the .dslr file) are transformed into the drools rule language by clicking on the DRL Viewer tab in the Eclipse editor.</p>
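<p>The substitution idea can be sketched in a few lines of plain Java. This is a simplified illustration, not the Drools expander itself: the class DslExpansionSketch and its expand method are hypothetical, and it handles a single phrase=code mapping at a time.</p>
<p><pre class="brush: java;">
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified illustration of DSL phrase expansion; not the Drools expander itself. */
final class DslExpansionSketch {

    private static final Pattern PLACEHOLDER =
            Pattern.compile(&quot;\\{([A-Za-z][A-Za-z0-9]*)\\}&quot;);

    /**
     * Expands one sentence using a single phrase=code mapping.
     * Returns null when the sentence does not match the phrase.
     */
    static String expand(String phrase, String code, String sentence) {
        // Turn each {name} placeholder into a named capture group
        // and quote the literal text around it.
        Matcher placeholder = PLACEHOLDER.matcher(phrase);
        StringBuilder regex = new StringBuilder();
        int last = 0;
        while (placeholder.find()) {
            regex.append(Pattern.quote(phrase.substring(last, placeholder.start())));
            regex.append(&quot;(?&lt;&quot;).append(placeholder.group(1)).append(&quot;&gt;.+?)&quot;);
            last = placeholder.end();
        }
        regex.append(Pattern.quote(phrase.substring(last)));

        Matcher m = Pattern.compile(regex.toString()).matcher(sentence);
        if (!m.matches()) {
            return null;
        }

        // Substitute the captured values into the code template.
        String result = code;
        placeholder.reset();
        while (placeholder.find()) {
            String name = placeholder.group(1);
            result = result.replace(&quot;{&quot; + name + &quot;}&quot;, m.group(name));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(expand(
                &quot;- totaling at least {value} dollars&quot;,
                &quot;subTotal &gt;= {value}&quot;,
                &quot;- totaling at least 50 dollars&quot;));
    }
}
</pre></p>
<p>Given the third line of the DSL definition and the sentence from the rule, expand returns the constraint subTotal &gt;= 50, which is the kind of text the DRL Viewer shows for the expanded rule.</p>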
<p>Besides an increase in readability, you also gain an increase in flexibility when you use a DSL because the phrases are independent of the domain model.  You can change the domain model without impacting the rules that are written in the DSL.  Similarly, you can change the wording of the phrases independently of the domain model.  All that is required is a modification to the DSL definition.</p>
<p><b><u>General steps needed to use Drools in your application</u></b><br />
Here is an overview of how to use Drools in your application:</p>
<ul>
<li>Create a POJO (Plain Old Java Object) domain model complete with attributes.  It should model all the concepts needed by your rules</li>
<li>Author your rules using your domain model in your desired format.  The rules can be written in the drools rule language, a decision table, or a domain specific language</li>
<li>Build a KnowledgeBase using one or more rule resources.</li>
<li>Create a KnowledgeSession from the KnowledgeBase</li>
<li>Insert populated POJOs into the KnowledgeSession</li>
<li>Run the rules which then modify the appropriate POJOs</li>
<li>Use the modified POJOs in your application</li>
</ul>
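<p>The middle steps of this list can be sketched with the Drools 5 knowledge API. This is a minimal sketch that assumes the Drools 5.x jars are on the classpath; the class name RuleRunnerSketch and the idea of passing the .drl name as a constructor argument are my own illustration, not code from a real project.</p>
<p><pre class="brush: java;">
import org.drools.KnowledgeBase;
import org.drools.KnowledgeBaseFactory;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.io.ResourceFactory;
import org.drools.runtime.StatefulKnowledgeSession;

public class RuleRunnerSketch {

    private final KnowledgeBase kbase;

    /** Builds the KnowledgeBase once from a .drl file found on the classpath. */
    public RuleRunnerSketch(String drlOnClasspath) {
        KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();
        kbuilder.add(ResourceFactory.newClassPathResource(drlOnClasspath),
                ResourceType.DRL);
        if (kbuilder.hasErrors()) {
            // Fail fast when the rules do not compile.
            throw new IllegalStateException(kbuilder.getErrors().toString());
        }
        kbase = KnowledgeBaseFactory.newKnowledgeBase();
        kbase.addKnowledgePackages(kbuilder.getKnowledgePackages());
    }

    /** Creates a session, inserts the populated POJOs, runs the rules, and cleans up. */
    public void run(Object... facts) {
        StatefulKnowledgeSession session = kbase.newStatefulKnowledgeSession();
        try {
            for (Object fact : facts) {
                session.insert(fact);
            }
            session.fireAllRules();
        } finally {
            session.dispose();
        }
    }
}
</pre></p>
<p>After run returns, the POJOs passed in carry whatever changes the fired rules made, so the application simply keeps using them.</p>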
<p><b><u>Conclusion</u></b><br />
I&#8217;ve covered the benefits of using a Rule Engine, the different formats in which rules can be expressed with the Drools Rule Engine, and a general overview of the steps required to use Drools in your application.  In a later post, I will show how a shoppingcart can leverage a rules engine to implement some of the features we commonly see offered by online retailers.</p>
<p>To learn more about Drools and related products like Guvnor, you can visit the community project&#8217;s <a href="http://www.jboss.org/drools" target="new">homepage</a>.  JBoss Rules, the enterprise version of Drools supported by Red Hat, also has its documentation available <a href="http://docs.redhat.com/docs/en-US/JBoss_Enterprise_BRMS_Platform/5/html-single/JBoss_Rules_5_Reference_Guide/index.html" target="new">online</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://hrycan.com/2011/11/28/leveraging-a-business-rule-engine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/af541810432de77586f96fedb1f2c168?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hrycan</media:title>
		</media:content>

		<media:content url="http://hrycan.files.wordpress.com/2011/11/dec-table.jpg?w=300" medium="image">
			<media:title type="html">dec-table</media:title>
		</media:content>
	</item>
		<item>
		<title>File Uploads</title>
		<link>http://hrycan.com/2010/06/02/file-uploads/</link>
		<comments>http://hrycan.com/2010/06/02/file-uploads/#comments</comments>
		<pubDate>Thu, 03 Jun 2010 03:20:37 +0000</pubDate>
		<dc:creator>Nick Hrycan</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Multipart]]></category>
		<category><![CDATA[Runtime]]></category>

		<guid isPermaLink="false">http://hrycan.com/?p=134</guid>
		<description><![CDATA[In the last post, I covered Magic Numbers and how they can be used along with file extensions to validate file uploads. What would happen if you did not restrict the file types your users can upload and picked the wrong location to save them on your server? What if a user uploaded a JSP [...]]]></description>
			<content:encoded><![CDATA[<p>In the last post, I covered Magic Numbers and how they can be used along with file extensions to validate file uploads.  What would happen if you did not restrict the file types your users can upload and picked the wrong location to save them on your server?  What if a user uploaded a JSP file?<br />
<span id="more-134"></span><br />
Let&#8217;s first review file uploads.  Many web application frameworks, like <a href="http://struts.apache.org/2.0.14/docs/file-upload.html" target="new">Struts2</a> and <a href="http://static.springsource.org/spring/docs/2.5.x/reference/mvc.html#mvc-multipart" target="new">Spring MVC</a>, make accepting file uploads easy.  You simply configure the respective framework and have a form on your page with enctype=&#8221;multipart/form-data&#8221; along with an input field of type &#8220;file&#8221;.  Under the hood, these frameworks delegate the heavy lifting to the Apache Commons FileUpload component.  Then, when a user selects a file on their computer and clicks the submit button, their local file is uploaded to the server hosting the webapp and stored in the configured temp directory.  Finally, the respective Controller or Action class that is configured to handle the form submission saves the file to where it will be accessed later. </p>
<p>Many applications like to save uploads where they can later be accessed/served to other users.  One location to use is the exploded WAR file directory of your application.  If you are using Tomcat, this would be $CATALINA_HOME/webapps/YourWarFile. </p>
<p>Saving your uploaded files in this location and not restricting the types of files a user can upload is very dangerous because a user could upload and execute a JSP file.  Once the user accesses their JSP on your server, the application server would execute any code contained in it with the privileges of the system user running the application server. </p>
<p>What code could this JSP execute?  The JSP would have access to any Java library available in the application and the standard Java libraries, including java.lang.Runtime.  As you may recall, with Runtime you can execute and display the output of local system commands from Java with code along the lines of:</p>
<p><pre class="brush: java;">
&lt;%
Runtime runtime = Runtime.getRuntime();
Process process = runtime.exec(command);
%&gt;
</pre></p>
<p>An example command that can be run on Windows is &#8220;cmd /c dir&#8221; and a similar example on Linux is &#8220;ls -l&#8221;.  Imagine the possibilities if the application server were running as root!</p>
<p>It is a good idea to limit the types of files your application accepts and, in particular, to reject JSP files.  Also, carefully consider where uploaded files are ultimately stored.  Finally, run your application server as a low-privileged user that has only the minimal system access required to do its job.</p>
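<p>The first two recommendations can be sketched in plain Java.  This is only a minimal sketch and not the Struts2 or Spring MVC API: the UploadGuard class, its extension list, and the /var/data/uploads directory are all assumptions made for illustration.  It rejects any file whose extension is not on a whitelist and resolves the destination inside a directory that lives outside the exploded WAR, so the application server never treats a stored upload as a JSP.</p>

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

class UploadGuard {

    // Hypothetical whitelist of accepted extensions.
    private static final String[] ALLOWED = { "gif", "jpg", "jpeg", "png" };

    // Hypothetical storage root, deliberately outside $CATALINA_HOME/webapps.
    static final Path UPLOAD_ROOT = Paths.get("/var/data/uploads");

    /** Drops any directory part the client sent, e.g. "../../shell.jsp". */
    static String baseName(String clientFileName) {
        int slash = Math.max(clientFileName.lastIndexOf('/'),
                             clientFileName.lastIndexOf('\\'));
        return clientFileName.substring(slash + 1);
    }

    /** Returns true only when the extension is on the whitelist. */
    static boolean isAllowed(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot == -1) {
            return false;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        for (String allowed : ALLOWED) {
            if (allowed.equals(ext)) {
                return true;
            }
        }
        return false;
    }

    /** Resolves where the upload may be saved, or throws if it is rejected. */
    static Path resolveTarget(String clientFileName) {
        String name = baseName(clientFileName);
        if (!isAllowed(name)) {
            throw new IllegalArgumentException("File type not allowed: " + name);
        }
        Path target = UPLOAD_ROOT.resolve(name).normalize();
        if (!target.startsWith(UPLOAD_ROOT)) {
            throw new IllegalArgumentException("Invalid file name: " + name);
        }
        return target;
    }

    public static void main(String[] args) {
        System.out.println(resolveTarget("holiday.jpg"));
        System.out.println(isAllowed("shell.jsp"));
    }
}
```

<p>Serving the stored files back to users would then go through a download Controller or Action rather than a direct URL into the webapp directory.</p>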
]]></content:encoded>
			<wfw:commentRss>http://hrycan.com/2010/06/02/file-uploads/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/af541810432de77586f96fedb1f2c168?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hrycan</media:title>
		</media:content>
	</item>
		<item>
		<title>Magic Numbers</title>
		<link>http://hrycan.com/2010/06/01/magic-numbers/</link>
		<comments>http://hrycan.com/2010/06/01/magic-numbers/#comments</comments>
		<pubDate>Tue, 01 Jun 2010 16:02:22 +0000</pubDate>
		<dc:creator>Nick Hrycan</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[file upload]]></category>
		<category><![CDATA[Magic Number]]></category>
		<category><![CDATA[Tika]]></category>

		<guid isPermaLink="false">http://hrycan.com/?p=126</guid>
		<description><![CDATA[Many web applications allow their users to upload files from their computer to the application&#8217;s remote server. An example of this type of application is an image sharing service where you can upload and share your vacation photos. This type of application has no reason to accept MS Word documents, PDF files, or mp3 files. [...]]]></description>
			<content:encoded><![CDATA[<p>Many web applications allow their users to upload files from their computer to the application&#8217;s remote server.   An example of this type of application is an image sharing service where you can upload and share your vacation photos.  This type of application has no reason to accept MS Word documents, PDF files, or mp3 files.   It makes sense to empower the application to reject undesirable file types. </p>
<p>So how do you prevent users from uploading PDF files, MS Word documents, etc. to your image sharing service?<br />
<span id="more-126"></span><br />
One approach is to maintain a list of allowed file extensions such as .gif, .jpg, and .png.  This type of list which contains only what is allowed is referred to as a whitelist. In other words, it is a list of what we accept and we reject everything else that is not in the list.  (The opposite is called a blacklist – a list of what we reject and we accept everything else.)  With this whitelist of file extensions, we can check the filename in the Controller/Action class handling the upload to see if it is allowed by the system by using a simple regex.  If it is not valid, the application can notify the user.<br />
The problem with this approach is the file name has nothing to do with the actual file content.  A user can simply rename test.pdf to test.jpg and the system would accept it. </p>
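<p>As a minimal sketch of the extension whitelist described above, assuming a hypothetical ExtensionWhitelist helper and the three image extensions already mentioned, the regex check might look like this:</p>

```java
import java.util.regex.Pattern;

class ExtensionWhitelist {

    // Matches names that end in .gif, .jpg, or .png, ignoring case.
    private static final Pattern ALLOWED =
            Pattern.compile("^.+\\.(gif|jpg|png)$", Pattern.CASE_INSENSITIVE);

    static boolean isAllowed(String fileName) {
        return fileName != null && ALLOWED.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("beach.JPG"));
        System.out.println(isAllowed("test.pdf"));
        // A PDF that was simply renamed still passes the name check.
        System.out.println(isAllowed("test.jpg"));
    }
}
```

<p>The last call is the weak spot: the check only ever sees the name, never the bytes behind it.</p>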
<p>A better approach is to check the file&#8217;s content for its &#8220;magic number&#8221;.  Wikipedia has a good explanation of <a href="http://en.wikipedia.org/wiki/Magic_number_%28programming%29#Magic_numbers_in_files" target="new">magic numbers</a>, but basically many file types have a distinct signature that can be used to identify them.  For example, the content of a PDF file starts with %PDF.  It even includes the version like %PDF-1.3 to indicate a PDF file version 1.3 or %PDF-1.6 to indicate PDF version 1.6.  Likewise, the content of a GIF image starts with GIF89a or GIF87a.  So if someone is uploading a GIF image and it does not contain the correct magic number, then you know it is not a GIF.</p>
<p>One way to quickly implement this idea in your application is to leverage <a href="http://tika.apache.org/index.html" target="new">Apache Tika</a>.  Tika is a subproject of Lucene and is used in Solr.  According to the Tika website, &#8220;Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries&#8221;.</p>
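<p>To make the idea concrete, here is a hand-rolled sketch that compares the first bytes of an upload against the GIF and PDF signatures quoted above.  The MagicNumberCheck class and its method names are assumptions for illustration, and it knows only these two formats.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MagicNumberCheck {

    /** Reads at most length bytes from the start of the stream. */
    static byte[] readHead(InputStream in, int length) throws IOException {
        byte[] head = new byte[length];
        int total = 0;
        while (total != length) {
            int n = in.read(head, total, length - total);
            if (n == -1) {
                break; // stream ended early
            }
            total += n;
        }
        return Arrays.copyOf(head, total);
    }

    /** True when head begins with the ASCII bytes of signature. */
    static boolean startsWith(byte[] head, String signature) {
        byte[] sig = signature.getBytes(StandardCharsets.US_ASCII);
        if (head.length < sig.length) {
            return false;
        }
        for (int i = 0; i != sig.length; i++) {
            if (head[i] != sig[i]) {
                return false;
            }
        }
        return true;
    }

    static boolean isGif(byte[] head) {
        return startsWith(head, "GIF87a") || startsWith(head, "GIF89a");
    }

    static boolean isPdf(byte[] head) {
        return startsWith(head, "%PDF");
    }

    public static void main(String[] args) throws IOException {
        // Simulates a PDF that was uploaded under the name test.gif.
        byte[] renamedPdf = "%PDF-1.3 uploaded as test.gif"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] head = readHead(new ByteArrayInputStream(renamedPdf), 6);
        System.out.println("GIF? " + isGif(head)); // prints "GIF? false"
        System.out.println("PDF? " + isPdf(head)); // prints "PDF? true"
    }
}
```

<p>Maintaining a signature table like this by hand gets tedious quickly, which is why a library is usually the better choice.</p>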
<p><pre class="brush: java;">
FileInputStream fis = ...//uploaded file
Tika tika = new Tika();
String result = tika.detect(fis);
</pre></p>
<p>From this, you&#8217;ll get a MIME type String such as image/png, image/jpeg, text/plain, or application/pdf.  For a GIF image, you&#8217;ll get image/gif.</p>
<p>Are magic numbers a magic solution to validating the file type of an uploaded file?  Could a magic number checker be tricked?  Yes: a signature only describes the first few bytes, so a file can begin with GIF89a and still carry arbitrary content after it.  It is best to follow a <a href="http://www.owasp.org/index.php/Defense_in_depth" target="new">defense in depth</a> approach and check both the file extension and the content type any time you allow users to upload files to your system.</p>
]]></content:encoded>
			<wfw:commentRss>http://hrycan.com/2010/06/01/magic-numbers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/af541810432de77586f96fedb1f2c168?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hrycan</media:title>
		</media:content>
	</item>
		<item>
		<title>Optimistic and Pessimistic Concurrency Control</title>
		<link>http://hrycan.com/2010/03/31/optimistic-and-pessimistic-concurrency-control/</link>
		<comments>http://hrycan.com/2010/03/31/optimistic-and-pessimistic-concurrency-control/#comments</comments>
		<pubDate>Wed, 31 Mar 2010 15:12:17 +0000</pubDate>
		<dc:creator>Nick Hrycan</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[lost update problem]]></category>
		<category><![CDATA[optimistic concurrency control]]></category>
		<category><![CDATA[pessimistic concurrency control]]></category>

		<guid isPermaLink="false">http://hrycan.com/?p=105</guid>
		<description><![CDATA[Introduction Most web applications allow a user to query a database, retrieve a local copy of the queried data, make changes to that local copy, and then finally send the updates and the unchanged values back to the database. These are often described as CRUD (Create, Read, Update, Delete) applications. An example would be a [...]]]></description>
			<content:encoded><![CDATA[<p><b>Introduction</b><br />
Most web applications allow a user to query a database, retrieve a local copy of the queried data, make changes to that local copy, and then finally send the updates and the unchanged values back to the database.  </p>
<p>These are often described as CRUD (Create, Read, Update, Delete) applications.  An example would be a simple web registration form which asks the user for their first name, last name, email address, password, and mailing list preferences in order to create a profile on the site.  In this scenario, we&#8217;ll have a one-to-one mapping of the user table to the form on a web page.  An example of each CRUD operation would be:</p>
<ul>
<li>Create – registering on the site</li>
<li>Read – logging in and viewing your profile</li>
<li>Update – changing your mailing list preferences</li>
<li>Delete – requesting your account be closed.  Many times data is not actually deleted from the database, but instead an update is executed to set the status to &#8220;inactive&#8221;.</li>
</ul>
<p>In this user profile example, only one person is updating the data (since only one person knows the correct password needed to access the record) so the data in the database is the same between the Read and Update steps.  However, there are some web applications where multiple users can modify the same data set concurrently.  These applications can suffer from the &#8220;lost update&#8221; problem.</p>
<p><b>Lost Update Problem</b><br />
To illustrate the lost update problem, consider a web application which allows salespeople to maintain a list of clients.  This web application allows any salesperson to view a list of clients and then click on any one client to update that client&#8217;s information.  The user must click the &#8220;Save&#8221; button to persist their changes to the database.<br />
<span id="more-105"></span></p>
<p>Using this web application, Joe (Salesperson) wants to update Bob&#8217;s (Client) record by adding Bob&#8217;s new cell phone number.   Joe performs a search, the application displays a matching list of client records, and Joe clicks on Bob&#8217;s record.  At this time, the data currently stored in the database for Bob is retrieved and displayed on a page in Joe&#8217;s web browser.  Joe decides to get a cup of coffee from the break room before making his change to Bob&#8217;s record.</p>
<p>While Joe gets his coffee, Fred (Salesperson) wants to update the address on Bob&#8217;s record.  Like Joe, Fred accesses Bob&#8217;s record via a search and is shown the data currently stored in the database for Bob.  At this point, both Fred and Joe have the same data displayed in their web browser for Bob.  Fred changes Bob&#8217;s address in the form, clicks the &#8220;Save&#8221; button, and goes to lunch.</p>
<p><img src="http://hrycan.files.wordpress.com/2010/04/concur-form2.jpg?w=600" alt="web form" title="concur-form2"   class="aligncenter size-full wp-image-108" /></p>
<p>Joe comes back with his coffee.  At this point, Joe&#8217;s view of Bob&#8217;s data is not the current data for Bob in the database.  Fred changed Bob&#8217;s address and that change is not shown in Joe&#8217;s web browser.  Most web applications operate in this manner &#8211; they only send data to the user (Joe) when the user requests it.  Now Joe adds Bob&#8217;s cell phone number and clicks the &#8220;Save&#8221; button.  This causes Bob&#8217;s new phone number as well as the outdated address that was shown in his form to be saved in the database.  <u>Fred&#8217;s update is now lost.</u></p>
<p>Fred gets back from lunch, searches for and views Bob&#8217;s record again, and sees Bob&#8217;s old address along with a new phone number.  Fred is angered and fires off an email to his boss telling him the cool new sales web application is broken and he wants to use the old application.</p>
<p>As you can imagine, this is not good for the application development team or the salespeople.  The diagram below illustrates the sequence of events leading up to the lost update.</p>
<p><a href="http://hrycan.files.wordpress.com/2010/04/concur1.jpg" target="new"><img src="http://hrycan.files.wordpress.com/2010/04/concur1.jpg?w=244&#038;h=300" alt="lost update problem illustrated" title="concurrency" width="244" height="300" class="aligncenter size-medium wp-image-107" /></a></p>
<p>As is often the case, each time the user clicks &#8220;Save&#8221; on the page in their web browser, all the data in their web form gets sent to the application server and written to the database.  The flaw in this case is the application assumes the data submitted by the user is the current state of the data in the database.  As you can see with the above example with Joe and Fred, this is not necessarily true.</p>
<p>To handle the lost update problem, you can use either optimistic concurrency control or pessimistic concurrency control.</p>
<p><b>Optimistic Concurrency Control</b><br />
Using optimistic concurrency control, an additional value is sent along with the user&#8217;s web form data when the user clicks the &#8220;Save&#8221; button.  This additional value is then used to determine if someone else changed the data in the database after this user last read it.  If the data in the database was not changed, the user&#8217;s change succeeds; otherwise it fails and the user gets a friendly error message along the lines of <i>&#8220;Another user has changed the data since your last request.  Please try again&#8221;</i>.  </p>
<p>At this point, some implementations might try to help the user merge their changes with the current state in the database.  Other implementations send the user the latest state of the data in the database and force them to re-enter their changes.  As you can imagine, optimistic concurrency control can get very frustrating if you have a large web form and get many &#8220;please try again&#8221; messages.</p>
<p>A common implementation of optimistic concurrency control is to have a timestamp column in your table that is automatically updated by the database to match the time the row was last modified.  When the user reads the record from the database to populate their web form, the timestamp is included and becomes a hidden form field that is sent back to the database with all the other form data when the user clicks the &#8220;Save&#8221; button.  The &#8220;where&#8221; clause of the update SQL statement is modified to now include the timestamp:<br />
<b>update &#8230; where id = ? and timestamp = ?</b></p>
<p>Going back to our example, if it used optimistic concurrency control, both Fred and Joe would get the same timestamp value from the database in their initial read.  Fred&#8217;s address change would succeed because the timestamp he submits matches the current timestamp in the row for Bob.  Besides changing the address, Fred&#8217;s update causes the database to set Bob&#8217;s timestamp column to the time Fred submitted his change.  </p>
<p>Now Joe sends his phone number change with the old timestamp and the application rejects it because the where clause, which now includes the timestamp, matches no rows.  The application would give Joe an error message along with the current state of Bob&#8217;s record, including the new timestamp.  Joe now has the option to redo his phone number change.</p>
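<p>The check itself is small enough to sketch without a database.  In the sketch below an in-memory object stands in for Bob&#8217;s row and a version counter stands in for the timestamp column; the OptimisticStore class, its fields, and the sample values are all assumptions for illustration.  It replays the scenario from the lost update section, where Fred changes the address and Joe adds the phone number.</p>

```java
class OptimisticStore {

    private String address = "1 Old Street";
    private String phone = "";
    private long version = 1; // stands in for the timestamp column

    /** What a browser form holds; version travels as the hidden field. */
    static final class Snapshot {
        final String address;
        final String phone;
        final long version;

        Snapshot(String address, String phone, long version) {
            this.address = address;
            this.phone = phone;
            this.version = version;
        }
    }

    synchronized Snapshot read() {
        return new Snapshot(address, phone, version);
    }

    /**
     * Mirrors "update ... where id = ? and timestamp = ?": the write applies
     * only when the caller still holds the current version.
     * Returns the number of rows updated (0 or 1).
     */
    synchronized int update(String newAddress, String newPhone, long expectedVersion) {
        if (expectedVersion != version) {
            return 0; // someone else saved first
        }
        address = newAddress;
        phone = newPhone;
        version++;
        return 1;
    }

    public static void main(String[] args) {
        OptimisticStore bob = new OptimisticStore();
        Snapshot joe = bob.read();  // Joe opens the form, then gets coffee
        Snapshot fred = bob.read(); // Fred opens the same record

        int fredRows = bob.update("9 New Avenue", fred.phone, fred.version);
        int joeRows = bob.update(joe.address, "555-0100", joe.version);

        System.out.println("Fred rows updated: " + fredRows); // prints 1
        System.out.println("Joe rows updated: " + joeRows);   // prints 0
    }
}
```

<p>With JDBC the same signal comes from the update count returned by executeUpdate: a count of zero means the row changed since it was read.</p>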
<p><b>Pessimistic Concurrency Control</b><br />
Pessimistic concurrency control introduces the concept of locking data to prevent others from attempting to modify it.  A user can only change data if the user has its lock.  The lock on the data is obtained when the data is initially read and is released when the user sends their changes to the server.  If a user attempts to obtain the lock when another user already has it, the attempt fails and the user gets an error message.</p>
<p>Locks can create a number of issues.  The classic example is one user needing to change a record locked by another user who is on vacation.  With pessimistic concurrency, some things to consider are how locks are acquired and released, if locks automatically expire after a period of time, and if a user can forcefully unlock data locked by another user.</p>
<p>A common implementation of pessimistic concurrency control is to use a <b>&#8220;select &#8230; for update nowait&#8221;</b> SQL statement when reading data with the intent to modify it.  Once that statement executes, that user has the lock.  Another user who later attempts to execute the same statement before the user with the lock either does a commit or rollback would get an error.  For Oracle, the error is:<br />
ORA-00054: resource busy and acquire with NOWAIT specified</p>
<p>If the example sales application used pessimistic concurrency control, Joe&#8217;s request to read Bob&#8217;s data from the database would give Joe the lock on Bob&#8217;s record.  Later, when Fred would read Bob&#8217;s record with the intent to modify it, Fred would be given a <i>&#8220;This record is currently locked.  Please try again later&#8221;</i> application error since Joe already has the lock.  Once Joe clicks the &#8220;Save&#8221; or &#8220;Cancel&#8221; button, the lock is released.</p>
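<p>The locking behaviour can be sketched in memory as well.  A real implementation would rely on the database row lock; the RecordLock class below is a hypothetical stand-in for the lock on a single record, it fails immediately like NOWAIT instead of blocking, and it deliberately leaves out lock expiry and forced unlocking.</p>

```java
class RecordLock {

    private String owner; // null while the record is unlocked

    /** Like "select ... for update nowait": succeeds at once or fails at once. */
    synchronized boolean tryAcquire(String user) {
        if (owner != null && !owner.equals(user)) {
            return false; // another user holds the lock
        }
        owner = user;
        return true;
    }

    /** Called on Save or Cancel; only the current holder may release. */
    synchronized boolean release(String user) {
        if (user.equals(owner)) {
            owner = null;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        RecordLock bobsRecord = new RecordLock();
        System.out.println("Joe locks:   " + bobsRecord.tryAcquire("joe"));  // true
        System.out.println("Fred locks:  " + bobsRecord.tryAcquire("fred")); // false
        bobsRecord.release("joe");                                           // Joe clicks Save
        System.out.println("Fred retries: " + bobsRecord.tryAcquire("fred")); // true
    }
}
```

<p>If Joe never clicks &#8220;Save&#8221; or &#8220;Cancel&#8221;, nothing in this sketch ever frees the record, which is exactly the vacation problem described above.</p>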
<p><b>Conclusion</b><br />
Of the two strategies, optimistic concurrency control is the more widely used in web applications.  Pessimistic concurrency control is not often seen with web applications; however, it is common in client/server (2-tier) applications.  </p>
<p>To use pessimistic concurrency control, the client needs to maintain the same database connection for their entire interaction with the database (think SQL*Plus).  This is often described as a stateful connection.  Web applications commonly use a database connection pool in the application server for performance and scalability reasons (a relatively small number of db connections can be recycled and serve a large number of web clients).  As a result, state is not retained because a web user gets a random db connection from the pool with each request.  </p>
<p>Going back to the sequence diagram at the beginning, each time Joe or Fred submits their request to the application server, a random db connection from the database connection pool is extracted, executes the SQL statements on their behalf, and is returned to the pool for the next user request.</p>
<p>Another valid option for handling the lost update problem is to design your application so multiple users do not have write access to the same set of records.  In my example, if each client is assigned a salesperson, then only the assigned salesperson can change the record.  Joe is assigned to Bob and only Joe can change Bob&#8217;s data.  Fred would never see Bob in the list of clients he can edit.</p>
<p>It is also worth noting that losing an update may not be a problem for some applications as they may have a history table or an audit table that allows users to view past changes inside the application.  These applications follow what is called a &#8220;Last update wins&#8221; or &#8220;Last commit wins&#8221; strategy.  The sales web application as described at the beginning of this post uses the &#8220;Last update wins&#8221; strategy.  This strategy is very easy to implement because it is what you get if you do nothing.</p>
<p>The &#8220;lost update&#8221; problem is not a very difficult problem to solve once you are made aware of it.</p>
]]></content:encoded>
			<wfw:commentRss>http://hrycan.com/2010/03/31/optimistic-and-pessimistic-concurrency-control/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/af541810432de77586f96fedb1f2c168?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hrycan</media:title>
		</media:content>

		<media:content url="http://hrycan.files.wordpress.com/2010/04/concur-form2.jpg" medium="image">
			<media:title type="html">concur-form2</media:title>
		</media:content>

		<media:content url="http://hrycan.files.wordpress.com/2010/04/concur1.jpg?w=244" medium="image">
			<media:title type="html">concurrency</media:title>
		</media:content>
	</item>
		<item>
		<title>Paginating Lucene Search Results</title>
		<link>http://hrycan.com/2010/02/10/paginating-lucene-search-results/</link>
		<comments>http://hrycan.com/2010/02/10/paginating-lucene-search-results/#comments</comments>
		<pubDate>Wed, 10 Feb 2010 15:05:11 +0000</pubDate>
		<dc:creator>Nick Hrycan</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[pagination]]></category>

		<guid isPermaLink="false">http://hrycan.com/?p=85</guid>
		<description><![CDATA[Most search results are returned to users in a paginated form. In other words, only X results are shown to the user on a single page and the user has a way to navigate to the next X results. Let&#8217;s use Google as an example. By default, you are shown the first 10 best results [...]]]></description>
			<content:encoded><![CDATA[<p>Most search results are returned to users in a paginated form.  In other words, only X results are shown to the user on a single page and the user has a way to navigate to the next X results.</p>
<p>Let&#8217;s use Google as an example.  By default, you are shown the first 10 best results matching your query.  At the bottom of the page, there are numbered links representing the corresponding pages of the search results.  Clicking on &#8220;3&#8221; gives you the 3rd page of results.  With a page size of 10, the results shown are 21 &#8211; 30.</p>
<p>Searching for &#8220;lucene&#8221; at Google, it responds with<br />
&#8220;Results <strong>1 &#8211; 10</strong> of about <strong>2,440,000</strong> for <strong>lucene</strong>. (0.33 seconds)&#8221;</p>
<p>Imagine if Google did not use pagination and instead returned everything to you at once on a single page!  That would be a <span style="text-decoration:underline;">very</span> large page and take a <span style="text-decoration:underline;">very</span> long time to load.</p>
<p>If you use Lucene, you too can present your search results in a paginated manner even though Lucene does not provide a direct way to do it in its API (2.4.1).  Thanks to how Lucene is designed, it is very easy to implement.<br />
<span id="more-85"></span></p>
<p>In Lucene, when you perform a search you do not retrieve the actual documents immediately; instead, you get an array of ranked pointers to the document matches.  Specifically, the TopDocs object returned contains an array of ScoreDoc objects which contain the document IDs of the matching documents.  They are ordered by decreasing relevancy to the user supplied search term, so the best matches are at the front.  It is the call to the IndexSearcher object, specifically IndexSearcher.doc(docID), which pulls the document into memory.  Lucene&#8217;s lazy loading of documents is very beneficial for pagination.</p>
<p>As we can see in the Lucene FAQ, the Lucene developers&#8217; recommended approach to <a href="http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_implement_paging.2C_i.e._showing_result_from_1-10.2C_11-20_etc.3F" target="new">pagination</a> is re-executing the search and navigating to the correct position in the ScoreDoc array.</p>
<p>As a result, pagination becomes a problem of finding the correct start position and end position in the ScoreDoc array.<br />
Here is a paginated search: </p>
<p><pre class="brush: java;">
Query query = qp.parse(searchTerm);
TopDocs hits = searcher.search(query, maxNumberOfResults);
ArrayLocation arrayLocation = paginator.calculateArrayLocation(hits.scoreDocs.length, pageNumber, pageSize);

// start is 1-based; Math.max guards the empty case where start and end are both 0
for (int i = Math.max(0, arrayLocation.getStart() - 1); i &lt; arrayLocation.getEnd(); i++) {
	int docId = hits.scoreDocs[i].doc;

	//load the document
	Document doc = searcher.doc(docId);
	String filename = doc.get(&quot;filename&quot;);
	String contents =  doc.get(searchField);

}
</pre><br />
Full source: <a href="http://code.google.com/p/hrycan-blog/source/browse/trunk/lucene-highlight/src/com/hrycan/service/SearchServiceImpl.java" target="new">SearchServiceImpl.java</a></p>
<p>Here are the details of calculating the locations in the ScoreDoc array: </p>
<p><pre class="brush: java;">
public class Paginator {

	public ArrayLocation calculateArrayLocation(int totalHits, int pageNumber,
		int pageSize) {
		ArrayLocation al = new ArrayLocation();

		if (totalHits &lt; 1 || pageNumber &lt; 1 || pageSize &lt; 1) {
			al.setStart(0);
			al.setEnd(0);
			return al;
		}

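		// 1-based, inclusive positions of the first and last hit on the page
		int start = 1 + (pageNumber - 1) * pageSize;
		int end = Math.min(pageNumber * pageSize, totalHits);
		if (start &gt; end) {
			// requested page is past the last hit: show the final pageSize hits
			start = Math.max(1, end - pageSize + 1);
		}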

		al.setStart(start);
		al.setEnd(end);
		return al;
	}
}
</pre><br />
Full source: <a href="http://code.google.com/p/hrycan-blog/source/browse/trunk/lucene-highlight/src/com/hrycan/utils/Paginator.java" target="new">Paginator.java</a></p>
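<p>If you prefer array-style bounds, here is an alternative sketch of the same arithmetic using a 0-based, half-open range, which removes the &#8220;- 1&#8221; adjustment in the loop.  The PageBounds class is hypothetical and is not part of the linked source.  Unlike the Paginator above, it returns an empty range for a page past the last hit.</p>

```java
import java.util.Arrays;

class PageBounds {

    /** Returns {from, to}: 0-based start (inclusive) and end (exclusive). */
    static int[] of(int totalHits, int pageNumber, int pageSize) {
        if (totalHits < 1 || pageNumber < 1 || pageSize < 1) {
            return new int[] { 0, 0 };
        }
        // long arithmetic so very large page numbers cannot overflow
        long from = Math.min((long) (pageNumber - 1) * pageSize, totalHits);
        long to = Math.min(from + pageSize, totalHits);
        return new int[] { (int) from, (int) to };
    }

    public static void main(String[] args) {
        // Page 3 with a page size of 10 covers results 21 to 30.
        System.out.println(Arrays.toString(of(2440000, 3, 10))); // [20, 30]
        // The last page may be short.
        System.out.println(Arrays.toString(of(25, 3, 10)));      // [20, 25]
        // A page past the end is empty.
        System.out.println(Arrays.toString(of(25, 5, 10)));      // [25, 25]
    }
}
```

<p>A loop over the results then reads for (int i = from; i != to; i++) with no further adjustment.</p>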
<p>The majority of the time, your users will find what they need on the first page of search results since the matches most relevant to their search are returned at the top of the list.  I think most people don&#8217;t look beyond the first few pages and instead revise their search terms if they don&#8217;t immediately see what they need.</p>
<p>The benefits of pagination include:</p>
<ul>
<li>lower response time since fewer documents are processed on the server side</li>
<li>lower memory use since fewer documents are loaded</li>
<li>lower network use since the amount of data sent over the network to the client is reduced</li>
</ul>
<p>The end result is a better user experience.</p>
<p>Here are some numbers to give you a general idea of the impact of pagination.  They show the same search being performed with and without pagination and the effect on response time.  MaxResults is the argument passed to the IndexSearcher.search(query, maxresults) method which bounds the number of results returned.</p>
<table>
<tr>
<th>Paginated</th>
<th>MaxResults</th>
<th>Number of Matches</th>
<th>Response Time</th>
</tr>
<tr>
<td style="text-align:right;">Yes</td>
<td style="text-align:right;">500</td>
<td style="text-align:right;">500</td>
<td style="text-align:right;">5ms</td>
</tr>
<tr>
<td style="text-align:right;">No</td>
<td style="text-align:right;">500</td>
<td style="text-align:right;">500</td>
<td style="text-align:right;">186ms</td>
</tr>
<tr>
<td style="text-align:right;">Yes</td>
<td style="text-align:right;">1000</td>
<td style="text-align:right;">1000</td>
<td style="text-align:right;">5ms</td>
</tr>
<tr>
<td style="text-align:right;">No</td>
<td style="text-align:right;">1000</td>
<td style="text-align:right;">1000</td>
<td style="text-align:right;">415ms</td>
</tr>
<tr>
<td style="text-align:right;">Yes</td>
<td style="text-align:right;">5000</td>
<td style="text-align:right;">2110</td>
<td style="text-align:right;">5ms</td>
</tr>
<tr>
<td style="text-align:right;">No</td>
<td style="text-align:right;">5000</td>
<td style="text-align:right;">2110</td>
<td style="text-align:right;">1007ms</td>
</tr>
</table>
]]></content:encoded>
			<wfw:commentRss>http://hrycan.com/2010/02/10/paginating-lucene-search-results/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/af541810432de77586f96fedb1f2c168?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hrycan</media:title>
		</media:content>
	</item>
		<item>
		<title>Updating Document Fields in Lucene</title>
		<link>http://hrycan.com/2009/11/26/updating-document-fields-in-lucene/</link>
		<comments>http://hrycan.com/2009/11/26/updating-document-fields-in-lucene/#comments</comments>
		<pubDate>Thu, 26 Nov 2009 15:30:51 +0000</pubDate>
		<dc:creator>Nick Hrycan</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[lucene]]></category>

		<guid isPermaLink="false">http://hrycan.com/?p=58</guid>
		<description><![CDATA[Lucene 2.4.1 provides a convenient method for you to update a Document in your Index, namely the updateDocument method of IndexWriter (shown below) but what do you do if you want to update the Fields of an existing document? Lucene&#8217;s updateDocument operation is basically delete and insert wrapped into a single function. All documents matching [...]]]></description>
			<content:encoded><![CDATA[<p>Lucene 2.4.1 provides a convenient method for updating a Document in your index, namely the updateDocument method of IndexWriter (shown below).  But what do you do if you want to update only some of the Fields of an existing document?<br />
<pre class="brush: java;">
public void updateDocument(Term term, Document doc)
                    throws CorruptIndexException, IOException
</pre></p>
<p>Lucene&#8217;s updateDocument operation is basically a delete and an insert wrapped into a single function.  All documents matching the Term parameter are deleted from the Lucene index and the supplied Document instance is then inserted into the index.  While Lucene allows multiple copies of the same document to exist in the index, the update operation does not insert a copy of the supplied document for every match.  In other words, if your Term matches 5 documents in the index, then 5 documents are deleted and a single document is inserted in their place.</p>
<blockquote><p>
As you can see, it is a very good idea for you to design your documents so they have a field that uniquely identifies them in the entire index.  In the database world, this is called a primary key field.  </p>
<p>At times, it is helpful to think of the Lucene index as a database having a single table and the Documents as rows.  It is a good analogy when you frame it in terms of searching.  Boolean Queries seem to fit this concept nicely.  </p>
<p>However, there are many differences between a Lucene index and a database.  </p>
<ul>
<li>Lucene does not provide a way to enforce field uniqueness.  It is up to you to enforce it in your own code.</li>
<li>Lucene does not require a predefined schema for the documents in the index.  This means the documents in the index do not all need to have the same number of fields or use the same field names.  As an example, some documents can have the fields (id, url, contents) and other documents can have the fields (productid, manufacturer, summary, review).  </li>
<li>Fields can be repeated in a document.  For example, a document can have 3 product review fields (productid, manufacturer, summary, review, review, review).  We will revisit this later in the code example.</li>
</ul>
</blockquote>
<p>Lucene&#8217;s updateDocument method overwrites the document(s) matching the given Term with the given Document.  This is a problem if you only want to update a few fields and keep the remainder.  </p>
<p>In the scenario pictured below, you can uniquely identify a document in the index whose author field you would like to update.  So you then call updateDocument and pass in the Term and a Document instance populated with the new author field value.  The result is an updated author field and the loss of the 3 other fields previously stored &#8211;  the title, publisher, and contents fields.</p>
<p><img src="http://hrycan.files.wordpress.com/2009/11/lucene-document-update.gif?w=600" alt="Visual of Lucene&#39;s update document method" title="lucene-document-update"   class="aligncenter size-full wp-image-62" /></p>
<p>What do you do when you need to update a subset of the fields in a document but cannot re-create the remaining fields?  There can be many reasons for this dilemma.  Perhaps you are unable to re-create the fields because the original text is not available or perhaps the operations to re-create the fields are very costly.  </p>
<p>One approach to resolve this dilemma is to search for the current document in the index, change the desired fields, and use the modified document as the input to the updateDocument call.  This idea is illustrated below.  <a target="new" href="http://code.google.com/p/hrycan-blog/source/browse/trunk/lucene-highlight/src/com/hrycan/search/UpdateUtil.java">UpdateUtil.java</a> contains the full source.</p>
<p><pre class="brush: java;">
int docId = hits.scoreDocs[0].doc;
			
//retrieve the old document
Document doc = searcher.doc(docId);

List&lt;Field&gt; replacementFields = updateDoc.getFields();
for (Field field : replacementFields) {
	String name = field.name();
	String currentValue = doc.get(name);
	if (currentValue != null) {
		//replacement field value
		
		//remove all occurrences of the old field
		doc.removeFields(name);

		//insert the replacement
		doc.add(field);
	} else {
		//new field
		doc.add(field);
	}
}

//write the old document to the index with the modifications
writer.updateDocument(term, doc);
</pre></p>
<p>Here we pass in a Document (updateDoc) that can hold both replacement fields and additional fields for the document identified by a search using the term parameter as the basis for a TermQuery.  First we obtain the list of Fields from the document parameter.  If the matched document already has a field by that name, it is considered a replacement; otherwise it is a new field to be added to the document.</p>
<p>Notice the method first removes all fields in the Document having the same name as the replacement prior to inserting the replacement field.  As mentioned earlier, a Lucene document can have multiple fields with the same name.  </p>
<p><img src="http://hrycan.files.wordpress.com/2009/11/lucene-index.gif?w=600" alt="visual of documents stored in a lucene index" title="lucene-index"   class="aligncenter size-full wp-image-67" /></p>
<p>Without the remove call, you would be adding another value for the field instead of replacing the existing value.  </p>
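<p>The remove-then-add logic can be tried out without a Lucene index.  The sketch below is a plain Java model, not Lucene code: the FieldMergeSketch class, its Field class, and the merge method are hypothetical names invented for illustration.  One deliberate difference from the snippet above is that the old values are dropped once per field name before any replacement is added, so several replacement fields sharing a name (three review fields, say) all survive.  A loop that removes the named fields again for every replacement would keep only the last of them.</p>
<p><pre class="brush: java;">
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FieldMergeSketch {

	// Hypothetical stand-in for a stored field: a name and a value.
	static final class Field {
		final String name;
		final String value;

		Field(String name, String value) {
			this.name = name;
			this.value = value;
		}

		@Override
		public String toString() {
			return name + &quot;=&quot; + value;
		}
	}

	// Returns a copy of current in which every field named in replacements
	// is replaced; names not present in current are simply added.
	static List&lt;Field&gt; merge(List&lt;Field&gt; current, List&lt;Field&gt; replacements) {
		Set&lt;String&gt; replacedNames = new HashSet&lt;String&gt;();
		for (Field f : replacements) {
			replacedNames.add(f.name);
		}

		List&lt;Field&gt; merged = new ArrayList&lt;Field&gt;();
		for (Field f : current) {
			// keep only the fields that are not being replaced
			if (!replacedNames.contains(f.name)) {
				merged.add(f);
			}
		}
		// add every replacement, including repeated names
		merged.addAll(replacements);
		return merged;
	}

	public static void main(String[] args) {
		List&lt;Field&gt; current = new ArrayList&lt;Field&gt;();
		current.add(new Field(&quot;id&quot;, &quot;42&quot;));
		current.add(new Field(&quot;author&quot;, &quot;old author&quot;));
		current.add(new Field(&quot;review&quot;, &quot;old review&quot;));

		List&lt;Field&gt; replacements = new ArrayList&lt;Field&gt;();
		replacements.add(new Field(&quot;author&quot;, &quot;new author&quot;));
		replacements.add(new Field(&quot;review&quot;, &quot;review A&quot;));
		replacements.add(new Field(&quot;review&quot;, &quot;review B&quot;));

		System.out.println(merge(current, replacements));
	}
}
</pre></p>
<p>Running main prints [id=42, author=new author, review=review A, review=review B]: the id field is kept, the author is replaced, and both new reviews are present.</p>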
<p>A great tool to view what is actually in your Lucene index is <a target="new" href="http://www.getopt.org/luke/">Luke</a>, the Lucene Index Toolbox.  It is a very helpful tool for answering “what if” questions when you read the Lucene API.</p>
<p>Out of the box, <a target="new" href="http://lucene.apache.org/java/docs/">Lucene</a> does not provide a way to update the individual fields of a document in the index.  However, it is relatively easy to achieve this functionality by grouping together the available API calls.  Keep in mind that searcher.doc() returns only stored fields, so a field that was indexed but not stored will not survive this round trip.</p>
<p>You can <a target="new" href="http://code.google.com/p/hrycan-blog/source/browse/trunk/lucene-highlight/#lucene-highlight/src/com/hrycan/search">browse the full source</a> at Google Code and download a copy of the entire project via Subversion.<br />
svn checkout http://hrycan-blog.googlecode.com/svn/trunk/lucene-highlight/</p>
]]></content:encoded>
			<wfw:commentRss>http://hrycan.com/2009/11/26/updating-document-fields-in-lucene/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/af541810432de77586f96fedb1f2c168?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hrycan</media:title>
		</media:content>

		<media:content url="http://hrycan.files.wordpress.com/2009/11/lucene-document-update.gif" medium="image">
			<media:title type="html">lucene-document-update</media:title>
		</media:content>

		<media:content url="http://hrycan.files.wordpress.com/2009/11/lucene-index.gif" medium="image">
			<media:title type="html">lucene-index</media:title>
		</media:content>
	</item>
		<item>
		<title>Source now available at Google Code</title>
		<link>http://hrycan.com/2009/10/26/source-now-available-at-google-code/</link>
		<comments>http://hrycan.com/2009/10/26/source-now-available-at-google-code/#comments</comments>
		<pubDate>Mon, 26 Oct 2009 15:23:48 +0000</pubDate>
		<dc:creator>Nick Hrycan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://hrycan.com/?p=52</guid>
		<description><![CDATA[I&#8217;ve uploaded the full source code for the Lucene Highlighter example to my project at Google Code. The code for the earlier posts will be there in the next few weeks. You can browse the full source online or you can use subversion to checkout the code for a local copy. As an example, to [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve uploaded the full source code for the Lucene Highlighter example to my project at Google Code.  The code for the earlier posts will be there in the next few weeks.</p>
<p>You can <a href="http://code.google.com/p/hrycan-blog/source/browse/" target="new">browse the full source</a> online or you can use Subversion to check out the code for a local copy.  As an example, to check out the Lucene code at the command line:</p>
<p><pre class="brush: bash;">
svn checkout http://hrycan-blog.googlecode.com/svn/trunk/lucene-highlight/
</pre></p>
]]></content:encoded>
			<wfw:commentRss>http://hrycan.com/2009/10/26/source-now-available-at-google-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/af541810432de77586f96fedb1f2c168?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hrycan</media:title>
		</media:content>
	</item>
		<item>
		<title>Lucene Highlighter HowTo</title>
		<link>http://hrycan.com/2009/10/25/lucene-highlighter-howto/</link>
		<comments>http://hrycan.com/2009/10/25/lucene-highlighter-howto/#comments</comments>
		<pubDate>Sun, 25 Oct 2009 22:14:45 +0000</pubDate>
		<dc:creator>Nick Hrycan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[highlighter]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://hrycan.com/?p=38</guid>
		<description><![CDATA[Background When you perform a search at Google or Bing, you enter your search terms, click a search button, and your search results appear. Each search result displays the title, the URL, and a text fragment containing your search terms in bold. Consider what happens when you search for &#8216;Apache&#8217; at Google. Your results would [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Background</strong><br />
When you perform a search at Google or Bing, you enter your search terms, click a search button, and your search results appear.  Each search result displays the title, the URL, and a text fragment containing your search terms in bold.</p>
<p>Consider what happens when you search for &#8216;Apache&#8217; at Google.  Your results would include the Apache server, the Apache Software Foundation, the Apache Helicopter, and Apache County.   The contextual fragments displayed with each search result help you judge whether a result is an appropriate match and whether you need to add search terms to narrow the search result space.  Search would not be as user friendly as it is today without these fragments.</p>
<p>This post covers version 2.4.1 of <a href="http://lucene.apache.org/java/docs/index.html" target="new">Apache Lucene</a>, the popular open source search engine library written in Java.  It may not be widely known, but Lucene provides a way to generate these contextual fragments so your system can display them with each search result.  The functionality is not found in lucene-core-2.4.1.jar but in the contrib library lucene-highlighter-2.4.1.jar.  The contrib libraries are included with the Lucene download and are located in the contrib folder once the download is unzipped.</p>
<p>If you are not familiar with Lucene, you can think of it as a library which provides</p>
<ul>
<li> a way to create a search index from multiple text items</li>
<li> a way to quickly search the index and return the best matches.</li>
</ul>
<p>A more thorough explanation of Lucene can be found at the <a href="http://wiki.apache.org/lucene-java/LuceneFAQ" target="new">Apache Lucene FAQ</a>.</p>
<p>As an example of what the Lucene Highlighter can do, here is what appears when I search for &#8216;queue&#8217; in an index of PDF documents.</p>
<blockquote><p>e14510.pdf<br />
Oracle Coherence Getting Started Guide<br />
of the ways that Coherence can eliminate bottlenecks is to <strong>queue</strong> up  transactions that have occurred&#8230;<br />
duration of an item within the <strong>queue</strong> is configurable, and is referred to as the  Write-Behind Delay. When data changes, it is added to the write-behind <strong>queue</strong> (if it is  not already in the <strong>queue</strong>), and the <strong>queue</strong> entry is set to ripen after the configured  Write-Behind Delay has passed&#8230;</p></blockquote>
<p><strong>The Steps</strong><br />
First, before you can display highlighted fragments with each search result, the text to highlight must be available.  Shown below is a snippet of indexing code.  We are storing the text that will be used to generate the fragment in the contents field.</p>
<p><pre class="brush: java;">
Document doc = new Document();
doc.add(new Field(&quot;contents&quot;, contents, Field.Store.COMPRESS, 
    Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
doc.add(new Field(&quot;title&quot;, bookTitle, Field.Store.YES, 
    Field.Index.NOT_ANALYZED));
doc.add(new Field(&quot;filepath&quot;, f.getCanonicalPath(), Field.Store.YES, 
    Field.Index.NOT_ANALYZED));
doc.add(new Field(&quot;filename&quot;, f.getName(), Field.Store.YES, 
    Field.Index.NOT_ANALYZED));
writer.addDocument(doc);  
</pre></p>
<p>The values Field.Store.COMPRESS and Field.Store.YES tell Lucene to store the field in the index for later retrieval with a doc.get() invocation.</p>
<p>Field.Store.COMPRESS causes Lucene to store the contents field in a compressed form in the index.  Lucene automatically uncompresses it when it is retrieved.</p>
<p>Field.Index.ANALYZED indicates the field is searchable and an Analyzer will be applied to its contents.  An example of an Analyzer is StandardAnalyzer.  One of the things done by StandardAnalyzer is to remove stopwords (a, as, it, the, to, &#8230;) from the text being indexed.</p>
<blockquote><p>
Note: You should use the same analyzer type (like StandardAnalyzer) for your indexing and searching operations otherwise you will not get the results you are seeking.
</p></blockquote>
<p>The last part of the indexing side is the term vectors.  From the <a href="http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/document/Field.TermVector.html#YES" target="new">Lucene Javadocs</a>:<br />
&#8220;A term vector is a list of the document&#8217;s terms and their number of occurrences in that document.&#8221;</p>
<p>For the Highlighter, TermVectors need to be available and you have a choice of either computing and storing them with the index at index time or computing them as you need them when the search is performed.  Above, Field.TermVector.WITH_POSITIONS_OFFSETS indicates we are computing and storing them in the index at index time.</p>
<p>With the index ready for presenting contextual fragments, let&#8217;s move on to generating them while processing a search request.  Below is a typical “Hello World” type search block.</p>
<p><pre class="brush: java;">
QueryParser qp = new QueryParser(&quot;contents&quot;, analyzer);
Query query = qp.parse(searchInput);
TopDocs hits = searcher.search(query, 10);

for (int i = 0; i &lt; hits.scoreDocs.length; i++) {
	int docId = hits.scoreDocs[i].doc;
	Document doc = searcher.doc(docId);
	String filename = doc.get(&quot;filename&quot;);
	String contents = doc.get(&quot;contents&quot;);
	
	//hlu is the helper object whose getFragmentsWithHighlightedTerms 
	//method is shown in the next listing
	String[] fragments = hlu.getFragmentsWithHighlightedTerms(analyzer, 
                   query, &quot;contents&quot;, contents, 5, 100);
}
</pre></p>
<p>Starting at the top, we create a query based on the user supplied search string, searchInput, using the QueryParser.  Lucene supports a sophisticated <a href="http://lucene.apache.org/java/2_4_1/queryparsersyntax.html" target="new">query language</a> and QueryParser simplifies transforming the supplied string to a query object.  Next, we get the top 10 results matching the query.  This is pretty standard so far, but now in the loop we come to the getFragmentsWithHighlightedTerms call.</p>
<p>Here is the code to generate the fragments:<br />
<pre class="brush: java;">
TokenStream stream = TokenSources.getTokenStream(fieldName, fieldContents, 
                      analyzer);
//cache the tokens so the scorer and the highlighter can both read them
CachingTokenFilter cachedStream = new CachingTokenFilter(stream);
SpanScorer scorer = new SpanScorer(query, fieldName, cachedStream);
Fragmenter fragmenter = new SimpleSpanFragmenter(scorer, 100);
		
Highlighter highlighter = new Highlighter(scorer);
highlighter.setTextFragmenter(fragmenter);
//rewind the cached tokens before highlighting
cachedStream.reset();
String[] fragments = highlighter.getBestFragments(cachedStream, fieldContents, 5);
</pre></p>
<p>First we obtain the TokenStream.  The call shown above assumes term vectors were not stored in the index at index time.</p>
<p>Next is the SpanScorer and SimpleSpanFragmenter.  These work to break the contents into 100 character fragments and rank them by relevancy.  You can use SpanScorer and SimpleSpanFragmenter or QueryScorer and SimpleFragmenter.  The full details can be found in the Javadocs.</p>
<blockquote><p>
Note: when indexing large files, like the full contents of <a href="http://www.redhat.com/docs/en-US/JBoss_Enterprise_Application_Platform/" target="new">PDF manuals</a>, you might need to tell the Highlighter object to look at the full text by calling the setMaxDocCharsToAnalyze method with Integer.MAX_VALUE or a more appropriate value.   In my case, the default value specified by Lucene was too small, thus Highlighter did not look at the full text to generate the fragments.  This was not good because the match I was seeking was near  the end of the contents.
</p></blockquote>
<p>Finally, we tell the Highlighter to return the best 5 fragments.  </p>
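<p>To make the three roles concrete (fragmenter, scorer, highlighter), here is a deliberately naive sketch in plain Java.  It is not how the Lucene Highlighter is implemented and it uses none of its classes: NaiveHighlighterSketch and its methods are hypothetical names, the scoring is a simple count of one term, and the fragments are cut at word boundaries only.</p>
<p><pre class="brush: java;">
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveHighlighterSketch {

	// Fragmenter role: cut the text into pieces of at most about
	// fragmentSize characters, breaking only between words.
	static List&lt;String&gt; fragment(String text, int fragmentSize) {
		List&lt;String&gt; fragments = new ArrayList&lt;String&gt;();
		StringBuilder current = new StringBuilder();
		for (String word : text.trim().split(&quot;\\s+&quot;)) {
			if (current.length() &gt; 0
					&amp;&amp; current.length() + 1 + word.length() &gt; fragmentSize) {
				fragments.add(current.toString());
				current.setLength(0);
			}
			if (current.length() &gt; 0) {
				current.append(' ');
			}
			current.append(word);
		}
		if (current.length() &gt; 0) {
			fragments.add(current.toString());
		}
		return fragments;
	}

	// Scorer role: count the occurrences of the term in a fragment.
	static int score(String fragment, Pattern term) {
		Matcher m = term.matcher(fragment);
		int hits = 0;
		while (m.find()) {
			hits++;
		}
		return hits;
	}

	// Highlighter role: keep the max best-scoring fragments that contain
	// the term and wrap each match in strong tags.
	static List&lt;String&gt; bestFragments(String text, String term,
			int fragmentSize, int max) {
		final Pattern p = Pattern.compile(&quot;\\b&quot; + Pattern.quote(term) + &quot;\\b&quot;,
				Pattern.CASE_INSENSITIVE);
		List&lt;String&gt; matching = new ArrayList&lt;String&gt;();
		for (String f : fragment(text, fragmentSize)) {
			if (score(f, p) &gt; 0) {
				matching.add(f);
			}
		}
		// highest score first; the sort is stable, so ties keep text order
		matching.sort((a, b) -&gt; Integer.compare(score(b, p), score(a, p)));

		List&lt;String&gt; best = new ArrayList&lt;String&gt;();
		for (String f : matching.subList(0, Math.min(max, matching.size()))) {
			best.add(p.matcher(f).replaceAll(&quot;&lt;strong&gt;$0&lt;/strong&gt;&quot;));
		}
		return best;
	}

	public static void main(String[] args) {
		String text = &quot;alpha beta queue gamma delta queue queue epsilon&quot;;
		System.out.println(bestFragments(text, &quot;queue&quot;, 25, 1));
	}
}
</pre></p>
<p>With a fragment size of 25 the sample text splits into two fragments, and the one containing the term twice wins, so main prints [delta &lt;strong&gt;queue&lt;/strong&gt; &lt;strong&gt;queue&lt;/strong&gt; epsilon].</p>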
<p>The full code for this example can be downloaded from <a href="http://code.google.com/p/hrycan-blog/" target="new">my Google Code project</a>.  The source file that makes use of the Highlighter is <a href="http://code.google.com/p/hrycan-blog/source/browse/trunk/lucene-highlight/src/com/hrycan/search/HighlighterUtil.java" target="new">HighlighterUtil.java</a></p>
<p>You can also find examples of using Highlighter in the Lucene SVN Repository, specifically <a href="http://svn.apache.org/viewvc/lucene/java/branches/lucene_2_4/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java?view=markup" target="new">HighlighterTest.java</a></p>
<p>As you can see, returning search results with contextual fragments containing your search terms is very easy with the Lucene Highlighter contrib library once you know the steps to follow.  </p>
]]></content:encoded>
			<wfw:commentRss>http://hrycan.com/2009/10/25/lucene-highlighter-howto/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/af541810432de77586f96fedb1f2c168?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hrycan</media:title>
		</media:content>
	</item>
		<item>
		<title>Extracting content from XHTML using XPATH and dom4j</title>
		<link>http://hrycan.com/2009/10/17/extracting-content-from-xhtml-using-xpath-and-dom4j/</link>
		<comments>http://hrycan.com/2009/10/17/extracting-content-from-xhtml-using-xpath-and-dom4j/#comments</comments>
		<pubDate>Sun, 18 Oct 2009 03:25:58 +0000</pubDate>
		<dc:creator>Nick Hrycan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[dom4j]]></category>
		<category><![CDATA[namespace]]></category>
		<category><![CDATA[parse]]></category>
		<category><![CDATA[xhtml]]></category>
		<category><![CDATA[xpath]]></category>

		<guid isPermaLink="false">http://hrycan.com/?p=28</guid>
		<description><![CDATA[If you need to read, write, navigate, create or modify XML documents, take a look at dom4j. Browsing the dom4j cookbook and quick start guide, it seems trivial to extract content from an XML document using XPATH. Consider the following XML document: Listing 1: Sample XML You would think the XPATH to extract the contents [...]]]></description>
			<content:encoded><![CDATA[<p>If you need to read, write, navigate, create or modify XML documents, take a look at dom4j.  Browsing the dom4j <a href="http://www.dom4j.org/dom4j-1.6.1/cookbook.html" target="newwin">cookbook</a> and <a href="http://www.dom4j.org/dom4j-1.6.1/guide.html" target="newwin">quick start guide</a>, it seems trivial to extract content from an XML document using XPATH.  Consider the following XML document:<br />
<pre class="brush: xml;">
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;
&lt;head&gt;
	&lt;title/&gt;
&lt;/head&gt;
&lt;body&gt;
	&lt;div&gt;
		&lt;p&gt;Paragraph 1&lt;/p&gt;
	&lt;/div&gt;
	&lt;div&gt;
		&lt;p&gt;Paragraph 2&lt;/p&gt;
	&lt;/div&gt;
	&lt;div&gt;
		&lt;p&gt;Paragraph 3&lt;/p&gt;
	&lt;/div&gt;
&lt;/body&gt;
&lt;/html&gt;
</pre><br />
<b>Listing 1: Sample XML</b></p>
<p>You would think the XPATH to extract the contents of each paragraph would be<br />
<pre class="brush: java;">
String xpathExpr = &quot;//div/p&quot;;
</pre></p>
<p>If you try it out, you will see there are no matches.  The gotcha is the namespace<br />
<pre class="brush: xml;">
xmlns=&quot;http://www.w3.org/1999/xhtml&quot;
</pre></p>
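<p>If that namespace were not specified, then the above XPATH expression would work.  The default namespace puts every element of the document in the XHTML namespace, while an unprefixed name such as div in an XPATH expression only matches elements that have no namespace.  The solution is to use a different set of API calls from those shown in the standard XPATH examples in the dom4j documentation.<br />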
<pre class="brush: java;">
package com.hrycan.blog.xml;

import java.util.List;
import java.util.Map;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Node;
import org.dom4j.XPath;

public class XPATHProcessor {

	private Map&lt;String, String&gt;  namespaceURIMap;
	
	public List&lt;Node&gt; extract(String content, String xpathExpr) throws DocumentException {
		Document document = DocumentHelper.parseText(content);		
		XPath path = DocumentHelper.createXPath(xpathExpr);
		path.setNamespaceURIs(namespaceURIMap);
		
		List&lt;Node&gt; list = path.selectNodes(document.getRootElement());
		return list;
	}

	public void setNamespaceURIMap(Map&lt;String, String&gt; namespaceURIMap) {
		this.namespaceURIMap = namespaceURIMap;
	}
}
</pre><br />
Here we use the DocumentHelper object to create an XPath object and then set its namespace URIs map instead of using the boilerplate document.selectNodes(xpathExpr) call.</p>
<p>Here is the JUnit test showing how it would be used with the content of Listing 1 and the corresponding XPATH expression using the namespace.<br />
<pre class="brush: java;">
package com.hrycan.blog.xml;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.dom4j.DocumentException;
import org.dom4j.Node;
import org.junit.Test;
import static org.junit.Assert.assertTrue;

public class XPATHProcessorTest {
	private XPATHProcessor xpathProcessor;
	
	@Test
	public void testExtract() throws DocumentException {
		xpathProcessor = new XPATHProcessor();
		Map&lt;String, String&gt;  namespaceURIMap = new HashMap&lt;String, String&gt; ();
		namespaceURIMap.put(&quot;html&quot;, &quot;http://www.w3.org/1999/xhtml&quot;);
		xpathProcessor.setNamespaceURIMap(namespaceURIMap);
		
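		// compact form of Listing 1 shown above
		String content = &quot;&lt;html xmlns=\&quot;http://www.w3.org/1999/xhtml\&quot;&gt;&lt;head&gt;&lt;title/&gt;&lt;/head&gt;&lt;body&gt;&quot;
			+ &quot;&lt;div&gt;&lt;p&gt;Paragraph 1&lt;/p&gt;&lt;/div&gt;&quot;
			+ &quot;&lt;div&gt;&lt;p&gt;Paragraph 2&lt;/p&gt;&lt;/div&gt;&quot;
			+ &quot;&lt;div&gt;&lt;p&gt;Paragraph 3&lt;/p&gt;&lt;/div&gt;&quot;
			+ &quot;&lt;/body&gt;&lt;/html&gt;&quot;;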
		String xpathExpr = &quot;//html:div/html:p&quot;;
		
		List&lt;Node&gt; list = xpathProcessor.extract(content, xpathExpr);
		assertTrue(list.size() == 3);
	}	
}
</pre></p>
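<p>The same gotcha can be reproduced without dom4j.  As a cross-check, the sketch below swaps dom4j for the XPath support built into the JDK (javax.xml.xpath) and counts the matches for both expressions.  The class name JdkXPathNamespaceSketch and the count helper are made up for this illustration, and the html prefix is an arbitrary choice; any prefix works as long as it is bound to the XHTML namespace URI.</p>
<p><pre class="brush: java;">
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JdkXPathNamespaceSketch {

	static final String XHTML_NS = &quot;http://www.w3.org/1999/xhtml&quot;;

	// Returns how many nodes in xml match xpathExpr.
	static int count(String xml, String xpathExpr) throws Exception {
		DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
		// namespace awareness is off by default in the JDK parser
		dbf.setNamespaceAware(true);
		Document doc = dbf.newDocumentBuilder()
				.parse(new InputSource(new StringReader(xml)));

		XPath xpath = XPathFactory.newInstance().newXPath();
		// bind the html prefix to the XHTML namespace URI
		xpath.setNamespaceContext(new NamespaceContext() {
			public String getNamespaceURI(String prefix) {
				return &quot;html&quot;.equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
			}

			// reverse lookups are not needed for evaluating expressions
			public String getPrefix(String namespaceURI) {
				return null;
			}

			public Iterator&lt;String&gt; getPrefixes(String namespaceURI) {
				return null;
			}
		});

		NodeList nodes = (NodeList) xpath.evaluate(xpathExpr, doc,
				XPathConstants.NODESET);
		return nodes.getLength();
	}

	public static void main(String[] args) throws Exception {
		String xml = &quot;&lt;html xmlns=\&quot;http://www.w3.org/1999/xhtml\&quot;&gt;&lt;body&gt;&quot;
				+ &quot;&lt;div&gt;&lt;p&gt;Paragraph 1&lt;/p&gt;&lt;/div&gt;&quot;
				+ &quot;&lt;div&gt;&lt;p&gt;Paragraph 2&lt;/p&gt;&lt;/div&gt;&quot;
				+ &quot;&lt;div&gt;&lt;p&gt;Paragraph 3&lt;/p&gt;&lt;/div&gt;&quot;
				+ &quot;&lt;/body&gt;&lt;/html&gt;&quot;;
		System.out.println(count(xml, &quot;//div/p&quot;));           // prints 0
		System.out.println(count(xml, &quot;//html:div/html:p&quot;)); // prints 3
	}
}
</pre></p>
<p>The unprefixed expression finds nothing because the parsed elements live in the XHTML namespace, while the prefixed expression finds all three paragraphs.</p>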
]]></content:encoded>
			<wfw:commentRss>http://hrycan.com/2009/10/17/extracting-content-from-xhtml-using-xpath-and-dom4j/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/af541810432de77586f96fedb1f2c168?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hrycan</media:title>
		</media:content>
	</item>
	</channel>
</rss>
