Semergence

Seth Ladd’s blog about Ruby on Rails and crunching data.

Producing Aloha on Rails, The Hawaii Ruby on Rails and Web Development Conference

I love Ruby on Rails, I love bringing people together to talk about web development and engineering, and I love Hawaii. I wanted to do something amazing, something big, that could bring Ruby on Rails to Hawaii in style and share the Aloha of our Islands with the Rails and Web Development communities. I’m proud to announce that I am organizing and producing Aloha on Rails, the Hawaii Ruby on Rails and Web Development Conference. The conference will take place October 5-6, 2009 at the beautiful Marriott Waikiki on Oahu.

The Aloha on Rails Conference is the premier destination event for Ruby on Rails and Web Development. This unique, two-day event unites the community’s top speakers and talent with motivated and excited attendees for an unforgettable conference in beautiful Hawaii. It offers two full days of informative and timely sessions covering Ruby on Rails and the future of web application engineering. The sessions are not simply blog posts, but will be full of wisdom, experience, lessons learned, war stories, panel discussions, and most importantly, debate.

In producing this event, it was very important to me to organize an itinerary that appealed to all web developers, not just those loyal to Ruby on Rails. I believe the Rails community has a lot to share and teach, but it can listen, too. A successful conference, in my mind, is one that truly generates thought-provoking debate. One way to ensure this is to seed the pool with differing viewpoints. A healthy discussion benefits everyone. To this end we are bringing in speakers who will address a wide range of issues and topics relevant to all web developers who are motivated and take their craft seriously (but know how to enjoy it). The conference is certainly centered around Rails, but will extend outwards to topics such as Erlang, business models, Map Reduce, JRuby, testing, and simplicity.

Hosting the conference in Hawaii is a gutsy move, I know. Convincing people that it’s very affordable to get out here is not easy. Luckily, due to the current economic situation, it’s actually cheaper to fly and stay in Hawaii than it’s ever been (factoring in inflation.) You’ll be amazed at the deals the airlines and hotels are cutting to get you out here. True, a lot of the deals only reach out through summer, but nothing indicates these deals will stop any time soon. Hawaii almost sells itself, which helps, and who doesn’t want to come out to Hawaii, geek out with some of the best talent in the community, relax, and recharge? We have a large list of hotels and travel tips on the Aloha on Rails Venue and Travel page.

I’ve been extremely fortunate and humbled to get some amazing speakers to come out for Aloha on Rails. For instance, Chad Fowler will present the keynote. Other speakers currently booked include Obie Fernandez, Charles Nutter, Gregg Pollack, Anthony Eden, Tim Dysinger, Desi McAdam, Scott Chacon, Giles Bowkett, and Jeremy McAnally. This list is going to grow very quickly, so keep an eye on the Aloha on Rails Speaker List.

The Call for Participation is now open and runs through July, so please submit a talk proposal. There’s still time and room for your presentation!

There are numerous Sponsorship Packages available, at many different levels. This is your chance to be associated with the premier destination event for Ruby on Rails and Web Development, and directly target and reach a captive, motivated audience of Rails and web developers, designers, team leads, and managers.

This conference is a unique and exciting opportunity to learn, share, and immerse yourself in Rails and Web Development with other motivated and friendly attendees and speakers while enjoying beautiful Hawaii. I’m thrilled to be able to bring the Rails and Web Development communities to Hawaii.

Written by sethladd

May 5, 2009 at 11:51 pm

Posted in Uncategorized

Ye Olde Scala Presentation to Honolulu Coders

I presented the wonderfully named “Scala: Java, Erlang, and Ruby’s Hot Three Way Love Child” Scala presentation to the Honolulu Coders back in 2007. I went looking for the actual presentation online, and I’m not sure I ever posted it.

I had a brief fling with Scala as I was looking for multi-core friendly environments to build data processing frameworks. I came away very impressed and look forward to deploying Scala in a future project. After Erlang and Scala, I started programming in a functional style back over in my Ruby code. I consider the experiments a win for that fact alone.

This post is to prove that I knew Scala before it was cool. :P

Written by sethladd

April 19, 2009 at 9:08 pm

Posted in Programming

Extending Hadoop Pig for Hierarchical Data

I’ve been playing with Hadoop Pig lately, and having a fun time.  Pig is an easy to use language for writing Map Reduce jobs against Hadoop.

Our data is very hierarchical, and we calculate a lot of aggregates for self nodes, their children nodes, and self plus children.  We have a few tricks up our sleeves for SQL for handling these types of aggregates, but of course with Map Reduce an entirely new way of thinking is required.

Luckily, Pig makes it easy to write User Defined Functions (UDFs) that extend the Pig language.  I was able to take an existing Pig UDF, TOKENIZE, and alter it to suit my needs.

Specifically, our data looks like this:

111,/A/B/C
222,/A/B
333,/A/B/C

We need to answer questions such as “How many records for A and all of its children?” In this case, the answer is three. We also need to answer “How many records for just A?” which is zero, or “for just C?” which is two.

Our strategy is to take the path (e.g. /A/B/C) and split it into all the different paths contained within. For example, /A/B/C can be split into:

  • /A
  • /A/B
  • /A/B/C

Now, we can take the original record (e.g. 111) and attribute it to all three paths that stem from the original /A/B/C. Once we have all paths, we can perform aggregate calculations the way Map Reduce wants to: group the records by path, then count or sum each group.

To see how this works, get the source code from Pig and look for the TOKENIZE class. Below are my modifications:

    @Override
    public DataBag exec(Tuple input) throws IOException {
        try {
            DataBag output = mBagFactory.newDefaultBag();
            // Field 0 is the path to split, e.g. "/A/B/C".
            Object o = input.get(0);
            if (!(o instanceof String)) {
                int errCode = 2114;
                String msg = "Expected input to be chararray, but" +
                    " got " + o.getClass().getName();
                throw new ExecException(msg, errCode, PigException.BUG);
            }
            // Field 1 is the delimiter, e.g. "/".
            String delimiter = (String) input.get(1);
            StringTokenizer tok = new StringTokenizer((String) o, delimiter, false);
            // Grow the path one segment at a time and emit every prefix:
            // "/A", then "/A/B", then "/A/B/C".
            StringBuilder sb = new StringBuilder();
            while (tok.hasMoreTokens()) {
                sb.append(delimiter);
                sb.append(tok.nextToken());
                output.add(mTupleFactory.newTuple(sb.toString()));
            }
            return output;
        } catch (ExecException ee) {
            throw ee;
        }
    }

The end result now looks like this:

111,/A
111,/A/B
111,/A/B/C
222,/A
222,/A/B
333,/A
333,/A/B
333,/A/B/C
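
To make the aggregation step concrete, here is a minimal plain-Java sketch that runs outside of Pig and Hadoop. It expands each path the same way the UDF does and then counts records per expanded path. The class and method names (PathRollup, expand, countWithDescendants) are made up for this illustration and are not part of Pig.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Pig.
public class PathRollup {
    /** Expands "/A/B/C" into ["/A", "/A/B", "/A/B/C"], mirroring the UDF's loop. */
    public static List<String> expand(String path, String delimiter) {
        List<String> prefixes = new ArrayList<>();
        StringTokenizer tok = new StringTokenizer(path, delimiter, false);
        StringBuilder sb = new StringBuilder();
        while (tok.hasMoreTokens()) {
            sb.append(delimiter).append(tok.nextToken());
            prefixes.add(sb.toString());
        }
        return prefixes;
    }

    /** Counts records per expanded path, i.e. each node plus all of its children. */
    public static Map<String, Integer> countWithDescendants(String[][] records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] record : records) {
            // record[0] is the id, record[1] is the path.
            for (String prefix : expand(record[1], "/")) {
                counts.merge(prefix, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] records = {
            {"111", "/A/B/C"}, {"222", "/A/B"}, {"333", "/A/B/C"}
        };
        // prints {/A=3, /A/B=3, /A/B/C=2}
        System.out.println(countWithDescendants(records));
    }
}
```

The count of three for /A matches the answer to “How many records for A and all of its children?” The “just A” style questions are answered by grouping on the original, unexpanded path instead.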

To learn more about writing your own Pig UDFs, consult the Pig UDF Manual.

Written by sethladd

April 13, 2009 at 1:50 am

Posted in Programming

Mac OS X, Hadoop 0.19.1, and Java 1.6

If you’re excited, like I am, about Amazon’s recent announcement that they are now offering Elastic Map Reduce you probably want to try a quick Hadoop MapReduce application to test the waters. I found out quickly that if you are on a Mac (as I am) you’ll need to perform a few quick configurations before things work correctly. Below is what I needed to do to get Hadoop running on my Mac with Java 1.6.

This post assumes you are running the latest Mac OS X 10.5 with all updates applied.

Enable Java 1.6 Support

To enable Java 1.6, open up the Java Preferences application. This can be found in /Applications/Utilities/Java Preferences.

You will need to drag Java 1.6 up and place it at the top of both the applet and application version lists.

Open up a terminal and type java -version and you should see something like the following:


java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57, mixed mode)

Configure JAVA_HOME

This was the tough part. Once you have Java 1.6 as the default runtime, you need to configure your JAVA_HOME environment variable to reflect this as well. Hadoop wants JAVA_HOME set before it will run correctly, otherwise you’ll see:


Exception in thread "main" java.lang.UnsupportedClassVersionError: Bad version number in .class file

This assumes you are using the standard bash shell. Open up ~/.bash_profile (or create it if it doesn’t exist) and add this line:


export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home

Save the file. Return to the terminal and type . ~/.bash_profile which will set your JAVA_HOME correctly. To verify, you can type: echo $JAVA_HOME.

While you are in .bash_profile you might as well add a HADOOP_HOME variable, which will make running hadoop from the command line much easier. Below is my full .bash_profile:


export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
export HADOOP_HOME=~sethladd/Development/hadoop-0.19.1

export PATH=$HADOOP_HOME/bin:$PATH

Note that I also placed $HADOOP_HOME/bin into my path, so now I can type hadoop from anywhere.

Now that you are all set up, try the simple WordCount Hadoop Tutorial.

Written by sethladd

April 2, 2009 at 4:38 pm

Posted in Programming

Ext JS, Checkboxes, and Ruby on Rails

I am currently building a web application that makes heavy use of wizards and forms built with Ext JS. I am sending the form’s values as JSON over to a Ruby on Rails server. Thanks to Rails 2.2.2, the JSON is automatically converted into a parameter hash.

For a little background to the problem I am about to describe, HTML checkboxes are notoriously difficult to work with, because if the user does *not* check the box, then the browser will not send a value when the form is submitted. This makes it difficult to handle a state change from checked to unchecked.

Rails makes this a bit easier because it will render both a hidden field with an “off” value (“0” by default) as well as the original checkbox. If the user checks the box, both values are sent but Rails will use the value from the checkbox (and not the hidden field). If the user does not check a box, only the hidden field’s value is sent, which means the “off” value is sent.

However, when using Ext JS to render the form and its checkboxes, no hidden “off” field is rendered. Our first inclination is to extend the checkbox component to also render a hidden “off” field, much like Rails. However, because I am serializing the form as JSON, this strategy breaks down because the serialization will create an array of values for the checkbox and hidden field combo. This is because the fields are both named the same thing (exactly what the Rails code does) and when form serialization occurs, it will create an array if it encounters more than one value for a field name.

Instead of rendering a hidden “off” field for checkboxes, I simply change the way I am serializing the form. After serialization, I loop back through the form, find all the checkboxes, and for those checkboxes that are unchecked, I add an “off” value into my JSON serialization.

The code for this is much simpler than it sounds, and lets me always send checkbox values, regardless of whether they are checked or unchecked.


            var serializedForm = card.getForm().getValues(false);

            // because unchecked checkboxes do not submit values, manually force a value of 0
            card.getForm().items.each(function(f) {
                if (f.isFormField && f.getXType() == 'checkbox' && !f.getValue()) {
                    serializedForm[f.getName()] = '0';
                }
            });

Written by sethladd

February 19, 2009 at 3:51 pm

Posted in javascript, rubyonrails, web

Rails Builder Is Slow But Easy To Fix

For the impatient: Go install the fast_xs gem if you use Builder in Rails.

The long story:

I was using New Relic RPM to watch Errorlytics, and noticed everything was humming along just fine, except… the Atom feed of new 404 errors was taking an extremely long time. Request times of 800ms were not uncommon. RPM told me that the view rendering was taking most of the time, which made sense because I had already ensured that all the queries were by index.

Errorlytics uses the Atom Builder to construct the Atom feed, and generally I like it. However, I had my doubts after learning it was the bottleneck, and I was hoping I wouldn’t have to rip it out and write the XML by hand.

After a bit of Googling around, I was lucky enough to run across Speed up your feed generation in Rails. This post details the same issue, in that the Builder had some major performance issues. Thankfully, the author of that post did all the hard work. They learned that installing the fast_xs gem is the quickest and easiest way to speed up Builder output. Fast_xs implements a common bottleneck as native code, with impressive results.

Bottom line, if you’re using Builder in Rails 2.0.2 or later, you should go install the fast_xs gem and link it to your Rails application via config.gem 'fast_xs'. Your site will thank you.

Written by sethladd

February 1, 2009 at 9:25 pm

Posted in rubyonrails

Errorlytics Captures and Fixes 404 Page Not Found Errors For Drupal, WordPress, PHP, Rails, and Java

For the past several months, I’ve been working on Errorlytics, a Software as a Service that captures, analyzes, and helps you fix 404 Page Not Found errors for your Drupal, WordPress, PHP, Rails, and Java web sites and applications.

The original idea came from my friends and partners at Accession Media (SEO and Internet Marketing experts) and New Evolutions (web application engineers).

Errorlytics is both a small plugin installed on your web site or application, and a hosted service which captures, processes, aggregates, and fixes 404 page not found errors. If you run a site with Drupal, WordPress, straight PHP, Ruby on Rails, or Java, then Errorlytics can help you.

After installing the open source (and small) plugin, you let the hosted service begin to capture the 404 page not found errors. You can then begin to handle those errors by telling Errorlytics where you want to redirect your visitors. The next time a visitor encounters a 404, they will be transparently redirected to the correct page. They never know that Errorlytics is even involved!

Errorlytics results in a better experience for your visitors, because they will eventually stop seeing 404 responses. Even more importantly, Google and other search engines will notice that your 404 rate is decreasing, which improves your Search Engine Optimization.

As a web site administrator, you can log into Errorlytics to see a Dashboard of unhandled errors, receive daily emails, or even subscribe to an Atom feed. Handling an error and redirecting a user is incredibly simple. All rule configuration is performed using simple English statements, thus no programming is required!

We’re really proud of what we built, and Errorlytics has already handled over 1 million errors for our clients. There are subscription levels for every size of web site, so create an account today. We’re always interested in hearing what we can do to make Errorlytics even better, so please don’t hesitate to let us know what you’d like in the Errorlytics Support Forums.

For those interested in such things, Errorlytics is built with Ruby on Rails and hosted at Slicehost.

Written by sethladd

January 11, 2009 at 8:06 pm

Display Javascript Confirmation When User Leaves a Web Page

Update: the original code did not work in Internet Explorer 6 or 7. This is because change events for form elements do not bubble in Internet Explorer. That is lame. In any case, the included code has been tested for Firefox 3, Internet Explorer 6 and 7.

Our customer wanted to warn users that if they leave the current page with unsaved changes, those changes will be lost. The requirement is to only show an alert/confirmation when the user has changed the form but did not submit the form. However, don’t forget that if the user changed the form but then clicked Submit, no confirmation or warning should be displayed (as they are saving the changes before they leave the page via the submit.)

The following Javascript is one way to do it. Note that I am using both Prototype and Ext JS in this snippet.


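Ext.onReady(function() {
    Ext.namespace('Dses');
    // Tracks whether any watched form has unsaved edits.
    Dses.formChanged = false;
    $$('form.watch-for-changes').each(function(form) {
        // Change events do not bubble in Internet Explorer 6 and 7,
        // so observe every form element individually.
        form.getElements().each(function(element) {
            element.observe('change', function() { Dses.formChanged = true; });
        });
        // Submitting saves the changes, so no warning is needed afterwards.
        form.observe('submit', function() { Dses.formChanged = false; });
    });
    // Returning a string asks the browser to confirm before leaving the page;
    // returning nothing lets the user leave without a prompt.
    window.onbeforeunload = function() {
        if (Dses.formChanged) {
            return "You have unsaved changes in your form.  You may wish to save the form before leaving this page.";
        }
    };
});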

Note that this requires any form that you want to monitor for changes to be given the class watch-for-changes.

Read up on more details and information on Javascript, the back button, and window.onbeforeunload.

Written by sethladd

January 8, 2009 at 4:52 pm

Posted in html, javascript

CouchDB Ruby Libraries

This is a quick list of CouchDB Ruby libraries. Need to connect Ruby to CouchDB? Try one of these!

  1. CouchRest – hosted at GitHub, this library by Chris Anderson (jchris), has a low level component and a high level component.
  2. RelaxDB – also hosted at GitHub, this library provides a base class that your models will extend (similar to the high level component provided by CouchRest.) RelaxDB adds pagination support as well as has_many and belongs_to relationship support.
  3. CouchObject – this library is unique because it is implemented as a module, to be included in your model class. This library also has both low level and high level components.
  4. Basic Model – from topfunky, this library was featured in the PeepCode CouchDB episode. Your model classes extend BasicModel.
  5. ActiveCouch – tries to look like ActiveRecord. It supports a simple find method to query views. Views are defined as Ruby classes, and loaded via rake tasks.
  6. CouchFoo – attempts to replicate ActiveRecord’s look and feel. It is newer than the rest of the libraries, so it takes lessons learned from all the other Ruby libraries.

I’ve personally used CouchRest, but only its low level components.

For much more in depth analysis of many of these libraries, a little place of calm has a great eight part series on writing Ruby on Rails applications with CouchDB which covers ActiveCouch, RelaxDB, and Basic Model.

Written by sethladd

December 31, 2008 at 5:27 pm

Posted in Ruby, couchdb

Reasons Why CouchDB Is Exciting

I’ve been playing with CouchDB lately, and having a good time getting out of the relational database mindset. I’m not against relational databases, but I do think the time has come to stop thinking they are the only way to store and process data.

There are a few reasons why I think CouchDB is exciting.

  1. Arbitrary JSON document storage. I like this feature not so much because I’m not required to design a rigid schema, but because it implies that I can store arbitrarily complex and deeply nested data structures. This clears the way for easily storing an object graph without the pain of decomposing it into a relational schema.
  2. Incremental view rebuilds. This is a killer feature for me, because it implies that a small change in the data does not invalidate the entire view.
  3. Keys can be complex objects. In other words, you can now use an array or hash as a key in a view. For example, a key can be [2008, 1, 12], which represents a date. Not only is this more flexible, but it lends itself to some very useful query methods. For example, you can query for a range of keys just on the first part of the array.
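
To illustrate the range query on the first part of an array key, here is a small in-memory sketch in Java. It does not talk to CouchDB at all. It only imitates a view sorted by array keys, under the assumption that keys are compared element by element and that a shorter key sorts before a longer key it prefixes. The class name, the sample rows, and the use of Integer.MAX_VALUE as a stand-in upper bound are all made up for this example. Against a real view, the equivalent request would typically pass something like startkey=[2008] and endkey=[2008,{}].

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a view keyed by [year, month, day].
public class ArrayKeyRange {
    /** Compares keys element by element; a shorter key sorts before a longer key it prefixes. */
    static final Comparator<List<Integer>> BY_ELEMENTS = (a, b) -> {
        int shared = Math.min(a.size(), b.size());
        for (int i = 0; i < shared; i++) {
            int c = Integer.compare(a.get(i), b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    /** Returns every row whose key starts with the given year. */
    public static NavigableMap<List<Integer>, String> rowsForYear(
            NavigableMap<List<Integer>, String> view, int year) {
        List<Integer> start = Arrays.asList(year);
        // Integer.MAX_VALUE stands in for "anything after this year's last key".
        List<Integer> end = Arrays.asList(year, Integer.MAX_VALUE);
        return view.subMap(start, true, end, true);
    }

    /** Builds a few made-up rows, sorted the way the comparator dictates. */
    public static NavigableMap<List<Integer>, String> sampleView() {
        NavigableMap<List<Integer>, String> view = new TreeMap<>(BY_ELEMENTS);
        view.put(Arrays.asList(2007, 12, 31), "doc-a");
        view.put(Arrays.asList(2008, 1, 12), "doc-b");
        view.put(Arrays.asList(2008, 6, 1), "doc-c");
        view.put(Arrays.asList(2009, 1, 1), "doc-d");
        return view;
    }

    public static void main(String[] args) {
        // prints [doc-b, doc-c]
        System.out.println(rowsForYear(sampleView(), 2008).values());
    }
}
```

Because the year is the first element of the key, every 2008 row sits in one contiguous slice of the sorted view, which is what makes the range query cheap.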

Written by sethladd

December 29, 2008 at 4:40 pm

Posted in couchdb, database, mapreduce