Using Hpricot to Scrub HTML

I went looking for a Ruby replacement for Perl's HTML::Scrubber for a gig and came up blank. Can it really be possible that nobody is doing anything more than blindly stripping tags?

I had seen Hpricot and thought I needed to find a reason to use it. Well, here it is. I monkey patched a couple of methods into Hpricot and off I went.

Here are the Hpricot bits.

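module Hpricot
  class Elements
    def strip
      each { |x| x.strip }
    end

    def strip_attributes(safe=[], patterns={})
      each { |x| x.strip_attributes(safe, patterns) }
    end
  end

  class Elem
    # Replace the element with its inner HTML.
    def strip
      parent.replace_child self, Hpricot.make(inner_html) unless
        parent.nil? # assumed: the guard is cut off in the source
    end

    # Keep an attribute only if it is in `safe` and its value matches its pattern.
    def strip_attributes(safe=[], patterns={})
      attributes.each { |atr|
          pat = patterns[atr[0].to_sym] || ''
          remove_attribute(atr[0]) unless safe.include?(atr[0]) &&
              atr[1].match(pat) # assumed: the rest of the condition is cut off in the source
      } unless attributes.nil?
    end
  end
end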

Just that bit gets me to the point where I can do things like this:

doc = Hpricot(open('').read)

# remove all anchors leaving behind the text inside.
(doc/"a").strip

# strip all attributes except for src from all images
(doc/"img").strip_attributes(['src'])
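
To make the keep-or-drop rule behind strip_attributes concrete without needing Hpricot at all, here is a minimal plain-Ruby sketch. `filter_attributes` is a hypothetical helper made up for illustration, not part of Hpricot or of the patch above. It assumes the attributes arrive as a name/value hash and that each pattern is a string (or Regexp) tested against the value.

```ruby
# Hypothetical helper: returns only the attributes that would survive.
# An attribute is kept when its name is in `safe` AND its value matches
# the pattern registered under that name. A missing pattern falls back
# to '' which matches any value.
def filter_attributes(attributes, safe = [], patterns = {})
  attributes.select do |name, value|
    pattern = patterns[name.to_sym] || ''
    safe.include?(name) && value.to_s.match?(pattern)
  end
end

attrs = {
  'src'     => 'http://example.com/cat.png',
  'onclick' => 'steal()',
  'width'   => '40'
}

# Allow src and width, and additionally require src to be an http(s) URL.
kept = filter_attributes(attrs, ['src', 'width'], { :src => '\Ahttps?://' })
puts kept.inspect
```

Here `onclick` is dropped because it is not in the safe list, while `src` and `width` stay. A `src` of `javascript:alert(1)` would be dropped too, since it fails the pattern even though the name is allowed.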

Then I made a scrubber that passes in the array and hash to those methods to handle the dirty work. It looks like this, though I’m also using Tidy so mine is a little different.

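class HtmlScrubber
  # Several lines are cut off in the source; the ones marked "assumed" are reconstructions.
  @@config = YAML.load_file(
    "#{RAILS_ROOT}/config/html_scrubber.yml") unless
    defined?(@@config) # assumed

  def self.scrub(markup)
    doc = Hpricot(markup || '', :xhtml_strict => true)
    raise 'No markup specified' if doc.nil?
    @@config[:nuke_tags].each { |tag| (doc/tag).remove }
    @@config[:allow_tags].each { |tag|
        # assumed: :allow_attributes is a hypothetical key name for the safe list
        (doc/tag).strip_attributes(@@config[:allow_attributes],
        @@config[:attribute_patterns]) }
    doc.traverse_all_element {|e|
      e.strip unless @@config[:allow_tags].include?(
        e.name) if e.elem? } # assumed
    doc.to_s # assumed return value
  end
end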

Here is a zip of the code and a sample config:
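
The sample config itself isn't reproduced in this post, so here is a rough sketch of what a html_scrubber.yml could look like and how it loads. The `:nuke_tags`, `:allow_tags` and `:attribute_patterns` keys come from the scrubber above. The `:allow_attributes` key and every tag, attribute and pattern value are assumptions made up for illustration. The keys are written as YAML symbols because the scrubber indexes the config with symbols.

```ruby
require 'yaml'

# Hypothetical contents of config/html_scrubber.yml (values are illustrative).
SAMPLE_CONFIG = <<~'YAML'
  :nuke_tags:
    - script
    - style
  :allow_tags:
    - a
    - img
    - p
  :allow_attributes:
    - href
    - src
  :attribute_patterns:
    :href: '\Ahttps?://'
    :src: '\Ahttps?://'
YAML

# Newer Psych versions refuse to build symbols unless Symbol is explicitly
# permitted, so safe_load is used here instead of a bare YAML.load.
CONFIG = YAML.safe_load(SAMPLE_CONFIG, permitted_classes: [Symbol])

puts CONFIG[:nuke_tags].inspect
puts CONFIG[:attribute_patterns][:href]
```

Tags under `:nuke_tags` are removed along with their content, tags under `:allow_tags` are kept (with their attributes filtered), and anything else is stripped down to its inner HTML.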

Profiling Rails end-to-end

I wanted to do some profiling of a Rails app, so I did a little digging and found ruby-prof, with new and improved call graphs. Plus it’s very fast. The install couldn’t be easier:

sudo gem install ruby-prof
Then I wanted to see if I could get this to run in before and after filters. I haven't had any luck, though I haven't tried all that hard. Since I wanted to be able to do this relatively easily, I threw together a mini module to handle the report generation piece for me. So now I can profile a controller action by adding this to my application controller:

require 'ruby_profiler'

class ApplicationController < ActionController::Base
  include RubyProfiler
end

Then in the controller I just need to

def some_action
  result = RubyProf.profile {
    # ... the action's code goes here ...
  }
  write_profile(result, 5, RubyProfiler::GRAPH_HTML)
end

source: ruby_profiler.rb (/dropbox/ruby_profiler.rb)

mmm Feeds

Ok, so the project I’ve been working on is getting close…

Feed Harvest: if you are interested in the (very) private beta, let us know.


Finally got off my arse and upgraded the local WordPress installs to v2. Seems good so far and Mo is happy so it must be good.

Hopefully I don’t see any of the issues that Om saw.