So I stumbled onto Devalot, and I have to say I like what I see. This thing has the potential to kick some butt. I really like the blog aggregation on the front page. If they go in the right direction with the source integration, this thing will replace Trac and WordPress on my stuff, no problem.
I just wish I had more “free time” to help out with it; new gigs tend to eat a lot of time.
[UPDATE 2007-02-07] Changed scrub to return self [/UPDATE]
Using Hpricot to Scrub HTML - The remix
So I wanted to bring the HTML Scrubber into my Hpricot tweaks to tidy it up a bit, and this is what I ended up with.
Now you can use the following to remove all tags from an HTML snippet:
require 'open-uri' # lets open() fetch a URL
doc = Hpricot(open('http://slashdot.org/').read)
doc.scrub
Strip all links (the a tags), leaving the text inside intact
(doc/:a).strip
Scrub the snippet based on a config hash
doc.scrub(hash)
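The `hash` passed to `scrub` is a plain Ruby hash with the three keys that `scrub` reads. Here is a minimal sketch, assuming some made-up tag choices; the merge mirrors what `scrub` does with its defaults, so any key you leave out falls back to an empty list. The `doc.scrub` call is commented out so the snippet runs without Hpricot installed.

```ruby
# Defaults copied from Doc#scrub in hpricot_scrub.rb.
SCRUB_DEFAULTS = {
  :nuke_tags        => [],
  :allow_tags       => [],
  :allow_attributes => []
}

# Example values only; pick whatever tags and attributes suit your content.
hash = {
  :allow_tags => %w[b i p],
  :nuke_tags  => %w[script]
}

# scrub merges the caller's hash over its defaults, so missing keys are fine.
effective = SCRUB_DEFAULTS.merge(hash)
p effective

# With Hpricot loaded and hpricot_scrub.rb required:
# doc.scrub(hash)
```

Since `:allow_attributes` is not given here, every attribute on the allowed tags would be stripped.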
hpricot_scrub.rb
require 'hpricot'
module Hpricot
  class Elements
    # Strip every element in the set, keeping its inner content.
    def strip
      each { |x| x.strip }
    end
    # Strip attributes from every element in the set, except those in safe.
    def strip_attributes(safe=[])
      each { |x| x.strip_attributes(safe) }
    end
  end
  class Elem
    # Remove this element, and everything inside it, from its parent.
    def remove
      parent.children.delete(self)
    end
    # Replace this element with its own contents, stripping nested tags first.
    def strip
      children.each { |x| x.strip unless x.class == Hpricot::Text }
      if strip_removes?
        remove
      else
        parent.replace_child self, Hpricot.make(inner_html) unless parent.nil?
      end
    end
    # Drop every attribute whose name is not listed in safe.
    def strip_attributes(safe=[])
      attributes.each {|atr|
        remove_attribute(atr[0]) unless safe.include?(atr[0])
      } unless attributes.nil?
    end
    def strip_removes?
      # I'm sure there are others that should be ripped instead of stripped
      attributes && attributes['type'] =~ /script|css/
    end
  end
  class Doc
    def scrub(config={})
      config = {
        :nuke_tags => [],
        :allow_tags => [],
        :allow_attributes => []
      }.merge(config)
      # nuked tags go away completely, contents included
      config[:nuke_tags].each { |tag| (self/tag).remove }
      # allowed tags stay, minus any attribute that isn't whitelisted
      config[:allow_tags].each { |tag|
        (self/tag).strip_attributes(config[:allow_attributes])
      }
      # everything else is stripped down to its contents
      children.reverse.each {|e|
        e.strip unless e.class == Hpricot::Text ||
          config[:allow_tags].include?(e.name)
      }
      self
    end
  end
end
Sample config in YAML
---
:allow_tags: # let these tags stay, but will strip attributes
- 'b'
- 'blockquote'
- 'br'
- 'div'
- 'h1'
- 'h2'
- 'h3'
- 'h4'
- 'h5'
- 'h6'
- 'hr'
- 'i'
- 'em'
- 'img'
- 'li'
- 'ol'
- 'p'
- 'pre'
- 'small'
- 'span'
- 'strike'
- 'strong'
- 'sub'
- 'sup'
- 'table'
- 'tbody'
- 'td'
- 'tfoot'
- 'thead'
- 'tr'
- 'u'
- 'ul'
:nuke_tags: # completely removes everything between open and close tag
- 'form'
- 'script'
:allow_attributes: # let these attributes stay, strip all others
- 'src'
- 'font'
- 'alt'
- 'style'
- 'align'
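To feed a config like this to `scrub`, load it with Ruby's standard YAML library. This is a rough sketch: the inline string is a trimmed copy of the sample, and the `scrub.yml` file name in the comment is hypothetical. The `doc.scrub` calls are commented out so the snippet runs without Hpricot.

```ruby
require 'yaml'

# Trimmed copy of the sample config, inlined to keep the snippet self-contained.
yaml_text = <<~CONFIG
  ---
  :allow_tags:
  - 'b'
  - 'p'
  :nuke_tags:
  - 'form'
  - 'script'
  :allow_attributes:
  - 'src'
  - 'alt'
CONFIG

# Keys written as ":allow_tags:" come back as Ruby symbols,
# which is what the defaults hash in scrub expects.
config = YAML.load(yaml_text)
p config.keys

# With Hpricot loaded and hpricot_scrub.rb required:
# doc.scrub(config)
# or, reading from a file (hypothetical file name):
# doc.scrub(YAML.load_file('scrub.yml'))
```

The leading colon on each key matters: without it the keys load as strings and `scrub` would quietly fall back to its empty defaults.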
The source includes sample data and a test; run the test with
ruby test
[UPDATE 2007-02-07] I realized I left some extra junk in the version of Util in the zip, it’s been updated [/UPDATE]
I have a rake task and a Util class that I use to make setting up required gems painless and to be sure that I’m always running the versions I think I am.
Install or update required gems
rake gems:install
Make sure they are loaded with the right versions during startup, by adding the following to environment.rb
Util.load_gems
This uses a config file that looks like
:source: http://local_mirror.example.com # this is optional
:gems:
  - :name: mongrel
    :version: "1.0"
    # this gem has a specific source URL
    :source: 'http://mongrel.rubyforge.org/releases'
  - :name: hpricot
    :version: '0.4'
    # this tells us to load not just install
    :load: true
  - :name: postgres
    :version: '0.7.1'
    :load: true
    # any extra config that needs to be passed to gem install
    :config: '--with-pgsql-include-dir=/usr/local/pgsql/include
      --with-pgsql-lib-dir=/usr/local/pgsql/lib'
Here’s the Util class
require 'yaml'
class Util
  def self.load_gems
    config = YAML.load_file(
      File.join(RAILS_ROOT, 'config', 'gems.yml'))
    gems = config[:gems].select { |gem| gem[:load] }
    gems.each do |gem|
      require_gem gem[:name], gem[:version]
      require gem[:name]
    end
  end
end
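The filtering step in `Util.load_gems` can be tried on its own, without Rails or the gems themselves. A small sketch, assuming an inline stand-in for `config/gems.yml` trimmed from the example config; it only prints what would be loaded instead of calling `require_gem`.

```ruby
require 'yaml'

# Inline stand-in for config/gems.yml, trimmed from the example config.
gems_yaml = <<~CONFIG
  :gems:
  - :name: mongrel
    :version: "1.0"
  - :name: hpricot
    :version: '0.4'
    :load: true
  - :name: postgres
    :version: '0.7.1'
    :load: true
CONFIG

config = YAML.load(gems_yaml)

# Same filter Util.load_gems applies: keep only entries with a truthy :load.
to_load = config[:gems].select { |entry| entry[:load] }

to_load.each do |entry|
  puts "would load #{entry[:name]} #{entry[:version]}"
end
```

Quoting the versions in the YAML keeps them as strings; an unquoted `1.0` would load as a float.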
Here’s the rake task
require 'yaml'
require 'rubygems'
namespace :gems do
  task :install do
    # defaults to --no-rdoc, set DOCS=(anything) to build docs
    docs = (ENV['DOCS'].nil? ? '--no-rdoc' : '')
    # grab the list of gems/versions to check
    config = YAML.load_file(File.join('config', 'gems.yml'))
    gems = config[:gems]
    gems.each do |gem|
      # load the gem spec
      gem_spec = YAML.load(`gem spec #{gem[:name]} 2> /dev/null`)
      gem_loaded = false
      begin
        gem_loaded = require_gem gem[:name], gem[:version]
      rescue Exception
        # gem is missing or the wrong version; the check below handles it
      end
      # install if forced
      # or there is no gem_spec
      # or the spec version doesn't match the required version
      # and require_gem did not load it
      # (require_gem also returns false if the gem has already been loaded)
      if ! ENV['FORCE'].nil? ||
         ! gem_spec ||
         (gem_spec.version.version != gem[:version] && ! gem_loaded)
        gem_config = gem[:config] ? " -- #{gem[:config]}" : ''
        source = gem[:source] || config[:source] || nil
        source = "--source #{source}" if source
        # build the command as one line; a literal newline inside the string
        # would make the shell run "-v ..." as a separate command
        ret = system "gem install #{gem[:name]} " \
                     "-v #{gem[:version]} -y #{source} #{docs} #{gem_config}"
        # something bad happened, pass on the message
        p $? unless ret
      else
        puts "#{gem[:name]} #{gem[:version]} already installed"
      end
    end
  end
end
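To see what the task ends up handing to `system`, the command building can be pulled into a small helper. This is only a sketch: `install_command` is a hypothetical name that is not part of the task, and it builds the string without running anything. The flags are the ones the task already uses.

```ruby
# Hypothetical helper, not part of the original rake task: it only builds
# the command string so the source and flag handling can be inspected
# without actually running gem install.
def install_command(entry, global_source = nil, docs = '--no-rdoc')
  # A per-gem :source wins over the config-wide one.
  source = entry[:source] || global_source
  parts = ['gem install', entry[:name], '-v', entry[:version], '-y']
  parts << "--source #{source}" if source
  parts << docs unless docs.empty?
  # Everything after "--" is passed through as build options for the gem.
  parts << "-- #{entry[:config]}" if entry[:config]
  parts.join(' ')
end

mongrel = { :name => 'mongrel', :version => '1.0',
            :source => 'http://mongrel.rubyforge.org/releases' }
hpricot = { :name => 'hpricot', :version => '0.4', :load => true }

puts install_command(mongrel, 'http://local_mirror.example.com')
puts install_command(hpricot, 'http://local_mirror.example.com')
```

The first command uses the mongrel-specific source, while the second falls back to the mirror, matching the `gem[:source] || config[:source]` line in the task.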
Just a quick announcement: FCKeditor on Rails will run in Rails 1.2 as a plugin (with a little help). More info on the blog or in Trac.
Jamis Buck has shed a little light on figuring out WTF that Ruby process eating all your processor is actually doing.
Alright, maybe not quite the same as sliced bread, but very nice nonetheless.
I can’t tell you how many times I could have used this; now I just need to wait for the need to pop up again.
[UPDATE] Apparently it gets better than this, much better