ruby-xml-smart in Rm

I'm not done yet, but my benchmark that used to take 7.3 seconds now takes 5.2 seconds. That's only about 40% faster, though it does include the time spent loading all of the libraries. I'm partway through removing all of the REXML from my formatting code in Rm and replacing it with ruby-xml-smart. The nice part is that I can turn the output of any existing REXML generation code that I haven't gotten to yet into a string and then parse it back on the ruby-xml-smart side. So I expect I'll get even better speed out of it soon enough. This is only a few hours of effort so far.
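A minimal sketch of that string bridge, under stated assumptions: `legacy_fragment` and `bridge` are hypothetical names, not code from Rm, and the parser is passed in as a callable because the exact ruby-xml-smart entry point for parsing a string is not shown in this post. REXML stands in as the parser here only so the example runs on its own.

```ruby
require "rexml/document"

# Hypothetical piece of not-yet-ported generation code that still builds REXML nodes.
def legacy_fragment(title)
  section = REXML::Element.new("section")
  section.add_attribute("class", "entry")
  section.add_element("h2").text = title
  section
end

# Hypothetical bridge: serialize the REXML node to a string, then hand the
# string to whatever parser the new side uses (any callable taking a String).
def bridge(rexml_node, parser)
  parser.call(rexml_node.to_s)
end

serialized = legacy_fragment("Hello").to_s

# Stand-in parser for illustration; in the real code this would be the
# ruby-xml-smart call that parses a document from a string.
reparsed = bridge(legacy_fragment("Hello"), ->(xml) { REXML::Document.new(xml) })
heading = reparsed.root.elements["h2"].text
```

The extra serialize-and-reparse step costs something, but it lets the port proceed one function at a time instead of all at once.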

See, I started out using libxml-ruby for the formatting system. But they had neglected to add certain features (like namespaces) to the library at the beginning, and I found that various things were just not implemented... and when I tried to hack the functionality in, it started having memory problems. So I went to REXML, which I figured might be slow because it's pure Ruby... but at least it would work.

It turns out that REXML is, in fact, astonishingly slow, which I covered for by adding a caching layer.
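The post does not say how the caching layer works, so the following is only a sketch of one common shape such a layer can take: memoizing formatted output keyed by a digest of the source text. `FormatCache` and `slow_format` are hypothetical names, and the upcasing lambda is a stand-in for the expensive REXML formatting step.

```ruby
require "digest/sha2"

# Hypothetical memoizing cache: formatted output is stored under a digest of
# the source text, so unchanged input never reaches the slow formatter twice.
class FormatCache
  attr_reader :hits, :misses

  def initialize
    @store = {}
    @hits = 0
    @misses = 0
  end

  # Returns the cached result for source, or runs the block once and stores it.
  def fetch(source)
    key = Digest::SHA256.hexdigest(source)
    if @store.key?(key)
      @hits += 1
      @store[key]
    else
      @misses += 1
      @store[key] = yield(source)
    end
  end
end

# Stand-in for the slow formatting step (assumption, for illustration only).
slow_format = ->(src) { src.upcase }

cache = FormatCache.new
first = cache.fetch("<p>hi</p>") { |src| slow_format.call(src) }
second = cache.fetch("<p>hi</p>") { |src| slow_format.call(src) }
```

A cache like this hides the cost on repeat requests but does nothing for the first render of a page, which is why a faster parser still matters.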

The part that really sucks is that, to get even reasonable performance, it forces you to write code in a fairly baroque fashion. You end up iterating over pieces of XML yourself, which tends to make XML handling functions a little big. Now, if you were writing a script to parse config files or whatnot, I don't think this would be a huge problem. But for parsing pages that need to be returned to the user fairly quickly, this becomes a problem, and it really isn't what REXML is intended for, nice as it can be.
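To illustrate the kind of hand iteration meant here, a small sketch contrasting an XPath lookup with a manual walk over child elements in REXML. The sample document and element names are invented for illustration, and no claim is made about the relative speed of the two on any particular Ruby version.

```ruby
require "rexml/document"

doc = REXML::Document.new("<page><para>one</para><note>skip</note><para>two</para></page>")

# The concise version: let XPath find the elements.
via_xpath = REXML::XPath.match(doc, "//para").map(&:text)

# The hand-rolled version: walk the children and filter by name yourself.
via_loop = []
doc.root.each_element do |el|
  via_loop << el.text if el.name == "para"
end
```

Both collect the same paragraphs, but the second form grows quickly once the structure being matched is more than one level deep.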

So now I'm switching over to ruby-xml-smart, which is a newer and more ruby-tastic attempt to wrap libxml for Ruby. And it works. I've got some issues with it at the moment. I wrote two emails to its author before I realized I had answered my own questions, and then deleted the messages unsent.

It needs some polish. The documentation's not there yet, so I spent a lot of time reading the source. But the idea seems to be sound, there are no mysterious crashes, and most of the functionality is right there.

I have to remind myself that things that are fairly expensive operations with REXML (like parsing an XML document from a string or following an XPath expression) are not nearly as bad when optimized C code is in use instead.

My gripes so far are:

None of these flaws are really showstoppers. Some of the code is just a little clunkier than I'd like. On the other hand, the overall clunkiness of the code has gone down once you take into account all of the REXML workarounds...