
November 15, 2009

The parsing gem is an attempt to make everyday parsing tasks syntactically a little less of a pain in the balls. Right now it offers three methods in the Parsing module. Here is an example:
  require 'rubygems'
  require 'parsing'
  
  # Print all the hosts
  Parsing::logfiles { |e| puts e.host }
  
  # Print all the 'Url' elements
  Parsing::xml('//Url/') { |e| puts e }
  
  # Print all the lines
  Parsing::scan { |e| puts e }
  
  # Capture all the numbers in the lines
  Parsing::scan(/(\d+)/) { |res| puts res[0] }
The motivation is to strip away as much syntax as possible for tasks that require reading files or remote URLs, and involve pattern matching on that input. All these methods take files or URLs as input (these are optional; if they aren't provided, the command line arguments are used), read that input, and then yield to a block of code; what the block receives depends on the type of input you're reading. They also all return the pieces of input that were not used. For example, suppose you want to print all the user agents in some server log files; your program would then be this:
  Parsing::logfiles { |e| puts e.agent }
You could then invoke this with arguments that were files and/or URLs. This is simply a convenience because I tend to write similar things over and over and over again. So,
  • Parsing::scan regex? files?: Parses the command line arguments, or files if given, and matches on regex, if given; if no regex is given, it simply yields each line. Example:
      Parsing::scan { |line| puts line }       # Every line
      Parsing::scan(/(\d+)/) { |r| puts r[0] } # Every integer
    
  • Parsing::xml xpath? files?: Parses the command line arguments, or files if given, and matches on xpath, if given; if no xpath expression is given, //* is used to match all nodes. Example:
      Parsing::xml { |n| puts n }              # Every node
      Parsing::xml('//Url/') { |e| puts e }    # Url nodes
    
  • Parsing::logfiles files?: Parses the command line arguments, or files if given, and yields to the block an object with the following attributes:
    • host
    • logname
    • date
    • method
    • url
    • code
    • size
    • ref
    • agent
    Example:
      Parsing::logfiles { |e| puts e.agent }    # All user agents
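
To give a feel for the kind of boilerplate these methods hide, here is a rough sketch of how scan and logfiles could be built on one shared line-reading helper. This is not the gem's actual source. The names each_input_line, scan_sketch, logfiles_sketch, LogEntry and COMBINED are all made up for illustration, and the log regex assumes the Apache "combined" format, which the gem may or may not expect. It also makes no attempt to return the unused pieces of input.

```ruby
require 'open-uri'

# Hypothetical names throughout; this is a sketch, not the gem's source.

# Yield every line of every source. Sources default to the command line
# arguments; http(s) strings are fetched, anything else is opened as a file.
def each_input_line(sources = ARGV)
  sources.each do |src|
    io = src =~ %r{\Ahttps?://} ? URI.open(src) : File.open(src)
    begin
      io.each_line { |line| yield line.chomp }
    ensure
      io.close
    end
  end
end

# In the spirit of Parsing::scan: without a regex, pass each line to the
# block; with one, pass the capture groups of every match (String#scan).
def scan_sketch(regex = nil, sources = ARGV, &block)
  each_input_line(sources) do |line|
    regex ? line.scan(regex, &block) : block.call(line)
  end
end

# One entry per log line, carrying the attributes listed above.
LogEntry = Struct.new(:host, :logname, :date, :method, :url,
                      :code, :size, :ref, :agent)

# Assumption: Apache "combined" log format. The remote user field is
# matched but dropped, since the attribute list has no slot for it.
COMBINED = /\A(\S+) (\S+) \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+) "([^"]*)" "([^"]*)"/

# In the spirit of Parsing::logfiles: yield a LogEntry for each line that
# matches, and silently skip lines that do not.
def logfiles_sketch(sources = ARGV)
  each_input_line(sources) do |line|
    m = COMBINED.match(line)
    yield LogEntry.new(*m.captures) if m
  end
end
```

With that in place, logfiles_sketch { |e| puts e.agent } reads whatever files or URLs were given on the command line, much like the one-liner above. All the fields come back as strings, so converting code or size to integers is left to the block.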
    


Posted by jeff at November 15, 2009 05:56 PM