C++ Learning Community Forum
Topic: Boost.Spirit URL parsing
biggoron
« on: June 08, 2009, 06:36:51 PM »

I'm trying to parse a URL with Boost.Spirit, based on this [w3schools.com] format. The problem is that when I type a domain on its own (like google.com), the parse fails. My guess is that because the host is defined as a word followed by a dot, the parser takes "google." as the host and then sees "com" as only half of the domain name. How can I fix this?

Code
#include <iostream>
#include <string>
#include <boost/spirit.hpp>
 
using namespace std;
using namespace boost::spirit;
 
bool parseUrl(const string& str)
{
   rule<> word_p = +alpha_p;
   rule<> scheme_p = word_p >> str_p("://");    // http://
   rule<> host_p = word_p >> ch_p('.');             // www.
   rule<> domain_p = word_p >> ch_p('.') >> word_p;   // google.com
   rule<> port_p = ch_p(':') >> uint_p;               // :80
   rule<> path_p = ch_p('/') % word_p;               // /path/to/file/
   rule<> filename_p = word_p >> !(ch_p('.') >> word_p);   // logo.gif (extension optional)
 
   // Optional scheme, optional host, domain, optional port, optional path, optional filename.
   // A path may appear without a filename, but not a filename without a path.
   rule<> url_p = !scheme_p >> !host_p >> domain_p >> !port_p >> !(path_p >> !filename_p);
 
   return parse(str.c_str(), url_p, space_p).full;
}
 
int main()
{
   bool quit = false;
 
   while(!quit)
   {
       string str;
       cin >> str;
 
       if(str == "quit")
           quit = true;
       else
           cout << parseUrl(str) << endl;
   }
 
   return 0;
}

Thanks.

Edit:
If you've written BNF grammars before but haven't used Spirit: each rule<> is an ordinary grammar rule. The = operator stands in for ::=, a prefix ! makes the rule that follows it optional, >> means sequence (a >> b in Spirit is a b in BNF), a % b matches a list of a separated by b, ch_p matches a single given character, and +alpha_p means one or more alphabetic characters.
« Last Edit: June 08, 2009, 10:27:58 PM by biggoron »
