I'm trying to parse a URL with Boost.Spirit, based on this [w3schools.com] format. The problem is that when I type the domain on its own (like google.com), the parse fails. My guess is that because the host is defined as a word followed by a dot, the parser takes "google." as the host and is then left with only "com" for the domain, which needs a word, a dot and another word. How can I fix this?
#include <iostream>
#include <string>
#include <boost/spirit.hpp>

using namespace std;
using namespace boost::spirit;

bool parseUrl(const string& str)
{
    rule<> word_p = +alpha_p;
    rule<> scheme_p = word_p >> str_p("://"); // http://
    rule<> host_p = word_p >> ch_p('.'); // www.
    rule<> domain_p = word_p >> ch_p('.') >> word_p; // google.com
    rule<> port_p = ch_p(':') >> uint_p; // :80
    rule<> path_p = ch_p('/') % word_p; // /path/to/file/
    rule<> filename_p = word_p >> !(ch_p('.') >> word_p); // logo.gif (extension optional)

    // Optional scheme, optional host, domain, optional port, optional path and filename.
    // Can have a path without a filename, but not a filename without a path.
    rule<> url_p = !scheme_p >> !host_p >> domain_p >> !port_p >> !(path_p >> !filename_p);

    return parse(str.c_str(), url_p, space_p).full;
}

int main()
{
    bool quit = false;
    while(!quit)
    {
        string str;
        cin >> str;
        if(str == "quit")
            quit = true;
        else
            cout << parseUrl(str) << endl;
    }
    return 0;
}
Thanks.
Edit:
If you've written BNF grammars before but haven't used Spirit: each rule<> is a regular grammar rule. The = operator is used instead of ::=, the ! operator makes the parser that follows it optional, >> means sequence (a >> b in Spirit is "a b" in BNF), ch_p matches a single character, and +alpha_p means one or more alphabetic characters.
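
To make the guess in the question concrete, here is a minimal sketch in plain standard C++ (no Boost) that imitates only the `!host_p >> domain_p` part of the grammar. It assumes the optional operator is greedy and that a sequence does not go back and retry the optional once it has matched. The helper names (`word`, `host`, `domain`, `greedyUrl`, `alternativeUrl`) are made up for this illustration and are not Spirit APIs. The second function sketches one possible restructuring, roughly `(host_p >> domain_p) | domain_p` in Spirit terms, under the same assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hand-written stand-ins for the rules in the question (not Spirit code).
// Each matcher returns the index just past its match, or FAIL.
const std::size_t FAIL = std::string::npos;

// Imitates +alpha_p: one or more letters starting at pos.
std::size_t word(const std::string& s, std::size_t pos) {
    std::size_t i = pos;
    while (i < s.size() && std::isalpha(static_cast<unsigned char>(s[i]))) ++i;
    return i > pos ? i : FAIL;
}

// Imitates host_p = word_p >> ch_p('.').
std::size_t host(const std::string& s, std::size_t pos) {
    std::size_t i = word(s, pos);
    if (i == FAIL || i >= s.size() || s[i] != '.') return FAIL;
    return i + 1;
}

// Imitates domain_p = word_p >> ch_p('.') >> word_p.
std::size_t domain(const std::string& s, std::size_t pos) {
    std::size_t i = host(s, pos);  // same prefix as host: a word, then a dot
    return i == FAIL ? FAIL : word(s, i);
}

// Imitates !host_p >> domain_p with a greedy optional that is never retried.
bool greedyUrl(const std::string& s) {
    std::size_t pos = 0;
    std::size_t afterHost = host(s, pos);
    if (afterHost != FAIL) pos = afterHost;  // optional matched: input is consumed
    std::size_t end = domain(s, pos);
    return end != FAIL && end == s.size();   // like checking .full
}

// Imitates (host_p >> domain_p) | domain_p: try host plus domain first,
// and fall back to the bare domain from the start if that fails.
bool alternativeUrl(const std::string& s) {
    std::size_t end = FAIL;
    std::size_t afterHost = host(s, 0);
    if (afterHost != FAIL) end = domain(s, afterHost);  // first alternative
    if (end == FAIL) end = domain(s, 0);                // second alternative
    return end != FAIL && end == s.size();
}
```

With these stand-ins, `greedyUrl("google.com")` is false because `host` consumes "google." and `domain` then sees only "com", while `greedyUrl("www.google.com")` is true. `alternativeUrl` accepts both inputs, since the failed first alternative restarts from the beginning of the string.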