Using Open Source Libraries with C++Builder 6.0
By Harold Howe, Big Creek Software, LLC
Creator of the bcbdev.com website
Member of TeamB
Co-author of C++Builder How-To
Contributor to C++Builder 4 Unleashed by Charlie Calvert and Kent Reisdorph
hhowe@bcbdev.com
1: Introduction

The popularity of open source libraries has soared over the past decade. With commercial software companies demanding more and more money for proprietary systems that seem to do less and less as time goes on, many developers are turning to open source libraries to solve their problems. Open source libraries tend to be free and, by definition, come complete with source code.

If you are reading this article, chances are that you are interested in using open source tools in your development. Maybe the openness and the freedom attracted you. Or perhaps you are simply looking for libraries that strive for a higher degree of quality. Or maybe you are tired of paying for new versions of proprietary software where the only new feature seems to be an armada of new bugs. Regardless of the reason, you are here to read about how to leverage open source technologies in C++Builder.

This article focuses on six open source projects: Boost, Xerces, Flex, Bison, ACE and TAO, and PCRE. All six are open source and free for both commercial and non-commercial use, and all of them work well with C++Builder.

The goal of this article is not to cover these six tools in depth; that would take far too much space. Instead, the article demonstrates how to install and configure the tools for use with C++Builder 6. It also contains a variety of example projects, but the examples are elementary ("hello world" style). The article concentrates on installation and configuration for a simple reason: many of these tools lack instructions for using them with C++Builder. This article attempts to fill that gap while deferring complex topics to other books and websites.

2: Boost

2.1 Introduction

Boost (boost.org) is a collection of C++ libraries. The libraries come in source form and work with a variety of compilers and platforms.
The current version of Boost is 1.27.0, although recent additions to the Boost source code can be downloaded from the Boost SourceForge repository at http://sourceforge.net/projects/boost/.

Boost offers a wide variety of utilities, including:

array : a container for fixed size arrays
graph : aka 'BGL', a set of graph algorithms
python : maps C++ classes into Python
regex : a regular expression library
smart_ptr : a collection of smart pointer classes
static_assert : compile time assertions
thread : a portable C++ threading library

Note that this is not a complete list. Visit http://www.boost.org/libs/libraries.htm for a complete list of libraries.

2.2 Installing Boost

Many of the Boost libraries are C++ template classes that reside entirely in header files. You can use these libraries by simply extracting the Boost files and configuring the include path in the BCB environment. Some of the Boost libraries, including the regex, thread and graph libraries, need to be compiled into library form.

Installation steps
Boost includes a test suite for measuring how well a compiler supports Boost. The results for BCB6 and BCB5 can be found at http://www.bcbdev.com/articles/borlandboost.htm. Because of problems in the compiler, the following Boost libraries do not work with BCB6 at the time of this writing (keep an eye out for BCB6 patches that might fix some of these problems):

function : function.hpp fails to compile because BCB6 does not allow default template arguments on a nested class. This is a compiler bug.
graph : the graph library does not compile with BCB6.
thread : boost::thread relies on the Boost function library.
python : boost::python relies on the Boost function library.
regex : the regex library compiles, but access violations occur at runtime. This is presumably caused by compiler bugs.

2.3 Boost Examples

2.3.1 boost::array

Boost array is a container class that provides an STL-like container interface for a statically sized array. It provides begin and end methods for iterating the array, a swap method for efficiently swapping the contents of two arrays, and a subscript operator for accessing elements in the array. For a complete list of member functions, consult the header file boost/array.hpp or visit the online documentation for array at boost.org (http://www.boost.org/libs/array/array.html).

The Boost array class fills a void between the STL container classes and ordinary arrays. STL containers grow dynamically: you can add and remove elements at will, and the size of the container grows or shrinks with each call. Dynamic containers are more flexible than ordinary arrays, but this flexibility incurs some overhead at runtime. You can avoid this overhead by switching back to static arrays, but arrays lack the convenient interface of the STL containers. The Boost array container bridges this gap. It combines the speed and simplicity of a static array with the convenience of an STL container class.
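The central trick behind boost::array can be sketched in a few lines. The fixed_array template below is a hypothetical, stripped-down illustration written for this article, not Boost's actual code: it wraps a built-in array in a struct with no constructors and no private members, which keeps it an aggregate so that brace initialization still works, and then layers begin, end, size and a subscript operator on top.

```cpp
#include <cstddef>

// Hypothetical fixed_array: a minimal sketch of the boost::array idea.
// No constructors and only public data, so the struct stays an aggregate
// and can be brace-initialized like a built-in array.
template <typename T, std::size_t N>
struct fixed_array {
    T elems[N];  // must be public for aggregate initialization

    typedef T*       iterator;
    typedef const T* const_iterator;

    iterator       begin()       { return elems; }
    iterator       end()         { return elems + N; }
    const_iterator begin() const { return elems; }
    const_iterator end()   const { return elems + N; }

    // Always N: like an ordinary array, the container is always "full".
    std::size_t size() const { return N; }

    T&       operator[](std::size_t i)       { return elems[i]; }
    const T& operator[](std::size_t i) const { return elems[i]; }
};
```

Because the array lives inside a struct, initialization takes two pairs of braces: the outer pair for the struct and the inner pair for its member array, as in `fixed_array<int, 4> a = {{1, 2, 3}};`. Elements without an initializer are value-initialized (zero for int).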
Listing 2.1 shows an example of how to use the Boost array container.

//-----------------------------------------------------------------------------
//Listing 2.1: array.cpp
#include <iostream>
using namespace std;
#include "boost/array.hpp"

struct Foo
{
    int x;
    int y;
    char c;
};

int main()
{
    boost::array<Foo, 10> a = {{ {0, 0, 'a'}, {1, 1, 'b'}, {2, 4, 'c'}, {3, 9, 'd'} }};

    // reset a[1].y
    a[1].y = 10;

    cout << "Boost array contains: " << endl;
    for (boost::array<Foo, 10>::const_iterator iter = a.begin(); iter != a.end(); ++iter)
    {
        cout << iter->x << ',' << iter->y << ',' << iter->c << endl;
    }
    return 0;
}
//-----------------------------------------------------------------------------

The array class takes two template parameters. The first parameter specifies the type of objects that you want to hold in the array. The second parameter determines the size of the array. The code in Listing 2.1 creates an array that holds ten Foo structures.

There are a couple of points worth mentioning regarding the Boost array class. First, an array object is always full, just like an ordinary array. There are no methods for adding or removing elements, and the size member function always returns the size that you passed as a template argument.

Second, notice that we use array-style initialization when we construct the array. This syntax allows us to initialize the values in the array directly. To support this syntax, the array class does not provide any constructors; if you open boost\array.hpp, you will see that the class indeed has none.

2.3.2 boost::lexical_cast

lexical_cast is a template function that converts a value from one type to another by way of its text representation, for example from a number to a string. The syntax for lexical_cast resembles static_cast and dynamic_cast. Listing 2.2 demonstrates how to use lexical_cast.
//-----------------------------------------------------------------------------
//Listing 2.2: lexical_cast.cpp
#include <iostream>
#include <string>
using namespace std;
#include "boost/lexical_cast.hpp"

int main()
{
    float f = 3.14159f;
    string s;
    s = boost::lexical_cast<string>(f);
    cout << "Original float : " << f << endl;
    cout << "Converted to string : " << s << endl;
    return 0;
}
//-----------------------------------------------------------------------------

lexical_cast relies on string streams to perform the conversion. It inserts the source value into a temporary stringstream object and then extracts the result from it. To use lexical_cast, the source type must provide operator << and the destination type must provide operator >>. The code for lexical_cast looks roughly like this:

// A simplified version of lexical_cast without error checking
#include <sstream>

template<typename Target, typename Source>
Target lexical_cast(Source arg)
{
    std::stringstream interpreter;
    Target result;           // so Target must also be default constructible
    interpreter << arg;      // write the source value as text
    interpreter >> result;   // read the text back as the target type
    return result;
}

2.3.3 boost::smart_ptr

The smart_ptr library consists of five smart pointer classes: scoped_ptr, scoped_array, shared_ptr, shared_array and weak_ptr. The list below summarizes the purpose of each class.

scoped_ptr : like auto_ptr, but never transfers ownership. scoped_ptr objects should not be stored in containers.
scoped_array : array version of scoped_ptr. Calls delete [].
shared_ptr : reference counted smart pointer. Safe for use in containers.
shared_array : array version of shared_ptr.
weak_ptr : stores a non-owning reference to an object that is already owned by a shared_ptr.

Each Boost smart pointer provides functionality that can't be found in the standard auto_ptr class. For example, scoped_ptr does not allow you to copy one object to another, which in turn prevents you from transferring ownership of the underlying pointer. In this sense, scoped_ptr is more restrictive than auto_ptr.
auto_ptr allows copying, but when you copy, ownership of the pointer transfers to the target. This can be a subtle source of problems. scoped_ptr lets you state explicitly that your code should never transfer ownership of the pointer.

shared_ptr allows you to copy pointer objects, and copying shares ownership instead of transferring it. Unlike auto_ptr, the Boost shared_ptr class provides true copy semantics. It accomplishes this by maintaining a reference count. shared_ptr is probably the most commonly used of the Boost smart pointers. Listing 2.3 demonstrates how to use shared_ptr.
//-----------------------------------------------------------------------------
//Listing 2.3: shared_ptr.cpp
#include <iostream>
using namespace std;
#include "boost/smart_ptr.hpp"

class Foo
{
public:
    Foo()  { cout << " - constructed Foo: " << this << endl; }
    ~Foo() { cout << " - destroyed Foo : " << this << endl; }
    // __FUNC__ is a C++Builder extension that expands to the function name
    void DoSomething() { cout << " - " << __FUNC__ << " : " << this << endl; }
};

void Test()
{
    cout << "beginning of Test scope" << endl;
    boost::shared_ptr <Foo> foo1 (new Foo);
    foo1->DoSomething();
    {
        cout << "beginning of inner scope" << endl;
        boost::shared_ptr <Foo> foo2(new Foo);
        foo2->DoSomething();
        cout << "Assigning foo2 to foo1 " << endl;
        foo1 = foo2;
        foo2->DoSomething();  // prove that foo2 still refers to something
        cout << "end of inner scope." << endl;
    }
    foo1->DoSomething();
    cout << "end of Test scope." << endl;
}

int main()
{
    cout << std::hex << std::uppercase;
    cout << "calling Test" << endl;
    Test();
    cout << "Test returned" << endl;

    /*
    // can't do this: two unrelated shared_ptr objects would each own the
    // same raw pointer, and both would eventually try to delete it
    boost::shared_ptr <Foo> foo1 (new Foo);
    boost::shared_ptr <Foo> foo2 (foo1.get());
    */
    return 0;
}
//-----------------------------------------------------------------------------

Output of Listing 2.3

calling Test
beginning of Test scope
 - constructed Foo: C76320
 - Foo::DoSomething : C76320
beginning of inner scope
 - constructed Foo: C7634C
 - Foo::DoSomething : C7634C
Assigning foo2 to foo1
 - destroyed Foo : C76320
 - Foo::DoSomething : C7634C
end of inner scope.
 - Foo::DoSomething : C7634C
end of Test scope.
 - destroyed Foo : C7634C
Test returned

The most interesting part of this example is the assignment of foo2 to foo1. At this point in the code, the reference count for the Foo object at 0xC76320 drops to zero and the object is destroyed. The reference count for the object at 0xC7634C increments to two. After the assignment, foo1 and foo2 point to the same object. When the inner scope completes, the foo2 smart pointer is destroyed.
However, this does not delete the underlying object. It merely decrements the reference count from 2 to 1, leaving foo1 holding the last reference to the object at 0xC7634C. When the Test function finally returns, foo1 is destroyed, the reference count drops to zero, and the last Foo object is deleted.
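The bookkeeping described above can be sketched with a minimal reference-counted pointer. counted_ptr is a hypothetical class written for illustration; it is not Boost's shared_ptr, which also supports custom deleters, weak_ptr and thread-safe counts. It keeps the count in a separately allocated long shared by every copy and, for brevity, ignores the case where allocating that counter throws.

```cpp
#include <utility>

// Hypothetical counted_ptr: a minimal, non-thread-safe reference-counting sketch.
template <typename T>
class counted_ptr {
public:
    explicit counted_ptr(T* p = nullptr)
        : ptr_(p), count_(p ? new long(1) : nullptr) {}

    // Copying shares the object and bumps the count.
    counted_ptr(const counted_ptr& other)
        : ptr_(other.ptr_), count_(other.count_) {
        if (count_) ++*count_;
    }

    // Copy first, release second: safe even for self-assignment.
    counted_ptr& operator=(const counted_ptr& other) {
        counted_ptr tmp(other);  // increments the count of other's object
        swap(tmp);               // tmp now holds our previous object
        return *this;            // tmp's destructor releases that object
    }

    ~counted_ptr() { release(); }

    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }
    T* get() const { return ptr_; }
    long use_count() const { return count_ ? *count_ : 0; }

    void swap(counted_ptr& other) {
        std::swap(ptr_, other.ptr_);
        std::swap(count_, other.count_);
    }

private:
    void release() {
        if (count_ && --*count_ == 0) {  // the last owner deletes the object
            delete ptr_;
            delete count_;
        }
    }

    T* ptr_;
    long* count_;
};
```

Because assignment takes the new reference before releasing the old one, the first Foo in Listing 2.3 is destroyed at the moment foo2 is assigned to foo1, while the shared object survives until its last owner goes away.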
2.3.4 boost::static_assert

A static assertion is an error check that occurs at compile time. If the assertion fails, the compiler generates an error. Although static assertions have been around for some time, the book Modern C++ Design (Alexandrescu[2001]) has brought them into the spotlight. The Boost static assertion library provides an easy way to perform compile time assertions. Listing 2.4 demonstrates how.

//-----------------------------------------------------------------------------
//Listing 2.4: static_assert.cpp
#include <iostream>
using namespace std;
#include "boost/static_assert.hpp"
// Test is a simple structure. The static assertion will fail unless the
// pragma directives are uncommented.
//#pragma pack(1)
struct Test
{
    char c;
    short s;
    int i;
};
//#pragma pack()

int main()
{
    // ensure that the structure was byte aligned
    BOOST_STATIC_ASSERT(sizeof(Test) == 7);

    // our code is not ready for ints that are bigger than 4 bytes.
    // force a compiler error now if a larger int size is detected
    BOOST_STATIC_ASSERT(sizeof(int) <= 4);

    cout << "Hello world" << endl;
    return 0;
}
//-----------------------------------------------------------------------------

Compiler output for Listing 2.4

[C++ Error] static_assert.cpp(20): E2450 Undefined structure 'boost::STATIC_ASSERTION_FAILURE<0>'
[C++ Error] static_assert.cpp(20): E2109 Not an allowed type

The compiler generates an error for Listing 2.4 because the Test structure does not have the expected size of 7 bytes: with the default alignment, the compiler inserts padding between the members. To fix the problem, either compile the code with the -a1 switch (byte alignment) or uncomment the pragma directives that temporarily adjust the structure packing.
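Pre-C++11 static assertions such as BOOST_STATIC_ASSERT rely on a template that is declared for every bool value but defined only for true. Taking sizeof of the undefined false specialization is a compile error, which is why the message above mentions 'boost::STATIC_ASSERTION_FAILURE<0>'. The macro below is a simplified, hypothetical sketch of that idea (SKETCH_STATIC_ASSERT and the sketch namespace are names invented for this illustration), not Boost's actual implementation. C++11 compilers also provide a built-in static_assert keyword.

```cpp
namespace sketch {

// Declared for every bool value, but only defined for true.
template <bool> struct STATIC_ASSERTION_FAILURE;
template <> struct STATIC_ASSERTION_FAILURE<true> { enum { value = 1 }; };

}  // namespace sketch

// Two-step concatenation so that __LINE__ is expanded before pasting.
#define SKETCH_JOIN2(a, b) a##b
#define SKETCH_JOIN(a, b) SKETCH_JOIN2(a, b)

// sizeof on an incomplete type is ill-formed, so a false expression stops
// compilation; a true one just declares an unused one-byte array typedef.
#define SKETCH_STATIC_ASSERT(expr)                              \
    typedef char SKETCH_JOIN(sketch_static_assert_, __LINE__)   \
        [sizeof(sketch::STATIC_ASSERTION_FAILURE<static_cast<bool>(expr)>)]
```

Writing `SKETCH_STATIC_ASSERT(sizeof(int) <= 4);` compiles silently on a platform with 4-byte ints and produces an "incomplete type" style error otherwise.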
2.3.5 boost::tuple

A tuple is essentially a structure with anonymous, or unnamed, members. It is a fixed size sequence of elements, usually of different types (a tuple of like elements is more or less an array). Some languages, such as Python, provide native support for tuples. C++ does not.

Tuples are frequently used to create functions that return more than one value. For example, the Execute method of an ADO connection object in Python returns a tuple (assuming you are running Python on Windows). The second element of the tuple is a Boolean flag that indicates whether the query succeeded. The first element contains the result set in the form of an ADO RecordSet object.

# A Python code fragment that demonstrates the use of a tuple
result = ADOConnection.Execute("select * from orders")
if result[1] != 0:
    while not result[0].EOF:
        result[0].MoveNext()

The result variable is a tuple. In Python, tuple elements are accessed by index: result[1] returns the Boolean flag and result[0] gives us access to the ADO result set. Python also allows you to bind members of a tuple to variables. For example:

# A Python code fragment that binds tuple members to variables
(rs, success) = ADOConnection.Execute("select * from orders")
if success != 0:
    while not rs.EOF:
        rs.MoveNext()

The parentheses form a tuple from the variables rs and success, and Python allows this tuple to act as the target of the assignment.

C++ does not provide native support for tuples. However, Boost provides a tuple library whose syntax is easy to use and resembles Python tuples. Listing 2.5 demonstrates how to use the Boost tuple library.
//-----------------------------------------------------------------------------
//Listing 2.5: tuple.cpp
#include <iostream>
#include <string>
using namespace std;
#include "boost/tuple/tuple.hpp"
#include "boost/lexical_cast.hpp"

boost::tuple<bool,string> GetSomeString(int index)
{
    if(index == 2)
        return boost::make_tuple(false, "");
    else
        return boost::make_tuple(true, "value: " + boost::lexical_cast<string>(index));
}

int main()
{
    boost::tuple<bool, string> t = GetSomeString(5);
    cout << "GetSomeString(5) returned tuple: " << endl
         << " - t.get<0>() = " << t.get<0>() << endl
         << " - t.get<1>() = " << t.get<1>() << endl << endl;

    t = GetSomeString(2);
    cout << "GetSomeString(2) returned tuple: " << endl
         << " - t.get<0>() = " << t.get<0>() << endl
         << " - t.get<1>() = " << t.get<1>() << endl << endl;

    bool success;
    string value;
    boost::tie(success,value) = GetSomeString(5);
    cout << "GetSomeString(5) returned tuple: " << endl
         << " - success = " << success << endl
         << " - value = " << value << endl << endl;

    boost::tie(success,value) = GetSomeString(2);
    cout << "GetSomeString(2) returned tuple: " << endl
         << " - success = " << success << endl
         << " - value = " << value << endl << endl;

    return 0;
}
//-----------------------------------------------------------------------------

The function GetSomeString returns a tuple that contains a bool and a string. By returning a tuple, the function essentially returns two results instead of one. Listing 2.5 demonstrates two ways of working with tuples. First, you can declare a tuple object and work with it directly. To access individual elements of the tuple, call the get function, passing the index of the element as a template argument. get can be called either as a free function or as a member function:

using namespace boost;
tuple<int, string, float> t(1, "hello", 3.14f);
get<0>(t) = 42;
get<1>(t) = "world";
float f = t.get<2>();

The second strategy for using tuples is to tie the tuple to existing variables.
Listing 2.5 calls the tie utility function to bind the variables success and value. The code fragment below highlights the tie function.

using namespace boost;
tuple<bool,string> GetSomeString(int index); // same function as before

bool b;
string s;
tie(b,s) = GetSomeString(2);
if(b)
    cout << s;
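Under the hood, tie works by building a temporary tuple of references whose assignment operator copies each element into the referenced variables. The sketch below shows that mechanism for the two-element case using std::pair; tied_pair, tie_pair and get_some_string are hypothetical names chosen for this illustration and are not part of Boost. On a C++11 compiler, std::tuple and std::tie provide the same facility as the Boost tuple library.

```cpp
#include <string>
#include <utility>

// Hypothetical tied_pair: holds references to two existing variables.
template <typename A, typename B>
struct tied_pair {
    A& first;
    B& second;

    // Assigning a pair copies each element into the referenced variables.
    tied_pair& operator=(const std::pair<A, B>& p) {
        first = p.first;
        second = p.second;
        return *this;
    }
};

// Hypothetical helper that plays the role of tie for two variables.
template <typename A, typename B>
tied_pair<A, B> tie_pair(A& a, B& b) {
    tied_pair<A, B> t = {a, b};
    return t;
}

// Same contract as GetSomeString in Listing 2.5, written with std::pair.
std::pair<bool, std::string> get_some_string(int index) {
    if (index == 2)
        return std::make_pair(false, std::string());
    return std::make_pair(true, "value: " + std::to_string(index));
}
```

Usage mirrors the Boost version: `bool ok; std::string s; tie_pair(ok, s) = get_some_string(5);` leaves ok set to true and s holding "value: 5".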
3: Xerces

3.1 Introduction

The Apache Software Foundation (www.apache.org) provides a variety of open source libraries for both C++ and Java developers. One of those libraries is an XML parsing library called Xerces. Xerces is a validating XML parser. It implements the DOM 1.0, DOM 2.0, SAX 1.0, and SAX 2.0 specifications. The current version of Xerces is 1.7.0.

3.2 Installing Xerces

Of the libraries discussed in this article, Xerces is probably the easiest to install. Xerces includes project files for BCB6. To build and install Xerces, follow these steps.
3.3 Xerces Examples

There are two predominant models for parsing XML documents: SAX and DOM. SAX is an event driven model, whereas DOM is a tree based object model. DOM parsers read the entire XML document and return an in-memory tree structure that represents the content of the document. This tree structure resembles a parse tree. After parsing a document, you can navigate the tree structure to read values from the document.

SAX parsers operate differently. They parse the document piece by piece, and each construct in the XML file (the start of an element, a run of character data, the end of an element) generates an event. To read values from the XML document, you must handle these events. Xerces supports DOM 2.0 and SAX 2.0. Each model has strengths and weaknesses. The list below highlights some of the key differences between the two.
3.3.1 DOM Parser

Reading an XML document with the Xerces DOM parser consists of several steps:
Listing 3.1 shows a minimal DOM parsing example.

//-----------------------------------------------------------------------------
//Listing 3.1: dom-minimal/main.cpp
#pragma hdrstop
#include <iostream>
#include <util/PlatformUtils.hpp>
#include <parsers/DOMParser.hpp>
#include <dom/DOM.hpp>
using namespace std;

int main()
{
    // NOTE: error handling has been kept to a minimum in this example.
    // Initialize can also throw an XMLException, so production code should
    // guard that call as well (Listing 3.2 shows one way to do this).
    XMLPlatformUtils::Initialize();
    try
    {
        DOMParser parser;
        parser.setValidationScheme(DOMParser::Val_Auto);
        parser.setIncludeIgnorableWhitespace(false);
        parser.parse("test.xml");
        DOM_Document doc = parser.getDocument();
        // Add code that works with the DOM_Document.
    }
    catch(...)
    {
        cout << "An error occurred" << endl;
    }
    XMLPlatformUtils::Terminate();
    return 0;
}
//-----------------------------------------------------------------------------

Listing 3.1 demonstrates how to initialize the Xerces engine, create a DOM parsing object, and parse an XML file. However, it doesn't show how to interact with the DOM tree once the parser is finished. Listing 3.2 contains a more complete example. This example is a GUI project that walks the DOM tree and adds each DOM node to a TTreeView control. Once the iteration is complete, the tree control mirrors the structure of the in-memory DOM tree.
//-----------------------------------------------------------------------------
//Listing 3.2: dom-treewalker/dommain.cpp
//---------------------------------------------------------------------------
#include <vcl.h>
#pragma hdrstop

#include "dommain.h"

#include <util/PlatformUtils.hpp>
#include <parsers/DOMParser.hpp>
#include <dom/DOM.hpp>
#include <string>
#include <sstream>
using namespace std;
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
//---------------------------------------------------------------------------

// DOM string utilities
std::string DOMStringToStdString(const DOMString& s);
AnsiString DOMStringToAnsiString(const DOMString& s);
ostream& operator<< (ostream& target, const DOMString& s);

// DOM tree walking routines
void WalkTreeItem(TTreeNodes *nodes, TTreeNode *parent, const DOM_Node &domnode);
void WalkTree(TTreeView *tree, DOM_Node &node);

__fastcall TForm1::TForm1(TComponent* Owner)
    : TForm(Owner)
{
    try
    {
        XMLPlatformUtils::Initialize();
    }
    catch(const XMLException& toCatch)
    {
        ShowMessage("Error during Xerces-c Initialization.\n");
        Application->Terminate();
    }
}

__fastcall TForm1::~TForm1()
{
    XMLPlatformUtils::Terminate();
}
//---------------------------------------------------------------------------
void __fastcall TForm1::BrowseButtonClick(TObject *Sender)
{
    if(ExtractFileExt(FileNameEdit->Text).Length() != 0)
    {
        OpenDialog->InitialDir = "";
        OpenDialog->FileName = FileNameEdit->Text;
    }
    else
    {
        OpenDialog->FileName = "";
        OpenDialog->InitialDir = FileNameEdit->Text;
    }

    if(OpenDialog->Execute())
        FileNameEdit->Text = OpenDialog->FileName;
}
//---------------------------------------------------------------------------
void __fastcall TForm1::ParseButtonClick(TObject *Sender)
{
    AnsiString xmlfile = FileNameEdit->Text;
    if(!FileExists(xmlfile))
    {
        ShowMessage("File does not exist.");
        return;
    }

    Memo1->Lines->LoadFromFile(xmlfile);

    DOMParser parser;
    parser.setValidationScheme(DOMParser::Val_Auto);
    parser.setIncludeIgnorableWhitespace(false);
    parser.parse(xmlfile.c_str());  // note: ignoring exceptions, should catch
                                    // XMLException and DOM_DOMException types

    // Walk the DOM document tree and put each node into the treeview
    DOM_Document doc = parser.getDocument();
    WalkTree(TreeView1, doc);
}
//---------------------------------------------------------------------------
std::string DOMStringToStdString(const DOMString& s)
{
    // note: this would be a good place to use boost::scoped_array
    char *p = s.transcode();
    std::string result (p);
    delete [] p;
    return result;
}

AnsiString DOMStringToAnsiString(const DOMString& s)
{
    // note: this would be a good place to use boost::scoped_array
    char *p = s.transcode();
    AnsiString result (p);
    delete [] p;
    return result;
}

ostream& operator<< (ostream& target, const DOMString& s)
{
    target << DOMStringToStdString(s);
    return target;
}

void WalkTreeItem(TTreeNodes *nodes, TTreeNode *parent, const DOM_Node &domnode)
{
    AnsiString nodeName  = DOMStringToAnsiString(domnode.getNodeName());
    AnsiString nodeValue = DOMStringToAnsiString(domnode.getNodeValue());

    switch(domnode.getNodeType())
    {
        case DOM_Node::TEXT_NODE:
        case DOM_Node::ATTRIBUTE_NODE:
        {
            if(!nodeValue.IsEmpty())
            {
                TTreeNode * newnode = nodes->AddChild(parent, nodeName + " : " + nodeValue);
                DOM_Node child = domnode.getFirstChild();
                while( child != 0)
                {
                    WalkTreeItem(nodes, newnode, child);
                    child = child.getNextSibling();
                }
            }
            break;
        }
        case DOM_Node::ELEMENT_NODE :
        {
            TTreeNode * newnode = nodes->AddChild(parent, nodeName + " : " + nodeValue);
            DOM_NamedNodeMap attrs = domnode.getAttributes();
            for (size_t i = 0; i < attrs.getLength(); ++i)
            {
                WalkTreeItem(nodes, newnode, attrs.item(i));
            }
            DOM_Node child = domnode.getFirstChild();
            while( child != 0)
            {
                WalkTreeItem(nodes, newnode, child);
                child = child.getNextSibling();
            }
            break;
        }
        case DOM_Node::DOCUMENT_NODE :
        case DOM_Node::XML_DECL_NODE:
        case DOM_Node::COMMENT_NODE:
        {
            TTreeNode * newnode = nodes->AddChild(parent, nodeName + " : " + nodeValue);
            DOM_Node child = domnode.getFirstChild();
            while( child != 0)
            {
                WalkTreeItem(nodes, newnode, child);
                child = child.getNextSibling();
            }
            break;
        }
        // Ignore any other type of nodes
        case DOM_Node::DOCUMENT_TYPE_NODE:
        case DOM_Node::PROCESSING_INSTRUCTION_NODE :
        case DOM_Node::ENTITY_REFERENCE_NODE:
        case DOM_Node::CDATA_SECTION_NODE:
        case DOM_Node::ENTITY_NODE:
            break;
        default:
            throw EInvalidOperation("Unrecognized node type");
    }
}

void WalkTree(TTreeView *tree, DOM_Node &node)
{
    tree->Items->BeginUpdate();
    tree->Items->Clear();
    WalkTreeItem(tree->Items, 0, node);
    tree->FullExpand();
    tree->Items->EndUpdate();
}
//-----------------------------------------------------------------------------

This example contains the same parsing code as Listing 3.1, and the same initialization code moved into the form's constructor. It also contains a function called WalkTree that populates a TTreeView control from the contents of a DOM_Node object. WalkTree calls another function, WalkTreeItem, which performs the actual iteration by recursively calling itself for DOM nodes that have children.

There are a couple of points worth mentioning about the code in Listing 3.2. First, note that the Xerces library relies heavily on its own string class, DOMString. Xerces does not provide a mechanism for converting DOMString objects to std::string or AnsiString, so you have to provide this conversion yourself in the form of helper routines. Listing 3.2 contains two helper functions for converting DOMString objects: DOMStringToStdString and DOMStringToAnsiString.

Secondly, notice the large switch block in WalkTreeItem. The DOM model represents nodes from the XML file as DOM_Node objects. There are many different types of DOM nodes. The root document object is a DOM node. Elements and attributes are also DOM nodes. The different node types all inherit from the DOM_Node base class.
The getNodeType method of DOM_Node returns an enum value that indicates what type of node the object really is. The switch block reads the enum value to determine how the node should be added to the tree.

3.3.2 SAX Parser
Reading an XML document with the Xerces SAX parser is similar to using the DOM parser. The key difference is that you need to implement a ContentHandler and pass an instance of your handler to the parser. Here are the steps:
Listing 3.3 contains a simple SAX parser.

//------------------------------------------------------------------------------
//Listing 3.3: sax-minimal/main.cpp
#pragma hdrstop
#include <iostream>
#include <memory>
using namespace std;

#include <util/PlatformUtils.hpp>
#include <sax2/SAX2XMLReader.hpp>
#include <sax2/XMLReaderFactory.hpp>
#include <sax2/DefaultHandler.hpp>

class SAX2Handler : public DefaultHandler  // DefaultHandler inherits from
{                                          // ContentHandler and ErrorHandler
public:
    SAX2Handler() {}
    ~SAX2Handler() {}

    // Overrides for ContentHandler interface
    virtual void startElement(const XMLCh* const uri, const XMLCh* const localname,
                              const XMLCh* const qname, const Attributes& attrs);
    virtual void endElement(const XMLCh* const uri, const XMLCh* const localname,
                            const XMLCh* const qname);
    virtual void characters(const XMLCh* const chars, const unsigned int length);
    virtual void startDocument();
    virtual void endDocument();

    // Overrides for ErrorHandler interface
    void warning(const SAXParseException& exception);
    void error(const SAXParseException& exception);
    void fatalError(const SAXParseException& exception);
};

void SAX2Handler::startElement(const XMLCh* const uri, const XMLCh* const localname,
                               const XMLCh* const qname, const Attributes& attrs)
{
    wcout << "startElement: " << uri << ',' << localname << ',' << qname << endl;
}

void SAX2Handler::endElement(const XMLCh* const uri, const XMLCh* const localname,
                             const XMLCh* const qname)
{
    wcout << "endElement: " << uri << ',' << localname << ',' << qname << endl;
}

void SAX2Handler::characters(const XMLCh* const chars, const unsigned int length)
{
    // note: SAX does not promise that chars is null terminated; length is the
    // number of valid characters, so robust code should honor it
    wcout << "characters: " << chars << ',' << length << endl;
}

void SAX2Handler::startDocument()
{
    cout << __FUNC__ << endl;
}

void SAX2Handler::endDocument()
{
    cout << __FUNC__ << endl;
}

void SAX2Handler::warning(const SAXParseException& exception)
{
    wcout << __FUNC__ << " : " << exception.getLineNumber() << ','
          << exception.getColumnNumber() << " : "
          << exception.getMessage() << endl;
}

void SAX2Handler::error(const SAXParseException& exception)
{
    wcout << __FUNC__ << " : " << exception.getLineNumber() << ','
          << exception.getColumnNumber() << " : "
          << exception.getMessage() << endl;
}

void SAX2Handler::fatalError(const SAXParseException& exception)
{
    wcout << __FUNC__ << " : " << exception.getLineNumber() << ','
          << exception.getColumnNumber() << " : "
          << exception.getMessage() << endl;
}

int main()
{
    try
    {
        XMLPlatformUtils::Initialize();
        auto_ptr<SAX2XMLReader> parser (XMLReaderFactory::createXMLReader());

        // Create a handler object and install it as the content handler and
        // as the error handler.
        SAX2Handler handler;
        parser->setContentHandler(&handler);
        parser->setErrorHandler(&handler);
        parser->parse("test.xml");
    }
    catch (const XMLException& e)
    {
        wcout << "An error occurred : " << e.getType() << endl
              << e.getMessage() << endl;
    }

    // And call the termination method
    XMLPlatformUtils::Terminate();
    return 0;
}
//------------------------------------------------------------------------------
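Listings 3.1 and 3.3 both pair XMLPlatformUtils::Initialize with XMLPlatformUtils::Terminate by hand, and both take care to let the parser go out of scope before Terminate runs. One way to make that pairing automatic is a small RAII guard. The init_guard class below is a hypothetical helper written for this illustration; it takes plain std::function callbacks so that it stands alone, and with Xerces you would pass functions that call Initialize and Terminate. Because C++ destroys local objects in reverse order of declaration, declaring the guard before any parser object means the cleanup call runs last.

```cpp
#include <functional>
#include <utility>

// Hypothetical init_guard: runs an initialization callback in its constructor
// and the matching cleanup callback in its destructor.
class init_guard {
public:
    init_guard(std::function<void()> init, std::function<void()> term)
        : term_(std::move(term)) {
        // If init throws, the constructor never completes, the destructor
        // never runs, and the cleanup callback is correctly skipped.
        init();
    }

    ~init_guard() { term_(); }

    // Exactly one cleanup per successful initialization: no copies.
    init_guard(const init_guard&) = delete;
    init_guard& operator=(const init_guard&) = delete;

private:
    std::function<void()> term_;
};
```

The design choice is that cleanup only happens for an initialization that actually succeeded, which a hand-written Initialize/Terminate pair around a try block does not guarantee.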
3.4 Links to Xerces resources

Xerces links
XML links
Books
4: ACE and TAO

4.1 Introduction

ACE is a C++ library for building distributed network systems. It includes classes for communicating through a socket connection, creating threads, working with memory mapped files, and a variety of other tasks. One of the key benefits of ACE is that it is cross platform. ACE is written in C and C++, and has been ported to a variety of platforms, including Windows, Linux, Unix, and embedded operating systems such as VxWorks. To achieve this portability, ACE uses a layered approach: it employs wrapper facades that encapsulate OS specific calls. As long as you stick to the wrapper facades, your code should port easily to other platforms.

TAO is a CORBA ORB that is built on top of ACE. Like ACE, it is open source, free, and cross platform. TAO is a very attractive choice when compared with expensive ORBs such as Iona's Orbix and Borland's own VisiBroker.

4.2 Installing ACE and TAO

Instructions for building ACE and TAO with C++Builder can be found at the following two links:
Both ACE and TAO provide makefiles for building the libraries with Borland C++Builder. These makefiles work fine with BCB6. However, the instructions do not discuss how to configure the BCB6 IDE for ACE and TAO. To configure BCB6, follow the instructions below. Note that these instructions duplicate much of the information from the two links above.
At this point, ACE and TAO will be installed on your system. To use ACE or TAO in a BCB project, modify your include and library paths so they point to the ACE directories: add $(ACE_TAO)\include to your include path and $(ACE_TAO)\lib to your library path. You will also need to add the correct LIB files to your project (ACE_bp.lib and TAO_bp.lib at a minimum).

4.3 ACE and TAO examples

4.3.1 A simple ACE socket client

ACE provides a wide variety of services that you can use from your C++ projects. One of the core ACE services is a set of platform independent socket wrapper classes. The ACE socket wrappers act as a facade that encapsulates OS specific socket calls, shielding your application from inconsistencies between the Win32 socket API and the Unix socket API. Listing 4.1 contains a simple socket client built using the ACE framework.

//------------------------------------------------------------------------------
//Listing 4.1: ace-client/main.cpp
#pragma hdrstop
#include <cstring>
#include <iostream>
using namespace std;

#include "ace/SOCK_Connector.h"
#include "ace/SOCK_Stream.h"
#include "ace/INET_Addr.h"

#pragma argsused
int main(int argc, char* argv[])
{
    ACE_SOCK_Connector connector;
    ACE_SOCK_Stream stream;
    ACE_INET_Addr address;

    // connect to the web server
    if(address.set(80,"www.bcbdev.com") == -1)
        return 1;
    if(connector.connect(stream, address) == -1)
        return 1;

    // perform an http get
    const char* message = "GET /index.html HTTP/1.0\r\n\r\n";
    stream.send_n(message, strlen(message));

    ssize_t count=0;
    const size_t BUFSIZE=4096;
    char buff[BUFSIZE];
    while( (count=stream.recv(buff, BUFSIZE)) > 0)
    {
        cout.write(buff, count);
    }

    return stream.close();
}
//------------------------------------------------------------------------------

There are three ACE classes involved in this example: ACE_SOCK_Connector, ACE_SOCK_Stream, and ACE_INET_Addr. The ACE_SOCK_Connector class establishes a connection with a remote server.
ACE_INET_Addr abstracts the concept of an Internet address and port. Many of the other ACE classes rely on ACE_INET_Addr. Lastly, the ACE_SOCK_Stream class represents the connected data stream between the client and the server. You can send or receive data through an instance of ACE_SOCK_Stream, provided that you have first established a connection.
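Listing 4.1 sends a bare HTTP/1.0 request and writes the raw reply, headers included, to cout. The standard C++ sketch below shows two hypothetical helpers (make_http_get and split_http_response are illustrative names, not ACE classes) that could sit alongside the ACE calls: one builds the request string to hand to send_n, the other separates the header block from the body once the recv loop has collected the whole reply into a std::string. The Host header is an assumption beyond the original listing; servers that host several sites on one address rely on it.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not part of ACE): build an HTTP/1.0 GET request.
// The Host header is an addition to Listing 4.1.
std::string make_http_get(const std::string& host, const std::string& path)
{
    return "GET " + path + " HTTP/1.0\r\n"
           "Host: " + host + "\r\n"
           "\r\n";                       // empty line ends the request headers
}

// Hypothetical helper: split a complete response into (headers, body).
// HTTP separates the header block from the body with an empty line,
// i.e. the sequence CR LF CR LF. Without a separator, everything is headers.
std::pair<std::string, std::string> split_http_response(const std::string& response)
{
    const std::string separator = "\r\n\r\n";
    std::string::size_type pos = response.find(separator);
    if(pos == std::string::npos)
        return std::make_pair(response, std::string());
    return std::make_pair(response.substr(0, pos),
                          response.substr(pos + separator.size()));
}
```

In Listing 4.1 you might append each chunk inside the recv loop with reply.append(buff, count) and call split_http_response(reply) after the loop ends.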
4.3.2 An ACE threading example

ACE also provides classes that help you create multithreaded applications. In fact, it offers a variety of classes, ranging from simple threading to advanced thread pool management. Listing 4.2 contains a simple example of the ACE threading capabilities.

```cpp
//------------------------------------------------------------------------------
//Listing 4.2: ace-thread/main.cpp
#pragma hdrstop
#include <cstdlib>
#include <iostream>
using namespace std;

#include "ace/Thread_Manager.h"
#include "ace/Synch.h"
#include "ace/Log_Msg.h"   // ACE_DEBUG

// Each thread seeds the generator with a different value. The increment is
// not synchronized, which is acceptable for a demo; real code should guard
// shared data with a lock such as ACE_Thread_Mutex.
int seed_value = 0;

#pragma argsused
void * thread_func(void *arg)
{
    ACE_DEBUG((LM_DEBUG, "Thread %t started.\n"));   // %t prints the thread id
    ACE_OS::srand(seed_value++);
    int loop_count = ACE_OS::rand() % 10;
    int delay;
    for(int j=1; j<=loop_count; ++j)
    {
        // sleep for 0-3 seconds through the portable ACE_OS wrapper
        delay = ACE_OS::rand() % 4;
        ACE_DEBUG((LM_DEBUG, "Thread %t sleeping for %d seconds : %T\n", delay));
        ACE_OS::sleep(delay);

        // then stay busy (spin) for 0-3 seconds
        delay = ACE_OS::rand() % 4;
        ACE_DEBUG((LM_DEBUG, "Thread %t awake for %d seconds : %T\n", delay));
        ACE_Time_Value timeout(ACE_OS::gettimeofday());
        timeout += delay;
        while(ACE_OS::gettimeofday() < timeout)
            ;
    }
    ACE_DEBUG((LM_DEBUG, " - Thread %t shutting down.\n"));
    return 0;
}

#pragma argsused
int main(int argc, char* argv[])
{
    const int thread_count = 2;
    ACE_Thread_Manager::instance()->spawn_n(thread_count, (ACE_THR_FUNC)thread_func);
    ACE_Thread_Manager::instance()->wait();   // block until every spawned thread exits
    ACE_DEBUG((LM_DEBUG, "All threads have finished.\nShutting down.\n"));
    return 0;
}
//------------------------------------------------------------------------------
```

This example spawns two separate threads using the ACE_Thread_Manager class. The function thread_func acts as the entry point for each thread. Notice how the example code uses the OS wrapper facades in ACE_OS. Instead of calling an OS-specific routine to make the thread sleep, the code calls the wrapper facade. This ensures portability because ACE_OS::sleep encapsulates and hides the OS-specific routine.
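ACE_OS::sleep is an instance of the wrapper facade idea: one function signature, with the platform-specific call hidden behind it. The sketch below is a hypothetical, much-simplified facade (my_os::sleep is an illustrative name, not ACE's implementation) that selects Win32 Sleep or POSIX sleep at compile time. The convention of returning 0 for success and -1 for failure is an assumption modeled on typical ACE_OS functions.

```cpp
#ifdef _WIN32
#include <windows.h>   // Sleep
#else
#include <unistd.h>    // sleep
#endif

// Hypothetical wrapper facade in the spirit of ACE_OS::sleep (not ACE code).
// Callers see one signature; the #ifdef picks the platform call.
namespace my_os
{
    // Suspend the calling thread for the given number of seconds.
    // Returns 0 on success and -1 on failure (assumed convention).
    inline int sleep(unsigned int seconds)
    {
#ifdef _WIN32
        ::Sleep(seconds * 1000);   // Win32 Sleep takes milliseconds
        return 0;
#else
        // POSIX sleep returns the number of unslept seconds, which is
        // non-zero only if a signal interrupted the sleep.
        return ::sleep(seconds) == 0 ? 0 : -1;
#endif
    }
}
```

Application code calls my_os::sleep(2) on every platform, and only this one function needs to change when a new OS is supported, which is the same trade-off ACE_OS makes on a much larger scale.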
4.3.3 A simple TAO CORBA server and client

TAO includes a number of CORBA example programs ($(ACE_TAO)\ACE_wrappers\TAO\examples). You can compile the examples from the command line using make. However, it is also beneficial to see how to set up a TAO project in the IDE. The archive for this article includes the TAO time server example, complete with BCB6 project files. To create TAO BCB projects from scratch, follow these instructions.
Note that configuring a server IDE project is largely the same as configuring a client. Also notice how vital environment variables are. By using environment variables, you can create an IDE project that does not depend on where ACE and TAO reside on disk. Without them, you would have to rely on hard-coded directory names.
4.4 ACE and TAO resources

In this article, we discussed how to install and configure ACE and TAO to work with C++Builder 6.0. However, we have barely scratched the surface of what ACE and TAO are capable of. If you are interested in ACE or TAO, check out the following links and books.
Books
5: Flex and Bison (Lex and Yacc replacements)

5.1 Introduction

Flex and Bison are replacements for Lex and Yacc (they are not totally compatible, but close). Flex is a lexical analyzer generator and Bison is a parser generator. Both are open source tools that are distributed by the Free Software Foundation as part of the GNU project (gnu.org). Bison is licensed under the GPL, whereas Flex is licensed under a less restrictive BSD license (software from gnu.org usually falls under the GPL, but some programs, such as Flex, use a different license). Both Flex and Bison generate C and C++ source code: they take a configuration file and create C and C++ source files that you add to your project. The resulting source code can be compiled and linked in your BCB projects.
5.2 Installation

There are many ways to obtain Flex and Bison. You can download them from the GNU website (see http://www.gnu.org/software/flex/ and http://www.gnu.org/software/bison/). However, if you download from the GNU website, you will have to build Flex and Bison from the source code. While this isn't a big deal, building Bison typically requires a Unix-like environment, such as Cygwin. There are easier ways to obtain Flex and Bison that don't require compiling them from scratch. One easy option is to download the precompiled Cygwin versions of Flex and Bison. Sections 5.2.1 and 5.2.2 describe how to install them with Cygwin and how to patch Flex. However, to save time, the Flex and Bison binary executables are included with this article (available on the conference CD and at http://www.bcbdev.com/ftp/source/flex-bison.zip). Both executables were compiled with the Cygwin GNU C compiler. Bison was compiled without any changes, but the Flex program was recompiled with a Borland-compatible version of flex.skl. See section 5.2.2 below for more details. In order to compile the examples from this article with the least amount of effort, you should install the supplied versions of Flex and Bison. To install them:
After copying the Flex and Bison binaries, the next step is configuring them as build tools in the BCB IDE. Section 5.2.3 describes this process. If you have installed the supplied binary versions of Flex and Bison, you can skip to Section 5.2.3.

5.2.1 Installing Flex and Bison with Cygwin

Cygwin is a set of tools that combine to form a Unix-like environment on a Windows PC. The Cygwin project is maintained by Red Hat. Cygwin includes sed, awk, grep, GNU C++, gzip, tar, less, vi, and a bash shell. It also includes Flex and Bison. Although Unix software has a reputation for being difficult to install, Cygwin provides a setup program that makes installation almost painless. To install Cygwin:
When you launch the Cygwin Bash shell, all of the Cygwin tools reside on the system path in that shell instance. However, Cygwin does not add its tools to the Windows path. We can use Flex and Bison from the BCB IDE without having them on the system path. However, if you want to perform command line builds with make.exe, you may want to create a batch file that adds the Cygwin tools to the Windows path. Another option is to simply copy flex.exe and bison.exe to a directory that is already on the path, such as $(BCB)\bin. If you do this, make sure you also copy cygwin1.dll and cygintl-1.dll to a directory on the system path.
5.2.2 Patching Flex
Regardless of how you obtain Flex and Bison, you will eventually discover a small problem: Flex generates source code that is not compatible with modern C++ compilers. The main issue involves the "-+" command line option for Flex, which tells Flex to generate a lexer that works with C++ iostreams. The current version of Flex (2.5.4) creates a C++ source file that contains a forward declaration for istream. This causes problems because istream is no longer a class; it is a typedef for basic_istream<char>.

This isn't the only problem with Flex. Flex generates code that relies on a Unix/Linux header file called unistd.h. C++Builder does not provide this header file; the closest replacement is io.h. Flex also creates prototypes for the isatty function. Unfortunately, these prototypes clash with existing prototypes for the same function.

In order to use Flex effectively, a few small changes need to be made. Fortunately, Flex is open source, so we can alter the way it generates source code. Not only is Flex open source, it is also well designed. We don't have to hunt through the Flex source code in order to make these changes. We simply change a configuration file, called flex.skl, that is part of the Flex source distribution and governs how Flex generates code. To remove the include for unistd.h and the forward declaration for istream, we modify flex.skl and rebuild the executable.

The ZIP file for this article contains a patched version of flex.skl. The file README.TXT describes the changes that were made to the file. To patch Flex, extract flex.skl to the local directory that contains the source code for Flex, overwriting the existing file of the same name (you may want to back up the original first). After copying the file, rebuild Flex. The bullet list below describes how to rebuild Flex using Cygwin.
5.2.3 Configuring Flex and Bison as Build Tools in BCB6

Now that you have Flex and Bison installed on your system, it is time to put the tools to use. Recall that Flex and Bison process an input file and create C and C++ source files as their output. You can use the source output in several ways:
The first option is adequate if you have an existing lexer or parser configuration that is unlikely to change very often. However, creating a parser is generally a trial and error process. If you have to change the config files often, then option 1 quickly becomes tedious. The second option is an excellent choice if you are familiar with make files. The drawback of make files is that they can make your projects more difficult to debug. This section describes how to use option 3.

C++Builder 6 includes new support for configuring external build tools, such as Flex and Bison. After configuring the tools, you can add Flex and Bison config files directly to your IDE-based projects. The steps below describe how to configure Flex and Bison as build tools:

Configuring Flex
Figure 1. Configuring Flex as a Build Tool
Figure 2. Bison Environment Variables
BCB provides a variety of macros that you can enter in the Command Line box. The $SAVE macro tells the IDE to save the file before invoking the external tool; almost every build tool should list this macro. $NAME contains the filename and extension of the input file, but not the path (for example, $NAME = parser.flex). $PATH provides the path of the input file, complete with a trailing slash. $TARGETNAME contains the path and filename of the target file.

Flex and Bison provide a variety of command line options. The -o option sets the output file name for both tools. The -+ option tells Flex to generate a C++ lexer that works with C++ iostreams. The -d option forces Bison to generate a header file, which is typically included by the lexer. You can pass --help to either tool to see what options are available.
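To make the $PATH and $NAME definitions concrete, the sketch below decomposes a full input path the way those two macros are described above. It is a hypothetical illustration in standard C++ (ToolMacros and split_input_path are made-up names), not code from the C++Builder IDE, and it does not model $SAVE or $TARGETNAME.

```cpp
#include <string>

// Hypothetical illustration (not BCB code): the values the IDE would
// substitute for $PATH and $NAME, given the full path of an input file.
struct ToolMacros
{
    std::string path;   // like $PATH: directory, including the trailing slash
    std::string name;   // like $NAME: file name plus extension, no directory
};

ToolMacros split_input_path(const std::string& full)
{
    ToolMacros m;
    // accept both Windows and Unix directory separators
    std::string::size_type slash = full.find_last_of("\\/");
    if(slash == std::string::npos)
    {
        m.name = full;                        // no directory part at all
    }
    else
    {
        m.path = full.substr(0, slash + 1);   // keep the trailing slash
        m.name = full.substr(slash + 1);
    }
    return m;
}
```

Because $PATH keeps the trailing slash, concatenating the two fields (m.path + m.name) reproduces the original input path.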
Once Flex and Bison are configured as build tools, you should be able to add Flex and Bison files to your BCB projects. If a project contains a Flex input file, the IDE will invoke flex.exe when you build the project. The source file that Flex creates can be used in two different ways. Your first option is to add the output CPP file to your project just like any other source file. If you follow this strategy, both the Flex input file and the Flex output file will be part of the project. The second option is to #include the Flex-generated CPP file from some other CPP file that is already in your project. Including a CPP file is usually not a good idea, but in this case, it makes sense. The examples that accompany this section demonstrate both techniques.

5.3 Flex and Bison Examples

5.3.1 A C++ Comment Stripper, Built with Flex

You can solve many string processing problems with Flex alone. Bison generates code that parses and validates input against a grammar. However, if you just need to convert an input text file to a different output format, then you probably won't need a full set of grammar rules, and Flex alone may be sufficient. Listing 5.1 shows a Flex configuration file for implementing a C++ comment stripper. This lexer processes C++ source code and strips out any comments that it finds. Listing 5.2 contains the C++ code that interfaces to the lexer.

```
%{
// Listing 5.1: flex/comment-stripper/lexer.flex
// Flex input file for building a program that strips C and C++
// comments from a source file.
#include <istream>
#include <ostream>
using namespace std;

void parse(istream &in, ostream &out);
%}

%option noyywrap

BEGIN_BLOCK_COMMENT "/*"
END_BLOCK_COMMENT   "*/"

%x BlockComment
%x SingleLineComment

%%

<INITIAL>"//"                      { BEGIN(SingleLineComment); }
<INITIAL>{BEGIN_BLOCK_COMMENT}     { BEGIN(BlockComment); }
<INITIAL>.|\n                      ECHO;

<SingleLineComment>\n              { BEGIN(INITIAL); }
<SingleLineComment>.               ;

<BlockComment>{END_BLOCK_COMMENT}  { BEGIN(INITIAL); }
<BlockComment>.|\n                 ;

%%

void parse(istream &in, ostream &out)
{
    yyFlexLexer lexer(&in, &out);
    lexer.yylex();
}
```

```cpp
//------------------------------------------------------------------------------
//Listing 5.2: flex/main.cpp
// Flex based C++ comment stripper
// The comment you are reading should not survive the stripper.
/* This comment shouldn't
 * survive either
 */
#include <fstream>
#include <iostream>
using namespace std;

// prototype for the parser function
void parse(istream &in, ostream &out);

#pragma argsused
int main(int argc, char* argv[])
{
    cout << "About to invoke the parser." << endl
         << "Note that the lexer is /*probably not*/ perfect!" << endl
         << "--------------------------------------------------" << endl;

    ifstream sourcefile("main.cpp");
    parse(sourcefile, cout);
    return 0;
}
//------------------------------------------------------------------------------
```

Flex files contain three distinct sections, separated by lines consisting of two percent signs (%%). The first section is the definition section. It contains regular expression definitions, state definitions, lexer-specific directives, and raw C++ code (typically include directives and function prototypes). The middle section is the rules section. It contains lexing rules that tell Flex how to process text. The third and last section is called the user subroutines section. It holds any additional C++ code that the lexer needs.

The definition section in Listing 5.1 starts by including the C++ stream header files. It also contains a prototype for a function called parse. Notice that the lexer file delimits raw C++ code with %{ and %}.

```
%{
// Listing 5.1: flex/comment-stripper/lexer.flex
// Flex input file for building a program that strips C and C++
// comments from a source file.
#include <istream>
#include <ostream>
using namespace std;

void parse(istream &in, ostream &out);
%}
```

The rest of the definition section looks like this:

```
%option noyywrap

BEGIN_BLOCK_COMMENT "/*"
END_BLOCK_COMMENT   "*/"

%x BlockComment
%x SingleLineComment
```

noyywrap is a directive that tells Flex not to look for more input after reaching the end of the input stream. BEGIN_BLOCK_COMMENT and END_BLOCK_COMMENT are regular expression definitions, which the rules section relies on. The lines that begin with %x are state definitions (Flex calls them exclusive start conditions). The lexer uses states to determine whether it is in the middle of a comment or real code.

The middle section of the Flex file contains lexing rules. The lexing rules look like this:

```
<INITIAL>"//"                      { BEGIN(SingleLineComment); }
<INITIAL>{BEGIN_BLOCK_COMMENT}     { BEGIN(BlockComment); }
<INITIAL>.|\n                      ECHO;

<SingleLineComment>\n              { BEGIN(INITIAL); }
<SingleLineComment>.               ;

<BlockComment>{END_BLOCK_COMMENT}  { BEGIN(INITIAL); }
<BlockComment>.|\n                 ;
```

The lexer in this example uses a state machine consisting of three states: BlockComment, SingleLineComment, and INITIAL. INITIAL is the starting state for the lexer; it is implied and need not be declared with %x. Each rule begins with a state name, listed inside < and > brackets, followed by a regular expression. The remaining text after the regular expression is C++ code. If the lexer encounters text matching the expression while in the given state, it executes that C++ code. For example, the first lexing rule in this example was:

```
<INITIAL>"//" { BEGIN(SingleLineComment); }
```

This rule states the following: if the lexer is in the INITIAL state and it encounters two slash characters, then switch to the SingleLineComment state. Note that BEGIN is a special directive that Flex understands. SingleLineComment was defined as a state using %x in the definition section. The second lexing rule looks a little different.
```
<INITIAL>{BEGIN_BLOCK_COMMENT} { BEGIN(BlockComment); }
```

This rule says that if the lexer is in the INITIAL state and it encounters text that matches the regular expression BEGIN_BLOCK_COMMENT, then it transitions to the BlockComment state. The curly braces around BEGIN_BLOCK_COMMENT tell Flex that BEGIN_BLOCK_COMMENT is the name of a regular expression defined in the definition section. Without the curly braces, Flex would try to literally match the text BEGIN_BLOCK_COMMENT.

Once the lexer encounters the beginning of a comment, it transitions to either the SingleLineComment state or the BlockComment state. These two states each contain their own lexing rules. The rules for SingleLineComment look like this:

```
<SingleLineComment>\n  { BEGIN(INITIAL); }
<SingleLineComment>.   ;
```

The first rule states that if a newline is encountered, the lexer returns to the INITIAL state. The second rule tells Flex to simply swallow all other characters. The . tells Flex to match any input character except a newline, and the semicolon all by itself tells Flex to do nothing with those characters (they won't be copied to the output).

The last section of the Flex file contains any miscellaneous C++ code that the lexer needs. In this example, the subroutines section defines the parse function. This routine takes two stream arguments, from which it constructs a C++ lexer object. The parse function activates the lexer by calling the yylex member function.

The lexer from Listing 5.1 works, but it is not quite perfect. What happens if the lexer encounters C++ comment tokens inside a string literal? For example, how would the lexer handle the following input?

```cpp
cout << "The lexer is /*probably not*/ perfect!" << endl;
```

Based on the rules in the Flex file, the lexer would strip out the text /*probably not*/. To be truly useful, the lexing rules need to change to take this into account.
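One way to see what the Flex rules do, and how the string-literal problem could be addressed, is to write the same state machine by hand. The sketch below is plain C++, not Flex output and not the author's lexer: it mirrors the INITIAL, SingleLineComment, and BlockComment states from Listing 5.1 and adds a hypothetical StringLiteral state in which comment tokens are copied as ordinary text. Assumptions: character literals (such as '"') and raw strings are not handled, and unlike Listing 5.1, the newline that ends a // comment is kept so that adjacent lines don't merge.

```cpp
#include <string>

// Hand-written sketch of the comment stripper's state machine, extended
// with a StringLiteral state (not generated by Flex).
std::string strip_comments(const std::string& src)
{
    enum State { Initial, SingleLineComment, BlockComment, StringLiteral };
    State state = Initial;
    std::string out;

    for(std::string::size_type i = 0; i < src.size(); ++i)
    {
        char c = src[i];
        char next = (i + 1 < src.size()) ? src[i + 1] : '\0';

        switch(state)
        {
        case Initial:
            if(c == '/' && next == '/')      { state = SingleLineComment; ++i; }
            else if(c == '/' && next == '*') { state = BlockComment; ++i; }
            else
            {
                if(c == '"')
                    state = StringLiteral;   // comment tokens inside are plain text
                out += c;                    // like ECHO
            }
            break;

        case SingleLineComment:
            if(c == '\n') { state = Initial; out += c; }   // keep the line break
            break;                                         // swallow everything else

        case BlockComment:
            if(c == '*' && next == '/') { state = Initial; ++i; }
            break;

        case StringLiteral:
            out += c;
            if(c == '\\' && next != '\0') { out += next; ++i; }  // copy escapes such as \"
            else if(c == '"') state = Initial;                   // closing quote
            break;
        }
    }
    return out;
}
```

With this extra state, the line from Listing 5.2 that prints "/*probably not*/" passes through unchanged, while real comments are still removed.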
5.3.2 Evaluating a numerical expression

Bison is a code generation tool that helps you write a parser that enforces a grammar. A Bison file contains rules that define the structure of the grammar. Bison generates code that accepts tokens from a lexer and then matches those tokens against the grammar. In this section, we will discuss a program that uses Flex and Bison to parse a string that contains a simple numerical expression, such as 3 + 6 * 2 - (2 - 29). Flex handles the job of separating the input text into tokens. Bison generates code that analyzes those tokens to make sure that they follow the grammar rules for a numerical expression. It also evaluates the expression. Listing 5.3 contains a Bison input file called parser.bison. Listing 5.4 contains a Flex lexer file that feeds tokens to the parser. Listing 5.5 contains a small C++ source file that tests the parser.
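The lexer's half of this division of labor is easy to sketch by hand. The C++ below is a hypothetical stand-in for the Flex rules in Listing 5.4, not generated code: it produces NUMBER tokens for runs of digits, passes every other character through as a single-character token (so invalid characters reach the parser, which reports the error), discards blanks and tabs, and stops at the first newline. Token and tokenize are illustrative names.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical token type: either a NUMBER with a value, or one character.
struct Token
{
    enum Kind { Number, Char } kind;
    int  value;    // meaningful when kind == Number
    char ch;       // meaningful when kind == Char
};

// Hand-written equivalent of the lexer's job (not Flex output).
std::vector<Token> tokenize(const std::string& text)
{
    std::vector<Token> tokens;
    std::string::size_type i = 0;
    while(i < text.size())
    {
        char c = text[i];
        if(c == '\n')
            break;                                   // end of the input expression
        if(c == ' ' || c == '\t') { ++i; continue; } // eat whitespace

        Token t;
        if(std::isdigit(static_cast<unsigned char>(c)))
        {
            std::string::size_type start = i;        // {integer}: one or more digits
            while(i < text.size() && std::isdigit(static_cast<unsigned char>(text[i])))
                ++i;
            t.kind  = Token::Number;
            t.value = std::atoi(text.substr(start, i - start).c_str());
            t.ch    = 0;
        }
        else
        {
            t.kind  = Token::Char;                   // '+', '(', or an invalid char
            t.value = 0;
            t.ch    = c;
            ++i;
        }
        tokens.push_back(t);
    }
    return tokens;
}
```

For the input "(2 - 29)" this yields five tokens: '(', NUMBER 2, '-', NUMBER 29, and ')'.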
```
%{
//--------------------------------------------------------------------------
// Listing 5.3: flex/calculator/parser.bison
// Bison input file for building a program that evaluates a
// numerical expression.
#include <cstdio>
#include <iostream>
using namespace std;

inline void yyerror(const char *c)
{
    cout << "Error!: " << c << endl;
    return;
}

extern char *yytext;
int yylex();
%}

%token NUMBER
%left '-' '+'
%left '*' '/'
%nonassoc UMINUS

%%

result: expression { yylval = $1; }
      ;

expression: expression '+' expression   { $$ = $1 + $3; }
          | expression '-' expression   { $$ = $1 - $3; }
          | expression '*' expression   { $$ = $1 * $3; }
          | expression '/' expression   { $$ = $1 / $3; }
          | '-' expression %prec UMINUS { $$ = -$2; }
          | '(' expression ')'          { $$ = $2; }
          | NUMBER                      { $$ = $1; }
          ;

%%
```

```
%{
//--------------------------------------------------------------------------
// Listing 5.4: flex/calculator/lexer.flex
// Flex input file for building a program that evaluates a
// numerical expression.
#include <iostream>
#include <cstdlib>
#include <algorithm>   // min, copy
using namespace std;
#include "parser.hpp"

void yy_input(char *, int &count, int max);
#define YY_INPUT(buffer, count, max) yy_input(buffer,count,max)
%}

%option noyywrap

delim    [ \t]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
integer  {digit}+
float    {digit}+\.{digit}+

%%

{integer} {
    // when we match the regular expression for an integer,
    // convert the string to an integer and return it to
    // the parser by assigning it to yylval.
    yylval = atoi(yytext);
    return NUMBER;
}

{ws} {
    // eat whitespace
    ;
}

[\n] {
    // when the end of line is encountered
    // return 0 to signal end of input
    return 0;
}

. {
    // Return any other character to the parser
    // as a token. This rule handles '+' and '-'. It
    // also handles any invalid character
    return yytext[0];
}

%%

// iter and end form an iterator range of [iter,end). This is the
// range of characters to process. To parse an expression, these
// two iterators should be set.
const char * iter = 0;
const char * end  = 0;

void yy_input(char *buf, int &count, int max)
{
    count = 0;                  // a count of 0 tells Flex there is no more input
    if(max > 0)
    {
        const char * end_iter = min(iter+max, end);
        count = end_iter-iter;
        if(count)
        {
            copy(iter, end_iter, buf);
            iter+=count;
        }
    }
}
//--------------------------------------------------------------------------
```

```cpp
//------------------------------------------------------------------------------
//Listing 5.5: flex/main.cpp
#include <fstream>
#include <iostream>
#include <algorithm>
#include <cstring>
#pragma hdrstop
using namespace std;

// Just include the cpp files for compilation. The BCB6 build tools don't
// support the ability to compile the output of flex or bison
#include "parser.cpp"
#include "lexer.cpp"

int main()
{
    const char * input = "3 + 6 * 2 - (2 - 29) \n";
    cout << "Input is : " << input << endl << endl;

    // point the lexer's input range at the expression, then run the parser
    iter = input;
    end  = iter + strlen(input);
    yyparse();

    cout << "result is: " << yylval << endl;
    return 0;
}
//------------------------------------------------------------------------------
```

Like Flex files, Bison files contain three sections: the definition section, the grammar rules section, and a section for writing C++ routines. In Listing 5.3, the key piece of the definition section looks like this:

```
%token NUMBER
%left '-' '+'
%left '*' '/'
%nonassoc UMINUS
```

The %token line declares the tokens that the lexer can pass to the parser. Note that single-character tokens, such as '+' and '-', don't have to be listed with %token. The %left directives establish operator precedence and make the operators left-associative; operators declared on a later line bind more tightly. As a result, '+' and '-' have a lower precedence than '*' and '/', and unary negation (UMINUS) has the highest precedence. The grammar rules for the parser reside in the middle section of the Bison file.
The rules for the calculator look like this:

```
result: expression { yylval = $1; }
      ;

expression: expression '+' expression   { $$ = $1 + $3; }
          | expression '-' expression   { $$ = $1 - $3; }
          | expression '*' expression   { $$ = $1 * $3; }
          | expression '/' expression   { $$ = $1 / $3; }
          | '-' expression %prec UMINUS { $$ = -$2; }
          | '(' expression ')'          { $$ = $2; }
          | NUMBER                      { $$ = $1; }
          ;
```

The first rule states that a result consists of one expression. When a complete expression is encountered, the parser assigns the result to yylval, a variable that is provided by the parser. The second rule defines what an expression is. Notice that an expression is defined recursively: it is either a NUMBER or a combination of sub-expressions.

Each alternative in a rule consists of a sequence of symbols, such as expression '+' expression, followed by an action, such as { $$ = $1 + $3; }. The action consists of C++ code. The names that start with '$' are special macros for Bison. Bison replaces $1 with the value of the first symbol in the alternative, $2 with the value of the second symbol, $3 with the value of the third, and so on. In this example, no alternative has more than three symbols. The $$ macro represents the resulting value of the rule itself.

Listing 5.4 contains the Flex lexer that feeds the parser. The lexing rules are relatively simple.

```
{integer} {
    // when we match the regular expression for an integer,
    // convert the string to an integer and return it to
    // the parser by assigning it to yylval.
    yylval = atoi(yytext);
    return NUMBER;
}

{ws} {
    // eat whitespace
    ;
}

[\n] {
    // when the end of line is encountered
    // return 0 to signal end of input
    return 0;
}

. {
    // Return any other character to the parser
    // as a token. This rule handles '+' and '-'. It
    // also handles any invalid character
    return yytext[0];
}
```

The lexer recognizes tokens and passes them to the parser. The first lexing rule identifies integer values. When a string is found that matches the integer regular expression, the lexer converts the string to an integer and stuffs the value into yylval.
It then returns the NUMBER token to the parser. The lexer also looks for white space. If it finds a white space character, it simply throws it out because the parser is not interested in white space tokens. If a newline character is found, the lexer returns 0 to signal the end of the current input expression. Any character that does not match any of the other rules is returned directly to the parser as a token.

In our C++ comment stripper, we added the Flex-generated C++ file to our C++Builder project. The calculator program takes a different approach. Instead of adding the generated C++ files to the project, we simply #include them from the main C++ source file (Listing 5.5).

```cpp
// Just include the cpp files for compilation. The BCB6 build tools don't
// support the ability to compile the output of flex or bison
#include "parser.cpp"
#include "lexer.cpp"
```

This might seem like a kludge, but it works, and in practice it is easier and less quirky than adding the generated C++ files to the project.

5.4 Links to Flex and Bison resources

Links
Books
6: wxWindows

6.1 Introduction

wxWindows is a cross-platform C++ library for building GUI applications. It works on a variety of compilers and platforms, including Windows, Linux, and Mac. wxWindows is similar to OWL and MFC in its structure. It is a pure C++ library. The class hierarchy includes classes for creating windows, buttons, list boxes, and so on. There is also a class that represents the application as a whole. Like OWL and MFC, wxWindows does not provide any form of RAD development.

6.2 Installation

wxWindows includes makefiles for building the library and the sample projects with Borland C++Builder. To use wxWindows, we need to build the libraries and then configure the IDE include and library paths to point to the wxWindows files. To install wxWindows, follow the steps below:
That is all you need to do to build the wxWindows libraries. If you want, you can also build the sample programs from the command line. The sample projects are in $(WXWIN)\samples, each in a separate subdirectory. To build an example, navigate to one of these subdirectories in a console window and type make -fmakefile.b32.

6.3 Creating wxWindows projects in the IDE

If you want to do any serious wxWindows work, you will probably want to maintain your projects in the IDE. This makes debugging much easier. To create a new wxWindows app from scratch, follow these steps:
The ZIP file for this article contains a hello world wxWindows project that was constructed from this same set of instructions. If you have any difficulties creating a wxWindows project, check your settings against those in the supplied project.

6.4 wxWizard: an IDE wizard for building wxWindows projects

Creating a wxWindows IDE project is tedious and error prone. To simplify matters, I have created an IDE plug-in wizard for creating wxWindows projects. The wizard is included with the ZIP file for this article (see the wxWindows\wxWizard directory). Consult the readme.txt file that accompanies the wizard for instructions on how to install and use it.

7: PCRE

7.1 Introduction

PCRE is an open source C library that implements Perl Compatible Regular Expressions. It has gained widespread acceptance and has been used in many popular projects such as Python, Apache, and PHP. The PCRE home page is http://www.pcre.org. The current version is 3.9.

C++Builder comes with version 2.01 of PCRE preinstalled, so you don't have to do anything special to install the library. The only downside to using the preinstalled version is that it is not the most current version of PCRE. In most cases, the older version is adequate. If you are interested in upgrading your version of PCRE, send me an email (hhowe@bcbdev.com).

7.2 PCRE examples

7.2.1 Raw PCRE calls

The PCRE library consists of roughly ten C-style functions, which are declared in the header file pcre.h (available in $(BCB)\include). Simple pattern matching requires only three of them: pcre_compile, pcre_exec, and pcre_free. The pcre_compile routine compiles a regular expression and returns a handle to the compiled expression object. pcre_exec executes a regular expression search. You pass it the compiled expression object, the string to search, and a handful of additional parameters. pcre_free is a function pointer that typically points to the free routine in the RTL.
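Because the compiled pattern must eventually be released through pcre_free, it is easy to leak it on an early return. The sketch below is a hypothetical RAII guard (CHandle is an illustrative name, not part of PCRE) that stores a handle together with a deleter of type void (*)(void *), the same type as the pcre_free variable in Listing 7.1, and calls the deleter from its destructor. The class itself is written in C++98 style to suit C++Builder 6.

```cpp
#include <cstdlib>

// Hypothetical RAII guard (not part of PCRE) for a C handle that must be
// released through a function pointer with the signature void (*)(void *).
template <typename T>
class CHandle
{
public:
    typedef void (*Deleter)(void *);

    CHandle(T *handle, Deleter deleter) : handle_(handle), deleter_(deleter) {}

    // Release the handle exactly once; a null handle is never passed on.
    ~CHandle() { if(handle_ && deleter_) deleter_(handle_); }

    T *get() const    { return handle_; }
    bool valid() const { return handle_ != 0; }

private:
    // Copying would free the handle twice, so copying is disabled
    // (declared but not defined, the C++98 idiom).
    CHandle(const CHandle&);
    CHandle& operator=(const CHandle&);

    T *handle_;
    Deleter deleter_;
};
```

Assuming the pcre.h declarations used in Listing 7.1, usage would look like CHandle<pcre> regex(pcre_compile(pattern, 0, &errbuf, &erroffset, 0), pcre_free); followed by a check of regex.valid() and calls that pass regex.get(). The pattern is freed automatically when regex goes out of scope, on every return path.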
Listing 7.1 demonstrates how to match a pattern using the PCRE library.

```cpp
//------------------------------------------------------------------------------
//Listing 7.1: pcre/pcre-test/main.cpp
#include <cstdlib>
#include <cstring>
#include <algorithm>
#include <iostream>
#include <pcre.h>
using namespace std;

// resolve pcre_free to work around a bug in the BCB6 RTL
void (*pcre_free)(void *) = free;

bool Test(const char *str)
{
    // policy numbers start with MP or CM followed by 1 to 8 digits
    const char *policy_pattern = "^(MP|CM)(\\d{1,8})$";
    const char *errbuf = 0;
    int  erroffset = 0;
    int  offsets[45];                   // PCRE expects a multiple of 3
    int  size_offsets = sizeof(offsets)/sizeof(int);
    pcre *regex = 0;

    regex = pcre_compile(policy_pattern, 0, &errbuf, &erroffset, 0);
    if(regex == 0)
    {
        cout << "compile failed at offset " << erroffset << ": " << errbuf << endl;
        return false;
    }

    // Note:
    // In the newest version of PCRE, pcre_exec takes an additional int
    // parameter (the start offset). This argument is commented out below.
    int result = pcre_exec(regex, 0, str, strlen(str), /*0,*/ 0, offsets, size_offsets);

    cout << "regex  = " << policy_pattern << endl
         << "str    = " << str << endl
         << "result = " << result << endl;

    if(result > 0)
        cout << "regex matched" << endl;
    else if (result == -1)              // no match
        cout << "regex did not match" << endl;
    else
        cout << "a travesty occurred (pcre_exec returned an error code)" << endl;

    // print each group; offsets holds [start, end) pairs
    char buf[256];
    for( int j=0; j<result; ++j)
    {
        memset(buf, 0, 256);
        int start = offsets[j*2];
        int end   = offsets[j*2 + 1];
        std::copy(&str[start], &str[end], buf);   // could also use strncpy
        cout << "  offset[" << j*2   << "]=" << start << " : str[" << start << "]=" << str[start] << endl;
        cout << "  offset[" << j*2+1 << "]=" << end   << " : str[" << end   << "]=" << str[end]   << endl;
        cout << "  subpattern[" << j << "] = " << buf << endl << endl;
    }
    cout << endl;

    pcre_free(regex);
    return result > 0;
}

int main()
{
    Test("MP001234");     // match
    Test("CM77123");      // match
    Test("xMP12");        // no match, leading x (the ^ anchor forbids it)
    Test("MP12x");        // no match, extra stuff at end
    Test("MP123456789");  // no match, too many digits
    Test("foobar");       // no match
    return 0;
}
//------------------------------------------------------------------------------
```

The meat of Listing 7.1 resides in a routine called Test. The Test function starts by declaring some variables that are needed by PCRE. These include the regular expression string, the regex object, variables for handling errors, and a buffer for storing the offsets of groups that are found during the search. Next, the function compiles the regular expression by calling pcre_compile. The code stores the result of pcre_compile in the regex variable. This variable acts as a handle to the compiled expression. If compilation fails, pcre_compile returns a null handle and sets errbuf and erroffset to describe the problem. Otherwise, the Test function passes the handle to pcre_exec.

pcre_exec performs the search. It returns -1 if the search string did not match the regular expression. If the string does match, pcre_exec returns one more than the number of the highest capturing group that was set, which is 3 in this case. The regular expression was ^(MP|CM)(\\d{1,8})$. A match against it produces 3 groups: group 0 for the entire matched string, group 1 for the MP|CM part, and group 2 for the trailing \\d{1,8}. Each pair of parentheses delimits a capturing group; group 0 is implicit. When a match is found, pcre_exec fills the offsets array with index values.
These values tell you where the match groups reside in the search string. The offsets array is a little tricky to deal with. It contains two entries for each group: one entry holds the index where the group starts, and the second holds the index one past the end of the group. PCRE uses only the first two-thirds of the array for these pairs and needs the last third as workspace, which is why the array size should be a multiple of 3. Because our regular expression yields 3 groups (counting the entire match), the offsets array will contain 6 values of interest when a match is found. For the test string MP001234, the offsets are:

// str = "MP001234"
offset[0]=0 : str[0]=M
offset[1]=8 : str[8]=
offset[2]=0 : str[0]=M
offset[3]=2 : str[2]=0
offset[4]=2 : str[2]=0
offset[5]=8 : str[8]=

Notice how offset[0] and offset[1] form a range that spans the entire match string (the range works like iterator ranges in the STL). This range corresponds to the first match group, which is the entire string. offset[2] and offset[3] form a range that spans the characters 'M' and 'P'. This range corresponds to the subgroup (MP|CM) in the regular expression. The last range is formed by offset[4] and offset[5], which contain the indices of the characters that matched the subgroup (\\d{1,8}).

After performing the regular expression search, the Test function frees the compiled expression by calling pcre_free. A memory leak occurs if the regex object is not freed.
7.2.2 PCRE wrapper class

The PCRE library is powerful, but its interface is also cumbersome. To simplify searching, I have created a wrapper class that encapsulates the PCRE API calls. Actually, there are two classes: the first represents a compiled regular expression, and the second represents the results of a search. Listing 7.2 shows the declarations for these wrapper classes. The complete source is available in the archive (pcreobj.cpp). Listing 7.3 shows how to use the wrappers; it performs the same pattern matching as Listing 7.1.
//------------------------------------------------------------------------------
//Listing 7.2: pcre/pcre-wrapper/pcreobj.h
#ifndef PCREOBJ_H
#define PCREOBJ_H

#include <string>
#include <vector>
#include <pcre.h>

namespace re
{

#define MAX_OFFSETS 255   // a multiple of 3, as pcre_exec requires

class TRegExObj;
class TRegExMatchObj;

// Note:
// re::compile, search, and match don't really do anything that you couldn't
// do with the classes themselves. They exist to give users a clean way to
// perform a search in one line of code. They also make the C++ syntax for the
// re module similar to the re module in Python. If you don't like these
// functions, you can skip them and use the classes and their constructors.
TRegExObj      compile (const std::string &pattern, int flags=0);
TRegExMatchObj search  (const std::string &pattern,
                        const std::string &search_string, int flags=0);
TRegExMatchObj match   (const std::string &pattern,
                        const std::string &search_string, int flags=0);

class TRegExObj
{
private:
    mutable const char * m_errbuf;
    mutable int          m_erroroffset;
    mutable int          m_offsets[MAX_OFFSETS];
    mutable pcre*        m_pcre;
    std::string          m_pattern;
    int                  m_options;
    mutable bool         m_compiled;

    void InternalCompile() const;
    void ReleasePattern();

public:
    TRegExObj(const std::string &pattern, int options=0);
    TRegExObj(const char *pattern="", int options=0);
    TRegExObj(const TRegExObj &);
    TRegExObj& operator = (const TRegExObj &);
    ~TRegExObj();

    void Compile(const std::string &pattern, int options=0);
    TRegExMatchObj Search(const std::string &str, int options = 0,
                          size_t startpos=0,
                          size_t endpos=std::string::npos) const;
    TRegExMatchObj Match (const std::string &str, int options = 0,
                          size_t startpos=0,
                          size_t endpos=std::string::npos) const;
    std::string GetPattern() const { return m_pattern; }
};

class TRegExMatchObj
{
private:
    struct TMatchGroup
    {
        int start;
        int end;
        std::string value;
        TMatchGroup(int s=0, int e=0, const std::string& str="")
            :start(s), end(e), value(str)
        {}
    };
    std::vector <TMatchGroup> m_Groups;

    friend class TRegExObj;
    TRegExMatchObj(int result, int *offsets, const std::string& str);
    void BuildMatchVector(int result, int *offsets, const std::string& str);

public:
    TRegExMatchObj();
    std::string Group (int GroupIndex = 0);
    size_t GroupCount();
    int Start (int GroupIndex = 0);
    int End (int GroupIndex = 0);
    bool Matched ();
};

}

#endif
//------------------------------------------------------------------------------

//------------------------------------------------------------------------------
//Listing 7.3: pcre/pcre-wrapper/main.cpp
#include <cstddef>
#include <iostream>
#include "pcreobj.h"

using namespace std;
using namespace re;

bool Test(const char* str)
{
    // Match() anchors the pattern at the start of the string (PCRE_ANCHORED),
    // so the leading ^ from Listing 7.1 is not needed here
    const char *policy_pattern = "(MP|CM)(\\d{1,8})$";
    TRegExObj expression(policy_pattern);
    TRegExMatchObj match;
    match = expression.Match(str);
    bool result = match.Matched();

    cout << "TRegExObj regex call: ";
    if (result)
        cout << "match" << endl;
    else
        cout << "not a match" << endl;

    if (result)
    {
        size_t group_count = match.GroupCount();
        for (size_t j = 0; j < group_count; ++j)
            cout << "match.Group(" << j << ") = "
                 << match.Group(static_cast<int>(j)) << endl;
    }
    cout << endl;
    return result;
}

int main()
{
    Test("MP001234");     // match
    Test("CM77123");      // match
    Test("xMP12");        // no match, leading x (note use of PCRE_ANCHORED)
    Test("MP12x");        // no match, extra stuff at end
    Test("MP123456789");  // no match, too many digits
    Test("foobar");       // no match
    return 0;
}
//------------------------------------------------------------------------------
All rights reserved.