NAME
    HTML::Microformats - parse microformats in HTML

SYNOPSIS
     use HTML::Microformats;

     my $doc = HTML::Microformats
                 ->new_document($html, $uri)
                 ->assume_profile(qw(hCard hCalendar));
     print $doc->json(pretty => 1);

     use RDF::TrineShortcuts qw(rdf_query);
     my $results = rdf_query($sparql, $doc->model);

DESCRIPTION
    The HTML::Microformats module is a wrapper for parser and handler
    modules of various individual microformats (each of those modules has a
    name like HTML::Microformats::Format::Foo).

    The general pattern of usage is to create an HTML::Microformats object
    (which corresponds to an HTML document) using the "new_document" method;
    then ask for the data, as a Perl hashref, a JSON string, or an
    RDF::Trine model.

  Constructor
    "$doc = HTML::Microformats->new_document($html, $uri, %opts)"
        Constructs a document object.

        $html is the HTML or XHTML source (string) or an
        XML::LibXML::Document.

        $uri is the document URI, important for resolving relative URL
        references.

        %opts are additional parameters; currently only one option is
        defined: $opts{'type'} is set to 'text/html' or
        'application/xhtml+xml', to control how $html is parsed.

  Profile Management
    HTML::Microformats uses HTML profiles (i.e. the profile attribute on the HTML
    element) to detect which Microformats are used on a page. Any
    microformats which do not have a profile URI declared will not be
    parsed.

    Because many pages fail to properly declare which profiles they use,
    there are various profile management methods to tell HTML::Microformats
    to assume the presence of particular profile URIs, even if they're
    actually missing.

    "$doc->profiles"
        This method returns a list of profile URIs declared by the document.

    "$doc->has_profile(@profiles)"
        This method returns true if and only if one or more of the profile
        URIs in @profiles is declared by the document.

    "$doc->add_profile(@profiles)"
        Using "add_profile" you can add one or more profile URIs, and they
        are treated as if they were found on the document.

        For example:

          $doc->add_profile('http://microformats.org/profile/rel-tag')

        This is useful for adding profile URIs declared outside the document
        itself (e.g. in HTTP headers).

        Returns a reference to the document.

    "$doc->assume_profile(@microformats)"
        For example:

          $doc->assume_profile(qw(hCard adr geo))

        This method acts similarly to "add_profile" but allows you to use
        names of microformats rather than URIs.

        Microformat names are case sensitive, and must match
        HTML::Microformats::Format::Foo module names.

        Returns a reference to the document.

    "$doc->assume_all_profiles"
        This method is equivalent to calling "assume_profile" for all known
        microformats.

        Returns a reference to the document.

  Parsing Microformats
    Generally speaking, you can skip this. The "data", "json" and "model"
    methods will automatically do this for you.

    "$doc->parse_microformats"
        Scans through the document, finding microformat objects.

        On subsequent calls, does nothing (as everything is already parsed).

        Returns a reference to the document.

    "$doc->clear_microformats"
        Forgets information gleaned by "parse_microformats" and thus allows
        "parse_microformats" to be run again. This is useful if you've
        added some profiles between runs of "parse_microformats".
        Returns a reference to the document.

  Retrieving Data
    These methods allow you to retrieve the document's data, and do things
    with it.

    "$doc->objects($format);"
        $format is, for example, 'hCard', 'adr' or 'RelTag'.

        Returns a list of objects of that type. (If called in scalar
        context, returns an arrayref.)

        Each object is, for example, an HTML::Microformats::Format::hCard
        object, or an HTML::Microformats::Format::RelTag object, etc. See
        the relevant documentation for details.

    "$doc->all_objects"
        Returns a hashref of data. Each hashref key is the name of a
        microformat (e.g. 'hCard', 'RelTag', etc), and the values are
        arrayrefs of objects.

        Each object is, for example, an HTML::Microformats::Format::hCard
        object, or an HTML::Microformats::Format::RelTag object, etc. See
        the relevant documentation for details.

    "$doc->json(%opts)"
        Returns data roughly equivalent to the "all_objects" method, but as
        a JSON string.

        %opts is a hash of options, suitable for passing to the JSON
        module's to_json function. The 'convert_blessed' and 'utf8' options
        are enabled by default, but can be disabled by explicitly setting
        them to 0, e.g.

          print $doc->json( pretty=>1, canonical=>1, utf8=>0 );

    "$doc->model"
        Returns data as an RDF::Trine::Model, suitable for serialising as
        RDF or running SPARQL queries.

    "$object->serialise_model(as => $format)"
        As "model" but returns a string.

    "$doc->add_to_model($model)"
        Adds data to an existing RDF::Trine::Model.

        Returns a reference to the document.

  Utility Functions
    "HTML::Microformats->modules"
        Returns a list of Perl modules, each of which implements a specific
        microformat.

    "HTML::Microformats->formats"
        As per "modules", but strips 'HTML::Microformats::Format::' off the
        module name, and sorts alphabetically.

WHY ANOTHER MICROFORMATS MODULE?
    There already exist two microformats packages on CPAN (see
    Text::Microformat and Data::Microformat), so why create another?

    Firstly, HTML::Microformats isn't being created from scratch.
    It's actually a fork/clean-up of a non-CPAN application (Swignition),
    and in that sense predates Text::Microformat (though not
    Data::Microformat).

    It has a number of other features that distinguish it from the existing
    packages:

    *   It supports more formats.

        HTML::Microformats supports hCard, hCalendar, rel-tag, geo, adr,
        rel-enclosure, rel-license, hReview, hResume, hRecipe, xFolk, XFN,
        hAtom, hNews and more.

    *   It supports more patterns.

        HTML::Microformats supports the include pattern, abbr pattern,
        table cell header pattern, value excerpting and other intricacies
        of microformat parsing better than the other modules on CPAN.

    *   It offers RDF support.

        One of the key features of HTML::Microformats is that it makes data
        available as RDF::Trine models. This allows your application to
        benefit from a rich, feature-laden Semantic Web toolkit. Data
        gleaned from microformats can be stored in a triple store; output
        in RDF/XML or Turtle; queried using the SPARQL or RDQL query
        languages; and more.

        If you're not comfortable using RDF, HTML::Microformats also makes
        all its data available as native Perl objects.

BUGS
    Please report any bugs to