Added Cyg-Win

This commit is contained in:
Frank Harris 2026-06-06 18:46:40 -04:00
parent 82cbc206eb
commit 413c315806
10586 changed files with 3806249 additions and 0 deletions

Revision history for Perl extension XML::Parser.
2.59 2026-05-20 (by Todd Rinaldo)
Fixes:
- PR #269 GH #268 Recognize blessed glob handles (e.g. IO::String)
in Expat::parse. The input-detection logic already handled
IO::Handle subclasses, unblessed GLOB refs, bare globs, and
bareword filehandle names but missed blessed globs that don't
inherit from IO::Handle (such as IO::String), silently
stringifying them and feeding the stringification to ParseString.
Add a Scalar::Util::reftype check so blessed GLOB references are
treated like any other glob handle
Maintenance:
- Add IO::String to the cpanfile so CI exercises the
blessed-glob-handle code path covered by PR #269
2.58 2026-04-23 (by Todd Rinaldo)
Fixes:
- PR #260 Prevent element-name SV leak when Start or End handlers
die: wrap the call_sv in ENTER/SAVEFREESV/LEAVE so an exception
thrown from a handler no longer leaks one SV per call. Audited
all 19 XS callbacks — startElement and endElement were the only
ones with non-mortal SVs across a call_sv boundary
- PR #259 Add NULL check for GvIOp in XML_Do_External_Parse to
prevent a segfault when an ExternEnt handler returns an
unopened filehandle. GvIOp returns NULL for a glob that has
never been opened; both call sites previously passed the NULL
straight to newRV_inc
- PR #258 Release the parser when an Init or Final handler dies.
The release() call used to be skipped on exception, leaving a
circular reference through self_sv so DESTROY never ran and the
parser leaked permanently
- PR #257 Free doctype_sysid during normal parser teardown and
NULL self_sv after release. Every parse of a document with a
<!DOCTYPE ... SYSTEM "..."> declaration previously leaked the
system-id string on the non-error path (free_cbv already freed
it on the error path)
- PR #255 Use bare return instead of "return undef" in
ContentModel::children() and expand_ns_prefix(). "return undef"
in list context produces (undef) — a one-element list — which
silently broke callers checking @result for emptiness
- PR #246 Use three-argument open in file_ext_ent_handler so that
pipe characters and IO-mode prefixes in external-entity SYSTEM
identifiers can never be interpreted by Perl's two-argument
open. The existing regex check is now defense-in-depth rather
than the sole protection
- PR #242 Add NULL-after-allocation checks with cascading cleanup
to the three New() calls in LoadEncoding, consistent with the
pattern established for XML_ParserCreate_MM in PR #204
Improvements:
- PR #267 Address CPANTS Kwalitee issues: add =head1 LICENSE to
Parser.pm, add use warnings to Expat.pm, add provides to
META_MERGE (using MM->parse_version() to avoid hardcoding), and
add SECURITY.md and CONTRIBUTING.md
- PR #265 Fix Doctype handler Internal parameter documentation in
Parser.pm — the XS code pushes PL_sv_yes/PL_sv_no (a boolean
indicating whether an internal subset exists), not the subset
string the POD claimed. Also correct the DoctypeFin parameter
label from (Parser) to (Expat) and a minor Expat.pm POD typo
- PR #264 Add use strict and use warnings to Makefile.PL and
Expat/Makefile.PL, and convert $expat_libpath / $expat_incpath
from bare globals to lexicals
- PR #262 Modernize META resources to CPAN Meta spec v2 —
structured bugtracker and repository entries so MetaCPAN and
CPAN tooling can extract richer information (separate git URL,
web URL, and tracker type)
- PR #256 Add use warnings to Parser.pm and all five Style
modules. Expat.pm is intentionally excluded and already
documents why (it uses int() on strings in namespace methods)
- PR #254 Improve const-correctness in Expat.xs: propagate
const char * through newUTF8SVpv, newUTF8SVpvn, and
append_error, and drop 30+ now-unnecessary (char *) casts. No
functional change — identical generated code
- PR #251 Add a Codecov coverage badge to README.md alongside the
existing CI badge
- PR #250 Document the Codecov coverage integration in CLAUDE.md,
including the two flags (perl via Devel::Cover, xs via
gcov/lcov) and a link to the dashboard
- PR #247 Add a Devel::Cover code-coverage CI job that measures
both Perl and XS/C coverage (via gcc --coverage + lcov) and
uploads to Codecov with separate perl/xs flags
- PR #241 Add a SECURITY section to Parser.pm POD documenting the
BillionLaughsAttackProtection*, AllocTracker*, and
ReparseDeferralEnabled options, with cross-references from the
new() option list
Maintenance:
- PR #266 Add 22 tests in t/expat_guards.t covering Expat.pm
input validation (setHandlers type/arity checks), parse-state
guards, and reference-exception preservation
- PR #263 Upgrade cross-platform-actions from v0.32 to v1 in the
BSD (FreeBSD, OpenBSD, NetBSD) CI jobs
- PR #261 Add 4 missing test files to MANIFEST via make manifest
and extend MANIFEST.SKIP with standard exclusions for build
artifacts (blib/, *.o, *.so, *.bs, *.c, cover_db/, .DS_Store,
Makefile.old) so future regenerations stay clean
- PR #253 Add missing use strict and use warnings to 9 test files
so the whole suite is consistent, and fix an undeclared
$parser in t/file.t surfaced by the new strictures
- PR #252 Upgrade actions/checkout from v4 to v6 in the release
workflow (the testsuite workflow was already upgraded)
- PR #249 Add t/expat_xs_coverage.t with 26 tests targeting
previously-uncovered paths in Expat.xs (93% → 95% line
coverage), focused on skip_until suspend/resume, namespace
cleanup in finish(), and external-entity edge cases
- PR #248 Add t/coverage_gaps.t with 31 tests for previously
uncovered Perl code paths identified via Devel::Cover — Debug
and Stream style Proc/PI handlers, Expat direct parse methods,
ContentModel MIXED asString, and security-API argument
validation. Subroutine coverage reaches 100% across all modules
- PR #245 Add 16 targeted Stream_Delimiter boundary tests that
exercise the XS parse_stream delimiter detection logic directly
with small controlled documents (t/stream.t only exercised it
against one large sample file)
- PR #244 Add t/parser_api.t covering the XML::Parser API
surface: setHandlers return and croak semantics, parsefile Base
save/restore (including on error), parser reuse, parse return
values in scalar and list context, Init handler invocation,
and Pkg defaulting
2.57 2026-04-07 (by Todd Rinaldo)
Fixes:
- PR #235 GH #234 Fix PE parsing regression that broke XML::Twig DTD
round-tripping: defer parameter entity parsing activation to when a
declaration handler is registered, preventing PE references from
being consumed by the empty sub-parser instead of reaching the
Default handler
- PR #232 GH #47 Add regression tests for post-root character data
routing and add ppport.h for XS backward compatibility macros
- PR #229 Add NULL check after XML_ParserCreate_MM to prevent
segfault on memory allocation failure
- PR #225 Replace deprecated indirect object syntax in
LWPExternEnt.pl (new URI(...) -> URI->new(...), etc.) for forward
compatibility with Perl 5.36+ where 'use v5.36' disables the
indirect feature. Also add 'use strict' to the file
Improvements:
- PR #233 Add downstream CI testing for XML::XPath, XML::DOM, and
XML::Twig to catch breakage in major consumers before release
Maintenance:
- PR #239 Convert t/char_end_doc.t to Test::More; all 57 test files
now consistently use Test::More
- PR #238 Expand CLAUDE.md with project conventions for generated
files, releases, and testing
- PR #231 Add test coverage for load_encoding, xml_escape edge
cases, and ContentModel API
- PR #230 Add test coverage for element_index, position_in_context,
specified_attr, and setHandlers
- PR #228 Add test coverage for recognized_string, default_current,
and original_string introspection methods
- PR #227 Add test coverage for context-tracking API (context,
current_element, in_element, within_element, depth)
- PR #226 Add test coverage for parse_start/ExpatNB non-blocking
parser API
- PR #225 Replace indirect object syntax in test files (t/decl.t,
t/parament.t, t/external_ent.t)
- PR #224 Add test coverage for IO control character rejection in
external entity paths
2.56 2026-04-02 (by Todd Rinaldo)
Fixes:
- PR #223 Skip original_string test when expat lacks
XML_CONTEXT_BYTES, fixing false test failures on platforms
where libexpat is compiled without XML_CONTEXT_BYTES (e.g.
DragonFlyBSD system expat)
2.55 2026-04-01 (by Todd Rinaldo)
Fixes:
- PR #221 Fix struct-return ABI mismatch in XML_ExpatVersionInfo()
by parsing the version string instead of using the struct return,
which was corrupted when Perl and libexpat used different struct
return conventions (-fpcc-struct-return vs -freg-struct-return)
- PR #214 GH #211 GH #212 GH #213 Revert defaulthandle char
routing and standalone string changes that broke downstream CPAN
modules (XML-Twig, XML-DOM, XML-XPath)
- PR #216 GH #215 Increase deep_nesting test depth from 600 to
2048 to actually exercise the st_serial_stack reallocation code
path (the GH #39 fix)
- PR #218 Update expat download URL from SourceForge to GitHub in
the "expat not found" error message
Improvements:
- PR #220 GH #215 Add AddressSanitizer CI job for XS memory safety
to catch heap buffer overflows and use-after-free bugs
Maintenance:
- PR #219 Modernize last 4 print-ok test files (astress, namespaces,
stream, skip) to Test::More with descriptive test names
- PR #217 Remove obsolete .travis.yml (project uses GitHub Actions)
2.54 2026-03-27 (by Todd Rinaldo)
Fixes:
- PR #196 Plug XS memory leaks on error paths in Expat.xs
(externalEntityRef, parse_stream, ParserCreate)
- PR #204 Add defensive NULL checks in Expat.xs to prevent crashes
on memory exhaustion and undefined behavior on short input lines
- PR #203 Add explicit package main after inline package declarations
in test files to clarify scope
Improvements:
- PR #207 GH #205 Add GitHub Actions workflow to auto-create GitHub
Releases on tag push, enabling downstream notification via
GitHub's release watch
- PR #209 Update AUTHORS POD in Parser.pm and Expat.pm to reflect
full maintainer history
- PR #210 Add CI badge to POD via =for markdown directive so it
survives README.md regeneration
- Rename README to README.md and regenerate from POD
Maintenance:
- PR #208 Modernize 10 legacy test files from print-ok style to
Test::More (cdata, finish, deep_nesting, xml_escape, partial,
char_end_doc, current_length, combine_chars, utf8_stream, defaulted)
2.53 2026-03-25 (by Todd Rinaldo)
Fixes:
- PR #202 GH #201 Fix detection of bare glob filehandles (*FH) in
Expat::parse; previously only glob references were recognized
Maintenance:
- PR #198 Modernize encoding.t from print-ok style to Test::More
and expand coverage
- PR #197 Modernize styles.t from Test to Test::More and expand
coverage
- Ignore Mac OS metadata files (.DS_Store) in .gitignore
2.52 2026-03-24 (by Todd Rinaldo)
Fixes:
- PR #193 Restrict Subs style to package-local subs only; previously
UNIVERSAL::can() walked the inheritance tree, potentially dispatching
to inherited base class methods on element names like <connect/>
Improvements:
- PR #195 Require perl 5.008 minimum and remove pre-5.008 compat code
(dead polyfills, tied-handle branch, ExtUtils::Liblist guard)
Documentation:
- PR #176 GH #173 Explain why the empty sub-parser parse is needed in
externalEntityRef for parameter entity processing
Maintenance:
- PR #194 Remove dead code: Built_In_Styles hash, $have_File_Spec,
newSVpvn/ERRSV compat guards
- PR #192 GH #190 Log expat version in all CI jobs for easier
diagnosis of platform-specific failures
- Add AI_POLICY.md for transparency
2.51 2026-03-20 (by Todd Rinaldo)
Fixes:
- PR #184 GH #182 Fix compile warnings from Fedora gcc: unhandled enum
value in switch and uninitialized variables in parse_stream
- PR #186 GH #183 Add explicit case labels for XML_CTYPE_EMPTY and
XML_CTYPE_ANY in generate_model switch to fix -Wswitch on BSD/clang
- PR #175 GH #174 Add XML_DTD/XML_GE compile-time guards to AllocTracker
and BillionLaughs XS bindings to fix warnings when expat lacks these flags
- PR #187 GH #185 Require proper declaration in AllocTracker feature
detection to prevent implicit function declaration on BSD
Improvements:
- PR #189 GH #188 Add tests for XS functions modified in PR #184 covering
XML_ErrorString, XML_GetBase/XML_SetBase, and generate_model
Maintenance:
- PR #181 GH #180 Add Fedora 43 container job to CI test matrix
- PR #178 Add BSD testing (FreeBSD, OpenBSD, NetBSD) to CI
- PR #179 Add no-lwp CI job to validate test suite without LWP::UserAgent
- Update MANIFEST
2.49 2026-03-19 (by Todd Rinaldo)
Improvements:
- PR #171 Expose expat library version at runtime via expat_version()
and expat_version_info() class methods on XML::Parser::Expat
- PR #169 GH #168 Expose Expat 2.7.2 AllocTracker security APIs
(AllocTrackerMaximumAmplification, AllocTrackerActivationThreshold)
Maintenance:
- PR #170 Clean up build configuration: remove dead CAPI code from
Makefile.PL and Expat/Makefile.PL, add test dependencies to cpanfile
2.48 2026-03-18 (by Todd Rinaldo)
Fixes:
- GH #39 Fix off-by-one heap buffer overflow in st_serial_stack growth check (CVE-2006-10003)
- GH #64 Fix buffer overflow in parse_stream when filehandle has :utf8 layer (CVE-2006-10002)
- GH #27 Prevent symbol table auto-vivification in Expat::parse
- GH #30 Set UTF-8 flag on sysid in ExternEnt handler and fix Debug style for non-ASCII chars
- GH #36 Prevent position overflow for large files in line/column/error paths
- GH #41 Fix xml_escape to escape all occurrences of quote characters
- GH #44 Fix lexical filehandle handling in ExternEnt handler return values
- GH #45 Clean up compiler warnings in Expat.xs
- GH #47 Fix routing of character data after root element to Char handler
- GH #48 Fix current_byte overflow for large XML files on 32-bit perl
- GH #50 Propagate xpcroak errors in Subs style instead of swallowing them
- GH #53 Fix parameter entity references in internal DTD subset breaking handler dispatch
- GH #65 Support standard LIBS and INC options in Makefile.PL; propagate to Expat/Makefile.PL
- GH #69 Auto-detect multiarch library paths for expat
- GH #72 Localize $_ in Style::Stream to avoid read-only modification
- GH #76 Use system tmpdir for temp files in Devel::CheckLib
- GH #83 Use pkg-config to auto-detect expat in non-standard locations
- GH #90 Improve "Couldn't find your C compiler" error message
- GH #100 Clean up MSVC assertlib .obj files on Windows
- GH #103 Skip -rpath on Mac OS X 10.4 and earlier
- GH #106 Fix freeing of the content model using XML_FreeContentModel
- GH #148 XML-escape attribute values in Stream style default output
- GH #149 Restore Base after parsefile() to prevent context pollution on reuse
- GH #152 Fix SYNOPSIS handler name Characters -> Text in Stream.pm
- GH #153 Fix variable interpolation in xpcarp() and setHandlers() error messages
- GH #157 Restore Perl 5.8 and 5.10 test compatibility
- GH #160 Initialize st_serial_stacksize after allocation in Expat.xs
- GH #162 Replace local $^W=0 with no warnings 'numeric' in Expat.pm
- GH #164 Add missing ENTER/SAVETMPS scope to notationDecl callback
- GH #165 Replace each() with keys() to avoid iterator side effects
- GH #166 Remove no-op study() call in xml_escape
Improvements:
- GH #38 Add G_VOID flag to all void-context perl_call_sv/method/pv calls
- GH #46 Add UseForeignDTD option for documents without DOCTYPE
- GH #49 Add current_length method to XML::Parser::Expat
- GH #54 Add hint about unescaped characters for invalid token errors
- GH #67 Add NoLWP to expat capability probes for consistent skip logic
- GH #70 Enhance parse exceptions with XML context when ErrorContext is set
- GH #71 Move encoding maps from PERL5LIB to File::ShareDir
- GH #73 XMLDecl handler now returns "yes"/"no" for standalone attribute
- GH #101 Make LWP::UserAgent a recommended dependency, not required
- GH #102 Expose expat security APIs: BillionLaughs and ReparseDeferral
- GH #167 Modernize Perl pragmas across modules
Documentation:
- GH #55 Add ERROR HANDLING section and improve parse error documentation
- GH #56 Clarify Char handler splitting behavior with example and docs
- GH #74 Document predefined entity expansion in Tree style
- GH #161 Fix Standalone parameter description in README
Maintenance:
- GH #25 Add Debug style multibyte character regression test
- GH #28 Add tests for globref and lexical filehandle return values in ExternEnt handler
- GH #31 Add encoding tests for windows-1251, koi8-r, windows-1255, and ibm866
- GH #51 Skip external DTD tests when expat lacks parameter entity support
- GH #150 Replace Artistic-2.0 LICENSE with correct Perl dual license
- GH #151 Modernize xpcroak.t from Test.pm to Test::More
- GH #155 Modernize CI workflow inspired by YAML-Syck
- GH #159 Install libexpat1-dev in perl-tester CI containers
- GH #163 Replace defunct Travis CI badge with GitHub Actions
- GH #168 Update META_MERGE URLs to cpan-authors organization
- Integrate Windows into overall CI test run
2.47 2023-12-28 (by Todd Rinaldo)
- #84 use $fh instead of $foo
- #85 Fix typo in documentation
- #89 Upgrade Devel::CheckLib from 0.99 to 1.14
- Upgrade Devel::CheckLib to 1.16
- #91 POD fix for verbatim text
- #97 Add a LICENSE file
- #94 Don't ship Expat/Makefile
- Various GitHub workflow improvements. Windows is still not working.
2.46 2019-09-24 (by Todd Rinaldo)
- use foreach rather than for in loops
- produce README.md so Travis CI status will show up on GitHub
- remove use vars and switch to our.
- travis-ci testing from 5.8..5.28
- Convert XML::Parser to use 3 arg opens with no barewords.
- Migrate tracker to github
- Switch to XSLoader
- Fix a buffer overwrite in parse_stream()
2.44 2015-01-12 (by Todd Rinaldo)
- RT 99098 - Revert "Add more useful error message on parse to Expat". It breaks
XML::Twig. Calling code will need to do this if it's needed.
- RT 100959 - Add use FileHandle to t/astress.t - Make perl 5.10.0 happy.
2.43 2014-12-11 (by Todd Rinaldo)
- POD patch to man from Debian via Nicholas Bamber
- POD patch from Debian via gregor herrmann.
- Add more useful error message on parse to Expat
- Fix LWP dependency to be LWP::UserAgent
- Bump to 2.43 for overdue release to CPAN.
2.42_01 2013-07-12 (by Todd Rinaldo)
- Added instructions to README for OSX
- XS changes: stop using SvPV(string, PL_na)
- Fix documentation typos
2.41 2011-06-01 (by Todd Rinaldo)
- Tests are cleaned up; promoting to stable. No changes since 2.40_02
2.40_02 2011-05-31 (by Todd Rinaldo)
- Mark as TODO some tests that fail on FreeBSD due to an improper expat CVE patch
http://www.freebsd.org/cgi/query-pr.cgi?pr=157469
2.40_01 2011-05-24 (by Todd Rinaldo)
- better installation instructions
- Small spelling patches from Debian package - Thanks Nicholas Bamber
- RT 68399 - Upgrade Devel::CheckLib to 0.93 to make it
perl 5.14 compliant - qw()
- RT 67207 - Stop doing tied on globs - Thanks sprout
- RT 31319 - Fix doc links in POD for XML/Parser.pm
2.40 2010-09-16 (by Alexandr Ciornii)
- Add windows-1251.enc, ibm866.enc, koi8-r.enc (Russian)
- Add windows-1255.enc (Hebrew)
- Update iso-8859-7.enc (RT#40712)
- Use Devel::CheckLib
- Better description of expat packages
- Better Perl style in both code and docs
2.36
- Fix for Carp::Heavy bugs
2.35 (mostly by Alexandr Ciornii)
- Works in 5.10 (Andreas J. Koenig)
- Added license in Makefile.PL (Alexandr Ciornii)
- Makefile.PL also searches for expat in C:/lib/Expat-2.0.0 (Alexandr Ciornii)
- No longer uses variable named 'namespace' in Expat.xs (Jeff Hunter)
2.33
- Fixed Tree style (grantm)
- Fixed some non-utf8 stuff in DTDs (patch in XML::DOM tarball)
2.32
- Memory leak fix (Juerd Waalboer).
- Added windows-1252 encoding
- Styles moved to separate .pm files to make loading faster and
ease maintenance
- Don't load IO::Handle unless we really need to
2.31 Tue Apr 2 13:39:51 EST 2002
- Ilya Zakharevich <ilya@math.ohio-state.edu> and
Dave Mitchell <davem@fdgroup.com> both provided patches to
fix problems module had with 5.8.0
- Dave Mitchell also made some UTF-8 related fixes to the test suite.
2.30 Thu Oct 5 12:47:36 EDT 2000
- Get rid of ContentStash global. Looking it up every time is not
a big deal, and this removes a potential threading problem.
- Switch to shareable library version of expat from sourceforge
(i.e. no longer include expat source and require that libexpat
be installed)
- Bob Tribit <btribit@traffic.com> demonstrated a fix for problems
in compiling under perl 5.6.0 with 5.005 threading.
- Matt Sergeant <matt@sergeant.org> discovered a typo ('IO::Handler'
instead of 'IO::Handle') in Expat.pm that caused IO::Handle objects
to be treated as strings instead of handles.
- Matt Sergeant also provided a patch to allow tied handles to work
properly in calls to parse.
- Eric Bohlman <ebohlman@netcom.com> reported a failure when
incremental parsing and external parsing were used together.
Needed to give an explicit package when calling Do_External_Parse
from externalEntityRef; otherwise it fails when called through ExpatNB.
2.29 Sun May 21 21:19:45 EDT 2000
- In expat, notation declaration handler registration wasn't
surviving through external entity references.
- Chase Tingley <tingley@sundell.net> discovered that text
accumulation in the Stream style wasn't working across processing
instructions and recommended the appropriate fix.
- Jochen Wiedmann <jochen.wiedmann@softwareag.com> noted that
you couldn't use ExpatNB directly because it wasn't setting
the protective _State_ variable. Now doing this in the
parse_more method of ExpatNB.
- At the suggestion of Grant Hopwood <hopwoodg@valero.com>, now
calling the env_proxy method on the LWP::UserAgent in the LWP
external entity handler when it's created to set any proxies
from environment variables.
- Grant McLean, Matt Sergeant (& others I may have missed) noted that
loading the LWP & URI modules slowed startup of the module, even
if the application didn't need it. The default LWP handler is now
dynamically loaded (along with LWP & URI modules) the first time an
external entity is referenced. Also provided a NoLWP option to
XML::Parser that forces the file based external entity handler.
- Fixed allocation errors in element declaration patches in expat
- The Expat base method now works, even before expat starts parsing.
- Changed the canonical script to take an optional file argument.
- Enno Derksen <enno@att.com> reported that the attlist handler
was not returning NOTATION type attlist information.
- Michel Rodriguez <mrodrigu@ieee.org> noted that the constructor
for XML::Parser objects no longer checked for the existence of
application-installed external entity handlers before installing
the default ones.
- Burkhard Meier <burkhard.meier@ixos.de> sent in a fix for
compiler directives in Expat/Makefile.PL for Win32 machines.
A change in 5.6.0 caused the old conditional to fail.
- Forgot to document changes to the Entity declaration handler:
there is an additional "IsParam" argument that indicates whether
or not the entity is a parameter entity. This information is
no longer passed on in the name.
- Ben Low <ben@snrc.uow.edu.au> reported an undefined macro with
version 5.004_04.
2.28 Mon Mar 27 21:21:50 EST 2000
- Junked local (Expat.xs) declaration parsing and patched expat to
handle XML declarations, element declarations, attlist declarations,
and all entity declarations. By eliminating both shadow buffers and
local declaration parsing in Expat.xs, I've eliminated the two most
common sources of serious bugs in the expat interface.
o thus fixed the segfault and parse position bugs reported by
Ivan Kurmanov <iku@fnmail.com>
o and the doctype bug reported by Kevin Lund
<Kevin.Lund@westgroup.com>
o The element declaration handler no longer receives a string,
but an XML::Parser::ContentModel object that represents the
parsed model, but still looks like a string if referred to as
a string. This class is documented in the XML::Parser::Expat
pod under "XML::Parser::ContentModel Methods".
o The doctype declaration handler no longer receives the internal
subset as a string, but in its place a true or undef value
indicating whether or not there is an internal subset. Also,
it's called prior to processing either the internal or external
DTD subset (as suggested by Enno Derksen <enno@att.com>.)
o There is a new DoctypeFin handler that's called after finishing
parsing all of the DOCTYPE declaration, including any internal
or external DTD declarations.
o One bit of lossage is that recognized_string, original_string,
and default_current no longer work inside declaration handlers.
- Added a handler that gets called after parsing external entities:
ExternEntFin. Suggested by Jeff Horner <jhorner@netcentral.net>.
- parsefile, file_ext_ent_handler, & lwp_ext_ent_handler now all
set the base path. This problem has been raised more than once
and I'm not sure to whom credit should be given.
- The file_ext_ent_handler now opens a file handle instead of
reading the entire entity at once.
- Merged patches supplied by Larry Wall to (for perl 5.6 and beyond)
tag generated strings as UTF-8, where appropriate.
- Fixed a bug in xml_escape reported by Jerry Geiger <jgeiger@rios.de>.
It failed when requesting escaping of perl regex meta-characters.
- Laurent Caprani <caprani@pop.multimania.com> reported a bug in the
Proc handler for the Debug style.
- <chocolateboy@usa.net> sent in a patch for the element index
mechanism. I was popping the stack too soon in the endElement fcn.
- Jim Miner <jfm@winternet.com> sent in a patch to fix a warning in
Expat.pm.
- Kurt Starsinic pointed out that the eval used to check for string
versus IO handle was leaving $@ dirty, thereby foiling higher
level exception handlers
- An expat question by Paul Prescod <paul@prescod.net> helped me
see that exceptions in the parse call bypass the Expat release method,
causing memory leaks.
- Mark D. Anderson <mda@discerning.com> noted that calling
recognized_string from the Final method caused a dump. There are
a bunch of methods that should not be called after parsing has
finished. These now have protective if statements around them.
- Updated canonical utility to conform to newer version of Canonical
XML working draft.
2.27 Sat Sep 25 18:26:44 EDT 1999
- Corrected documentation in Parser.pm
- Deal with XML_NS and XML_BYTE_ORDER macros in Expat/Makefile.PL
- Chris Thorman <chris@thorman.com> noted that "require 'URI::URL.pm'"
in Parser.pm was in error (should be "require 'URI/URL.pm'")
- Andrew McNaughton <andrew@scoop.co.nz> noted "use English" and
use of '$&' slowed down regex handling for whole application, so
they were excised from XML::Parser::Expat.
- Work around "modification of read-only value" bug in perl 5.004
- Enno Derksen <enno@att.com> reported that the Doctype handler
wasn't being called when ParseParamEnt was set.
- Now using Version 19990728 of expat, with local patches.
- Got rid of shadow buffer
o thus fixed the error reported by Ashley Sanders
<a.sanders@mcc.ac.uk>
o and removed ExpatNB limitations that Peter Billam
<music@pjb.com.au> noted.
- Vadim Konovalov <vkonovalov@lucent.com> had a problem compiling
for multi-threading that was fixed by changing Perl_sv_setsv to
sv_setsv.
- Added new Expat method: skip_until(index)
- Backward incompatible change to method xml_escape: to get former
behavior use $xp->xml_escape($string, '>', ...)
- Added utility, canonical, to samples
2.26 Sun Jul 25 19:06:41 EDT 1999
- Ken Beesley <ken.beesley@xrce.xerox.com> discovered that
declarations in the external subset are not sent to registered
handlers when there is no internal subset.
- Fixed parse_dtd to work when entity values or attribute defaults
are so large that they might be broken across multiple calls to
the default handler.
- For lwp_ext_ent_handler, use URI::URL instead of URI so that old
5.004 installations will work with it.
2.25 Fri Jul 23 06:23:43 EDT 1999
- Now using Version 19990709 of expat. No local patches.
- Numerous people reported a SEGV problem when running t/cdata
on various platforms and versions of perl. The problem was
introduced with the setHandlers change. In some cases an
uninitialized value was being returned.
- Added an additional external entity handler, lwp_ext_ent_handler,
that deals with general URIs. It is installed instead of the
"file only" handler if the LWP package is installed.
2.24 Thu Jul 8 23:05:50 EDT 1999
- KangChan Lee <dolphin@comeng.chungnam.ac.kr> supplied the
EUC-KR encoding map.
- Enno Derksen <enno@att.com> forwarded reports by Jon Eisenzopf
<eisen@pobox.com> and Stefaan Onderbeke <onderbes@bec.bel.alcatel.be>
about a core dump using XML::DOM. This was due to a bug in the
prolog parsing part of XML::Parser.
- Loic Dachary <loic@ceic.com> discovered that changing G_DISCARD to
G_VOID introduced a small memory leak. Changed G_VOID back to
G_DISCARD.
- As suggested by Ben Holzman <bholzman@earthlink.net>, the
setHandlers methods of both Parser and Expat now return lists that
consist of type, handler pairs that correspond to the input, but
the handlers returned are the ones that were in effect prior to
the call.
- Now using Version 19990626 of expat with a local patch (provided
by James Clark.)
- Added option ParseParamEnt. When set to a true value, parameter
entities are parsed and the external DTD is read (unless standalone
set to "Yes" in document).
2.23 Mon Apr 26 21:30:28 EDT 1999
- Fixed a bug in the ExpatNB class reported by Gabe Beged-Dov
<begeddov@jfinity.com>. The ErrorMessage attribute wasn't
being initialized for ExpatNB. This should have been done in
the Expat constructor.
- Applied patch provided by Nathan Kurz <nate@valleytel.net> to
fix more perl stack manipulation errors in Expat.xs.
- Applied another patch by Nathan to change perl_call_sv flag
from G_DISCARD to G_VOID for callbacks, which helps performance.
- Murata Makoto <murata@apsdc.ksp.fujixerox.co.jp> reported a
problem on Win32 platforms that only showed up when UTF-16 was
being used. The needed call to binmode was added to the parsefile
methods.
- Added documentation for release method that was added in release
2.20 to Expat pod. (Point raised by <mookie@undef.com>)
- Now using Version 19990425 of expat. No local patches.
- Added specified_attr method and made the is_defaulted method
ineffective.
2.22 Sun Apr 4 11:47:25 EDT 1999
- Loic Dachary <loic@ceic.com> reported a core dump with a small
file with a comment that wasn't properly closed. Fixed in expat
by updating positionPtr properly in final call of XML_Parse.
(Reported to & acknowledged by James Clark.)
- Made more fixes to Expat.xs position calculation.
- Loic Dachary <loic@ceic.com> provided patches for fixing a
memory growth problem with large documents. (Garbage collection
wasn't happening frequently enough.)
- As suggested by Gabe Beged-Dov <begeddov@jfinity.com>, added
a non-blocking parse mechanism:
- Added parse_start method to XML::Parser, which returns a
XML::Parser::ExpatNB object.
- Added XML::Parser::ExpatNB class, which is a subclass of
Expat and has the additional methods parse_more & parse_done
- Made some performance tweaks as suggested by performance thread
on perl-xml discussion list. [With negligible results]
- Tried to clarify Tree style structure in Parser pod
2.21 Sun Mar 21 17:42:04 EST 1999
- Warren Vik <wvik@whitebarn.com> provided patches for a bug
introduced with the is_defaulted method. It manifested itself
by bogusly reporting duplicate attributes.
- Now using latest expat from ftp://ftp.jclark.com/pub/test/expat.zip,
Version 19990307. (Plus any patches in Expat/expat.patches.)
- As suggested by Tim Bray, added an xml_escape method to
Expat.
- Murray Nesbitt <murray@activestate.com> had build problems
on Win32 that were solved by swapping 2 include files in
Expat.xs
- Added following Expat namespace methods:
new_ns_prefixes
expand_ns_prefix
current_ns_prefixes
- Fixed memory handling in recognized_string method to get rid
of "Attempt to free unreferenced scalar" bug.
2.20 Sun Feb 28 15:35:52 EST 1999
- Fixed miscellaneous bugs in xmlfilter.
- In the default external entity handler, prepend the base only
for relative URLs.
- Chris Nandor <pudge@pobox.com> provided patches for building
on Macintosh.
- As suggested by Matt Sergeant <Matthew.Sergeant@eml.ericsson.se>,
added the finish method to Expat.
- Matt also provided a fix to a bug he discovered in the Streams
style.
- Fixed a parse position bug reported by Enno Derksen <enno@att.com>
that was affecting both original_string and position_in_context.
- Fixed a gross memory leak reported by David Megginson,
<david@megginson.com>: there was a circular reference to the Expat
object and the internal end handler for context was not freeing
element names after they were removed from the context stack.
- Now using expat Version 19990109
(Plus any patches in Expat/expat.patches)
- Added is_defaulted method to Expat to tell if an attribute
was defaulted. (Requested by Enno Derksen for XML::DOM.)
- Matt Sergeant <Matthew.Sergeant@eml.ericcson.se> reported that
the XML::Parser parse methods weren't propagating array context
to the Final handler. Now they are.
- Fixed more memory leaks (again reported by David Megginson).
The SVs pointing to the handlers weren't being reclaimed when
the callback vector was freed.
- Added the element_index method to Expat.
2.19 Sun Jan 3 11:23:45 EST 1999
- When the recognized string is long enough, expat uses multiple
calls to reportDefault. Fixed recString handler in Expat.xs to
deal with this properly.
- Added original_string method to Expat. This returns the untranslated
string (i.e. original encoding) that caused the current event.
- Alberto Accomazzi <alberto@cfa0.harvard.edu> sent in more patches
for perl5.005_54 incompatibilities.
- Alberto also fingered a nasty memory bug in Expat.xs that arose
sometimes when you registered a declaration handler but no
default handler. It would give you a "Not a CODE reference"
error in a place that wasn't using any CODE references.
- <schinder@pobox.com> reported a problem with compiling expat
on a Sun 4 due to the nonexistence of memmove on that OS. Provided
a workaround in Makefile.PL
- Now using expat Version 19981231 from James Clark's test directory.
- Made patch to this version in order to support original_string
(see Expat/expat.patches.)
- Added CdataStart and CdataEnd handlers to expat.
2.18 Sun Dec 27 07:39:23 EST 1998
- Alberto Accomazzi <alberto@cfa0.harvard.edu> pointed out that
the DESTROY sub in the new XML::Parser::Encinfo package was
pointing to the wrong package for calling FreeEncoding.
- Tarang Kumar Patel <mombasa@ptolemy.arc.nasa.gov> reported
the mis-declaration of an integer as unsigned in the
convert_to_unicode function in Expat.xs.
- Glenn R. Kronschnabl <grk@arlut.utexas.edu> reported a problem
with ExternEnt handlers when using parsefile. Turned out to be
an unmatched ENTER; SAVETMPS pair that screwed up the Perl stack.
- Tom Hughes <tom@compton.demon.co.uk> reported that the fix I put
in for the switch to PL_sv.. names failed with 5.005_54, since
these became real variables instead of macros. Switched to just
checking the PATCHLEVEL macro.
- Yoshida Masato <yoshidam@inse.co.jp> provided the EUC-JP encodings
(the corresponding XML files are in XML::Encoding 1.01 or later.)
- With the advice of MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>,
removed the Shift_JIS encoding and replaced it with 4 variations
he provided. He also provided an explanatory message.
- Added the recognized_string method to Expat, deprecating
default_current.
- Now using expat Version 19981122 from James Clark's test directory
(this fixes another bug with external entity reference handlers)
- Added a default external entity handler that only accesses file:
based URLs.
2.17 Sun Dec 13 17:39:58 EST 1998
- Replaced uses of malloc, realloc, and free with New, Renew,
and Safefree respectively
- In Expat.pm, fixed methods in_element and within_element to
work correctly with namespaces.
- xmlfilter - Substitute quoted equivalents for special characters
in attribute values.
- position_in_context was off by one line when position was at
the end of line.
- For the context methods in Expat.pm, do the right thing when
the context list is empty.
- Added methods xpcroak and xpcarp to Expat.
- Alberto Accomazzi <alberto@cfa0.harvard.edu> noted that perl
releases 5.005_5* (the pre 5.006 development versions) won't
accept sv_undef (and related constants) anymore and we have
to switch to PL_sv_...
- Alberto also reported a warning in the newer versions of
IO::Handle about input_record_separator not being treated on
a per-handle basis.
- Fixed bug that Jon Udell <udell@top.monad.net> reported in
Stream style: Text handler most of the time didn't see proper
context.
- Added XML::Parser::Expat::load_encoding function and support
for external encodings.
2.16 Tue Oct 27 22:27:33 EST 1998
- Fixed bug reported by Enno Derksen <enno@att.com>:
Now treats parameter entity declarations correctly. The entity
handler sees the name beginning with '%' if it's a parameter
entity declaration.
- Nigel Hutchison <nwoh@software-ag.de> pointed out that stream.t
wasn't portable off Unix systems. Replaced with portable version.
- Fixed bug reported by Enno Derksen <enno@att.com>:
XML Declaration was firing off both XMLDecl handler *and* Default
handler.
- Added option NoExpand to Expat to turn off expansion of entity
references when a default handler is set.
2.15 Tue Oct 20 14:50:11 EDT 1998
- In Expat's parse method, account for undefined previous
record separators.
- Simplify a couple of Expat methods.
- Re-ordered Changes entries to put latest changes first.
- In XML::Parser::new, set Handlers if not already set
- New Handler (XMLDecl) for handling XML declarations
- New Handler (Doctype) for handling DOCTYPE declarations
- New Handler (Entity) for handling ENTITY declarations in
the internal subset.
- New Handler (Element) for handling ELEMENT declarations in
the internal subset.
- New Handler (Attlist) for handling ATTLIST declarations in
the internal subset.
- Documented new handlers
- Added t/decl.t to test new handlers
2.14 Sun Oct 11 22:17:15 EDT 1998
- Always use method calls for streams.
- Use perl's input_record_separator to find delimiter (i.e. each
"line" is an entire XML doc with delimiter appended)
- Deal with line being longer than buffer.
2.13 Thu Oct 8 16:58:39 EDT 1998
- Fixed a major oops in Expat.xs where I was trying to decrement
a refcnt on an unallocated SV, leading to a segment violation.
(Why did this show up on HPUX but not Linux?)
2.12 Thu Oct 8 00:05:10 EDT 1998
- Incorporated fix to t/astress.t from <fletch@phydeaux.org> (Mike
Fletcher).
- Change to xmlstats from <dblack@candle.superlink.net> (David
Alan Black)
- Access Handlers_Setters in Expat and Handler_Types in Parser
through object reference (following admonition in perltoot
about class data.)
- Added Stream_Delimiter option to Expat.
- In the parse_stream function in Expat.xs, if we either have a
Stream_Delimiter or if there's no file descriptor, use method
calls instead. For Stream_Delimiter in particular, the function
now uses the getline method so it can check for the delimiter
without consuming stuff past the delimiter from the stream.
2.11 Sun Oct 4 22:15:53 EDT 1998
- Swapped out local patch for expat and swapped in James Clark's
patch.
- Pass on all Parser attributes (other than those excluded by
Non_Expat_Options) to the instance of Expat created at parse time.
- New method for Expat: generate_ns_name
- Split test.pl into t/*.t and change Makefile.PL so we don't do a
useless descent into Expat subdir for testing.
- Stop the numeric warning for eq_name and namespace method.
2.10 Fri Sep 25 18:36:46 EDT 1998
- Uses expat Version 19980924
(with local patch - see Expat/expat/xmlparse/xmlparse.c.diff)
- Use newSVpvn when PERL_VERSION >= 5.005
- Completed xmlfilter
- Added support for namespace processing:
o Namespaces option to XML::Parser and XML::Parser::Expat
o Two new methods in Expat:
namespace - to return namespace associated with name
eq_name - compare 2 names for equality across namespaces.
- Use expat's new SetDefaultHandlerExpand instead of SetDefaultHandler
so that entity expansion may continue even if the default handler
is set.
- Moved test.pl back up to the main level and changed it to work with XML::Parser
- Added tests for namespaces
2.09 Fri Sep 18 10:33:38 EDT 1998
- Fixed errors that caused -w to fret in XML::Parser.
- Fixed depth method in XML::Parser::Expat
- There were a few places in Expat.xs where garbage strings may
have been returned due to the expat library giving us zero-length
strings. Fixed by using a local version of newSVpv where length
means length, even when zero.
- The default handler setter in Expat.xs was inappropriately setting
cbv->dflt_sv when there was a null handler.
2.08 Thu Sep 17 11:47:13 EDT 1998
- Made XML::Parser a higher-level, reusable parser object. The old object
is now XML::Parser::Expat.
- The XML::Parser object now supports the style mechanism very close
to that in the 1.0 version.
2.07 Wed Sep 9 11:03:43 EDT 1998
- Added some samples (xmlcomments & xmlstats)
- Now requires 5.004 (due to sv_catpvf)
- Changed Makefile.PL to allow automatic manification
- Added a test that reads xml spec (to check buffer boundary errors)
2.06 Tue Sep 1 10:40:41 EDT 1998
- Fixed the methods current_line, current_byte, and current_column
- Added some tests
2.05 Mon Aug 31 15:29:42 EDT 1998
- Made Makefile.PL changes suggested by Murray Nesbitt
<murray@ActiveState.com> to support building on Win32
and for making PPM binaries.
- Added method parse
- Changed parsestring and parsefile to use new parse method
- Deprecated parsestring method
- Improved error handling in the ExternEnt handler
2.04 Wed Aug 26 13:25:01 EDT 1998
- Uses expat Version 1.0 of August 14, 1998
- Some document changes
- Changed dist section in Makefile.PL
- Added ExternEnt handler
- Added tests for ExternEnt
2.03 Fri Aug 21 17:19:26 EDT 1998
- Changed InitEncoding to ProtocolEncoding. Default to none.
Pass null string to expat's ParserCreate when there is no
ProtocolEncoding.
- Fixed bug in parsefile & parsestring where they were referring
to an ErrorContext *method* instead of a field.
- Fixed position_in_context bugs:
-- 'last' in do {} while ();
-- insert newline before pointer when no following newline
in buffer.
- Added some additional tests
2.02 Thu Aug 20 14:05:08 EDT 1998
- Fixed parsefile problem reported by
"Robert Hanson" <robertha@zenweb.com>, using a modification of
his suggested fix.
- Responded to problem reported by
Bart Schuller <schuller+perl-xml@lunatech.com>
by pre-expanding parts of the XML_UPD macro to avoid confusing
some versions of gcc.
- Changed the constructor to take the option InitEncoding, which
gets passed to the ParserCreate call. When not given, defaults
to UTF-8.
- Added method position_in_context
- Added Constructor option ErrorContext and added reporting of
errors in context.
2.01 Wed Aug 19 11:42:42 EDT 1998
- Added methods:
default_current, base, current_line, current_column,
current_byte, context
- Added some tests
- parsestring and parsefile now croak if they're re-used
- Filled in some documentation
2.00 Mon Aug 17 12:01:33 EDT 1998
- repackaged with James Clark's most recent expat
- changed to an API closer to expat
1.00 March 1998
- Larry Wall's original version

@ -0,0 +1,341 @@
XML::Parser
Copyright (c) 1998-2000 Larry Wall and Clark Cooper.
All rights reserved.
This program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
This means you may choose to use either:
a) the GNU General Public License, Version 1, February 1989
(see Copying below)
OR
b) the "Artistic License" (see Artistic below)
=======================================================================
The "Artistic License"
Preamble
The intent of this document is to state the conditions under which a
Package may be copied, such that the Copyright Holder maintains some
semblance of artistic control over the development of the package,
while giving the users of the package the right to use and distribute
the Package in a more-or-less customary fashion, plus the right to make
reasonable modifications.
Definitions:
"Package" refers to the collection of files distributed by the
Copyright Holder, and derivatives of that collection of files
created through textual modification.
"Standard Version" refers to such a Package if it has not been
modified, or has been modified in accordance with the wishes
of the Copyright Holder as specified below.
"Copyright Holder" is whoever is named in the copyright or
copyrights for the package.
"You" is you, if you're thinking about copying or distributing
this Package.
"Reasonable copying fee" is whatever you can justify on the
basis of media cost, duplication charges, time of people involved,
and so on. (You will not be required to justify it to the
Copyright Holder, but only to the computing community at large
as a market that must bear the fee.)
"Freely Available" means that no fee is charged for the item
itself, though there may be fees involved in handling the item.
It also means that recipients of the item may redistribute it
under the same conditions they received it.
1. You may make and give away verbatim copies of the source form of the
Standard Version of this Package without restriction, provided that you
duplicate all of the original copyright notices and associated disclaimers.
2. You may apply bug fixes, portability fixes and other modifications
derived from the Public Domain or from the Copyright Holder. A Package
modified in such a way shall still be considered the Standard Version.
3. You may otherwise modify your copy of this Package in any way, provided
that you insert a prominent notice in each changed file stating how and
when you changed that file, and provided that you do at least ONE of the
following:
a) place your modifications in the Public Domain or otherwise make them
Freely Available, such as by posting said modifications to Usenet or
an equivalent medium, or placing the modifications on a major archive
site such as uunet.uu.net, or by allowing the Copyright Holder to include
your modifications in the Standard Version of the Package.
b) use the modified Package only within your corporation or organization.
c) rename any non-standard executables so the names do not conflict
with standard executables, which must also be provided, and provide
a separate manual page for each non-standard executable that clearly
documents how it differs from the Standard Version.
d) make other distribution arrangements with the Copyright Holder.
4. You may distribute the programs of this Package in object code or
executable form, provided that you do at least ONE of the following:
a) distribute a Standard Version of the executables and library files,
together with instructions (in the manual page or equivalent) on where
to get the Standard Version.
b) accompany the distribution with the machine-readable source of
the Package with your modifications.
c) give non-standard executables non-standard names, and clearly
document the differences in manual pages (or equivalent), together
with instructions on where to get the Standard Version.
d) make other distribution arrangements with the Copyright Holder.
5. You may charge a reasonable copying fee for any distribution of this
Package. You may charge any fee you choose for support of this
Package. You may not charge a fee for this Package itself. However,
you may distribute this Package in aggregate with other (possibly
commercial) programs as part of a larger (possibly commercial) software
distribution provided that you do not advertise this Package as a
product of your own. You may embed this Package's interpreter within
an executable of yours (by linking); this shall be construed as a mere
form of aggregation, provided that the complete Standard Version of the
interpreter is so embedded.
6. The scripts and library files supplied as input to or produced as
output from the programs of this Package do not automatically fall
under the copyright of this Package, but belong to whoever generated
them, and may be sold commercially, and may be aggregated with this
Package. If such scripts or library files are aggregated with this
Package via the so-called "undump" or "unexec" methods of producing a
binary executable image, then distribution of such an image shall
neither be construed as a distribution of this Package nor shall it
fall under the restrictions of Paragraphs 3 and 4, provided that you do
not represent such an executable image as a Standard Version of this
Package.
7. C subroutines (or comparably compiled subroutines in other
languages) supplied by you and linked into this Package in order to
emulate subroutines and variables of the language defined by this
Package shall not be considered part of this Package, but are the
equivalent of input as in Paragraph 6, provided these subroutines do
not change the language in any way that would cause it to fail the
regression tests for the language.
8. Aggregation of this Package with a commercial distribution is always
permitted provided that the use of this Package is embedded; that is,
when no overt attempt is made to make this Package's interfaces visible
to the end user of the commercial distribution. Such use shall not be
construed as a distribution of this Package.
9. The name of the Copyright Holder may not be used to endorse or promote
products derived from this software without specific prior written permission.
10. THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The End
=======================================================================
GNU GENERAL PUBLIC LICENSE
Version 1, February 1989
Copyright (C) 1989 Free Software Foundation, Inc.
<https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The license agreements of most software companies try to keep users
at the mercy of those companies. By contrast, our General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. The
General Public License applies to the Free Software Foundation's
software and to any other program whose authors commit to using it.
You can use it for your programs, too.
When we speak of free software, we are referring to freedom, not
price. Specifically, the General Public License is designed to make
sure that you have the freedom to give away or sell copies of free
software, that you receive source code or can get it if you want it,
that you can change the software or use pieces of it in new free
programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of a such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must tell them their rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License Agreement applies to any program or other work which
contains a notice placed by the copyright holder saying it may be
distributed under the terms of this General Public License. The
"Program", below, refers to any such program or work, and a "work based
on the Program" means either the Program or any work containing the
Program or a portion of it, either verbatim or with modifications. Each
licensee is addressed as "you".
1. You may copy and distribute verbatim copies of the Program's source
code as you receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice and
disclaimer of warranty; keep intact all the notices that refer to this
General Public License and to the absence of any warranty; and give any
other recipients of the Program a copy of this General Public License
along with the Program. You may charge a fee for the physical act of
transferring a copy.
2. You may modify your copy or copies of the Program or any portion of
it, and copy and distribute such modifications under the terms of Paragraph
1 above, provided that you also do the following:
a) cause the modified files to carry prominent notices stating that
you changed the files and the date of any change; and
b) cause the whole of any work that you distribute or publish, that
in whole or in part contains the Program or any part thereof, either
with or without modifications, to be licensed at no charge to all
third parties under the terms of this General Public License (except
that you may choose to grant warranty protection to some or all
third parties, at your option).
c) If the modified program normally reads commands interactively when
run, you must cause it, when started running for such interactive use
in the simplest and most usual way, to print or display an
announcement including an appropriate copyright notice and a notice
that there is no warranty (or else, saying that you provide a
warranty) and that users may redistribute the program under these
conditions, and telling the user how to view a copy of this General
Public License.
d) You may charge a fee for the physical act of transferring a
copy, and you may at your option offer warranty protection in
exchange for a fee.
Mere aggregation of another independent work with the Program (or its
derivative) on a volume of a storage or distribution medium does not bring
the other work under the scope of these terms.
3. You may copy and distribute the Program (or a portion or derivative of
it, under Paragraph 2) in object code or executable form under the terms of
Paragraphs 1 and 2 above provided that you also do one of the following:
a) accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of
Paragraphs 1 and 2 above; or,
b) accompany it with a written offer, valid for at least three
years, to give any third party free (except for a nominal charge
for the cost of distribution) a complete machine-readable copy of the
corresponding source code, to be distributed under the terms of
Paragraphs 1 and 2 above; or,
c) accompany it with the information you received as to where the
corresponding source code may be obtained. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form alone.)
Source code for a work means the preferred form of the work for making
modifications to it. For an executable file, complete source code means
all the source code for all modules it contains; but, as a special
exception, it need not include source code for modules which are standard
libraries that accompany the operating system on which the executable
file runs, or for standard header files or definitions files that
accompany that operating system.
4. You may not copy, modify, sublicense, distribute or transfer the
Program except as expressly provided under this General Public License.
Any attempt otherwise to copy, modify, sublicense, distribute or transfer
the Program is void, and will automatically terminate your rights to use
the Program under this License. However, parties who have received
copies, or rights to use copies, from you under this General Public
License will not have their licenses terminated so long as such parties
remain in full compliance.
5. By copying, distributing or modifying the Program (or any work based
on the Program) you indicate your acceptance of this license to do so,
and all its terms and conditions.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the original
licensor to copy, distribute or modify the Program subject to these
terms and conditions. You may not impose any further restrictions on the
recipients' exercise of the rights granted herein.
7. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of the license which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
the license, you may choose any version ever published by the Free Software
Foundation.
8. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
9. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
10. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS

@ -0,0 +1,664 @@
[![Build Status](https://github.com/cpan-authors/XML-Parser/actions/workflows/testsuite.yml/badge.svg)](https://github.com/cpan-authors/XML-Parser/actions/workflows/testsuite.yml) [![Coverage](https://codecov.io/gh/cpan-authors/XML-Parser/graph/badge.svg)](https://codecov.io/gh/cpan-authors/XML-Parser)
# NAME
XML::Parser - A perl module for parsing XML documents
# SYNOPSIS
```perl
use XML::Parser;

my $p1 = XML::Parser->new(Style => 'Debug');
$p1->parsefile('REC-xml-19980210.xml');
$p1->parse('<foo id="me">Hello World</foo>');

# Alternative
my $p2 = XML::Parser->new(Handlers => {Start => \&handle_start,
                                       End   => \&handle_end,
                                       Char  => \&handle_char});
$p2->parse($socket);

# Another alternative
my $p3 = XML::Parser->new(ErrorContext => 2);

$p3->setHandlers(Char    => \&text,
                 Default => \&other);

# Read the output of an external command through a pipe
open(my $fh, '-|', 'xmlgenerator') or die "Cannot run xmlgenerator: $!";
$p3->parse($fh, ProtocolEncoding => 'ISO-8859-1');
close($fh);

$p3->parsefile('junk.xml', ErrorContext => 3);
```
# DESCRIPTION
This module provides ways to parse XML documents. It is built on top of
[XML::Parser::Expat](https://metacpan.org/pod/XML%3A%3AParser%3A%3AExpat), which is a lower-level interface to James Clark's
expat library. Each call to one of the parsing methods creates a new
instance of XML::Parser::Expat which is then used to parse the document.
Expat options may be provided when the XML::Parser object is created.
These options are then passed on to the Expat object on each parse call.
They can also be given as extra arguments to the parse methods, in which
case they override options given at XML::Parser creation time.
The behavior of the parser is controlled either by the ["STYLES"](#styles) and/or
["HANDLERS"](#handlers) options, or by the ["setHandlers"](#sethandlers) method. These all provide
mechanisms for XML::Parser to set the handlers needed by XML::Parser::Expat.
If neither `Style` nor `Handlers` is specified, then parsing only
checks whether the document is well-formed.
When underlying handlers get called, they receive as their first parameter
the _Expat_ object, not the Parser object.
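The two behaviors described above can be seen in a short sketch. First, constructor options are copied into a fresh Expat instance on every parse call, and extra arguments to a parse method override them for that call only. Second, handlers receive the Expat object rather than the XML::Parser object. The sketch assumes XML::Parser is installed. The `@events` collector and the `urn:example` namespace are illustrative only. `Namespaces` is used because its effect on element names is easy to observe.

```perl
use strict;
use warnings;
use XML::Parser;

# Illustrative collector: records what each Start handler call receives.
my @events;

# Options given to new() are stored on the XML::Parser object and passed
# to the new XML::Parser::Expat instance that each parse call creates.
my $parser = XML::Parser->new(
    Handlers => {
        # The first argument is the Expat object doing the parse,
        # not the XML::Parser object ($parser) constructed here.
        Start => sub {
            my ($expat, $element) = @_;
            push @events, [ ref($expat), "$element" ];
        },
    },
);

my $xml = '<x:doc xmlns:x="urn:example"/>';

# First parse: namespace processing is off, so the prefix is kept.
$parser->parse($xml);

# Second parse: Namespaces => 1 is an extra argument that overrides
# the constructor's (default) setting for this call only.
$parser->parse($xml, Namespaces => 1);

print "$_->[0] $_->[1]\n" for @events;
```

The first parse should report the prefixed name `x:doc`. The second should report the local name `doc`. Both events should come from an `XML::Parser::Expat` object. The same `$parser` is reused because XML::Parser objects, unlike the Expat objects they create, can be parsed with repeatedly.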
# METHODS
- new
This is a class method, the constructor for XML::Parser. Options are passed
as keyword value pairs. Recognized options are:
- Style
This option provides an easy way to create a given style of parser. The
built-in styles are: ["Debug"](#debug), ["Subs"](#subs), ["Tree"](#tree), ["Objects"](#objects),
and ["Stream"](#stream). These are all defined in separate packages under
`XML::Parser::Style::*`, and you can find further documentation for
each style both below and in those packages.
Custom styles can be provided by giving a full package name containing
at least one '::'. This package should then have subs defined for each
handler it wishes to have installed. See ["STYLES"](#styles) below
for a discussion of each built-in style.
- Handlers
When provided, this option should be an anonymous hash containing as
keys the type of handler and as values a sub reference to handle that
type of event. Every handler receives, as its first parameter, the
Expat instance that is parsing the document. Further details on
handlers can be found in ["HANDLERS"](#handlers). Any handler set here
overrides the corresponding handler set with the Style option.
- Pkg
Some styles will refer to subs defined in this package. If not provided,
it defaults to the package which called the constructor.
- ErrorContext
This is an Expat option. When this option is defined, errors are reported
in context. The value should be the number of lines to show on either side
of the line in which the error occurred.
- ProtocolEncoding
This is an Expat option. This sets the protocol encoding name. It defaults
to none. The built-in encodings are: `UTF-8`, `ISO-8859-1`, `UTF-16`, and
`US-ASCII`. Other encodings may be used if they have encoding maps in one
of the directories in the @Encoding\_Path list. Check ["ENCODINGS"](#encodings) for
more information on encoding maps. Setting the protocol encoding overrides
any encoding in the XML declaration.
- Namespaces
This is an Expat option. If this is set to a true value, then namespace
processing is done during the parse. See ["Namespaces" in XML::Parser::Expat](https://metacpan.org/pod/XML%3A%3AParser%3A%3AExpat#Namespaces)
for further discussion of namespace processing.
- NoExpand
This is an Expat option. Normally, the parser will try to expand references
to entities defined in the internal subset. If this option is set to a true
value, and a default handler is also set, then the default handler will be
called when an entity reference is seen in text. This has no effect if a
default handler has not been registered, and it has no effect on the expansion
of entity references inside attribute values.
- Stream\_Delimiter
This is an Expat option. It takes a string value. When this string is found
alone on a line while parsing from a stream, then the parse is ended as if it
saw an end of file. The intended use is with a stream of XML documents in a
MIME multipart format. The string should not contain a trailing newline.
- ParseParamEnt
This is an Expat option. Unless standalone is set to "yes" in the XML
declaration, setting this to a true value allows the external DTD to be read,
and parameter entities to be parsed and expanded.
**Implicit vs explicit parameter entity parsing:** When `ParseParamEnt` is
not set, parameter entity references (e.g. `%foo;`) in the internal DTD
subset are passed through to the **Default** handler as literal text. This is
the mode that XML::Twig and other DTD round-tripping tools rely on.
When `ParseParamEnt` is set to a true value, or when a declaration handler
(**Entity**, **Element**, or **Attlist**) is registered, parameter entity parsing
is activated. In this mode, PE references are resolved by expat (via the
**ExternEnt** handler) and subsequent declarations are routed to their
dedicated declaration handlers instead of the Default handler.
- NoLWP
This option has no effect if the ExternEnt or ExternEntFin handlers are
directly set. Otherwise, if true, it forces the use of a file based external
entity handler.
- BillionLaughsAttackProtectionMaximumAmplification
Sets the maximum amplification factor for the Billion Laughs attack
protection. See ["SECURITY"](#security) below for details.
This is an Expat option.
Requires libexpat >= 2.4.0 built with `XML_DTD` or `XML_GE`.
- BillionLaughsAttackProtectionActivationThreshold
Sets the activation threshold (in bytes) for the Billion Laughs attack
protection. See ["SECURITY"](#security) below for details.
This is an Expat option.
Requires libexpat >= 2.4.0 built with `XML_DTD` or `XML_GE`.
- AllocTrackerMaximumAmplification
Sets the maximum amplification factor for the allocation tracker.
See ["SECURITY"](#security) below for details.
This is an Expat option.
Requires libexpat >= 2.7.2 built with `XML_DTD` or `XML_GE`.
- AllocTrackerActivationThreshold
Sets the activation threshold (in bytes) for the allocation tracker.
See ["SECURITY"](#security) below for details.
This is an Expat option.
Requires libexpat >= 2.7.2 built with `XML_DTD` or `XML_GE`.
- ReparseDeferralEnabled
Enables or disables reparse deferral, a security mechanism that prevents
certain token-boundary attacks. See ["SECURITY"](#security) below for details.
This is an Expat option.
Requires libexpat >= 2.6.0.
- Non\_Expat\_Options
If provided, this should be an anonymous hash whose keys are options that
shouldn't be passed to Expat. This should only be of concern to those
subclassing XML::Parser.
- setHandlers(TYPE, HANDLER \[, TYPE, HANDLER \[...\]\])
This method registers handlers for various parser events. It overrides any
previous handlers registered through the Style or Handlers options or through
earlier calls to setHandlers. By providing a false or undefined value as
the handler, the existing handler can be unset.
This method returns a list of type, handler pairs corresponding to the
input. The handlers returned are the ones that were in effect prior to
the call.
See a description of the handler types in ["HANDLERS"](#handlers).
- parse(SOURCE \[, OPT => OPT\_VALUE \[...\]\])
The SOURCE parameter should either be a string containing the whole XML
document, or it should be an open IO::Handle. Constructor options to
XML::Parser::Expat given as keyword-value pairs may follow the SOURCE
parameter. These override, for this call, any options or attributes passed
through from the XML::Parser instance.
If a parse error occurs, parse dies. Otherwise it returns 1, or whatever
the **Final** handler returns if one is installed. In other words, what
parse returns depends on the style.
See ["ERROR HANDLING"](#error-handling) below for how to catch and handle parse errors.
- parsestring
This is just an alias for parse for backwards compatibility.
- parsefile(FILE \[, OPT => OPT\_VALUE \[...\]\])
Opens FILE for reading and calls parse with the open handle. The file
is closed however parse returns. parsefile dies if the file cannot be
opened or if a parse error occurs; otherwise it returns what parse returns.
- parse\_start(\[ OPT => OPT\_VALUE \[...\]\])
Create and return a new instance of XML::Parser::ExpatNB. Constructor
options may be provided. If an **Init** handler has been provided, it is
called before returning the ExpatNB object. Documents are parsed by
making incremental calls to the parse\_more method of this object, which
takes a string. A single call to the parse\_done method of this object,
which takes no arguments, indicates that the document is finished.
If a **Final** handler is installed, parse\_done runs it before returning
and returns whatever the Final handler returned.
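The constructor options and methods above can be exercised together in a short script. This is a minimal sketch, not code from the distribution: `My::CountStyle` is a hypothetical custom style package, the `MyCount` key it stores on the Expat object is an illustrative choice, and the sample documents are made up. It shows a custom style supplying Init, Start, and Final subs, parse returning the Final handler's value, setHandlers handing back the handler it replaced, and incremental parsing with parse\_start, parse\_more, and parse\_done.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical custom style. Because the name contains '::', XML::Parser
# looks in this package for subs named after handler types.
package My::CountStyle;

sub Init  { my $expat = shift; $expat->{MyCount} = 0 }
sub Start { my $expat = shift; $expat->{MyCount}++ }
sub Final { my $expat = shift; return $expat->{MyCount} }   # becomes parse's result

package main;

my $parser = XML::Parser->new(Style => 'My::CountStyle');

# parse() returns whatever the Final handler returns: here, the element count.
my $count = $parser->parse('<doc><a/><b><c/></b></doc>');
print "elements: $count\n";

# setHandlers() returns (type, previous handler) pairs. The new Start handler
# records element names, then delegates to the style's original Start sub.
my @names;
my %previous;
%previous = $parser->setHandlers(
    Start => sub { push @names, $_[1]; $previous{Start}->(@_) },
);

# parse_start() returns an XML::Parser::ExpatNB object; chunks may split tags.
my $nb = $parser->parse_start;
$nb->parse_more($_) for ('<list><it', 'em/><item/', '></list>');
my $nb_count = $nb->parse_done;    # runs Final and returns its value
print "incremental elements: $nb_count (@names)\n";

# Put the style's original Start handler back.
$parser->setHandlers(%previous);
```

Chaining to `$previous{Start}` keeps the style's counting behavior while adding new behavior; passing `%previous` back to setHandlers restores the original handler.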
# HANDLERS
Expat is an event based parser. As the parser recognizes parts of the
document (say the start or end tag for an XML element), then any handlers
registered for that type of an event are called with suitable parameters.
All handlers receive an instance of XML::Parser::Expat as their first
argument. See ["METHODS" in XML::Parser::Expat](https://metacpan.org/pod/XML%3A%3AParser%3A%3AExpat#METHODS) for a discussion of the
methods that can be called on this object.
## Init (Expat)
This is called just before the parsing of the document starts.
## Final (Expat)
This is called just after parsing has finished, but only if no errors
occurred during the parse. Parse returns what this returns.
## Start (Expat, Element \[, Attr, Val \[,...\]\])
This event is generated when an XML start tag is recognized. Element is the
name of the XML element type that is opened with the start tag. The Attr &
Val pairs are generated for each attribute in the start tag.
## End (Expat, Element)
This event is generated when an XML end tag is recognized. Note that
an XML empty-element tag (`<foo/>`) generates both a start and an end event.
## Char (Expat, String)
This event is generated when non-markup is recognized. The non-markup
sequence of characters is in String. A single non-markup sequence of
characters may generate multiple calls to this handler. Whatever the
encoding of the string in the original document, this is given to the
handler in UTF-8.
**Important:** Because the underlying expat library parses in fixed-size
chunks, character data that spans a buffer boundary will arrive as two or
more consecutive Char events. This typically occurs with files larger than
about 32 KiB and is not a bug. To obtain the complete text of an element,
accumulate the strings delivered between Start and End events:
```perl
my $current_text;
sub start_handler { $current_text = ''; }
sub char_handler  { $current_text .= $_[1]; }
sub end_handler   { print "complete text: $current_text\n"; }
```
The Stream style (`XML::Parser::Style::Stream`) already performs this
accumulation automatically.
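The snippet above resets its buffer on every Start event, so text surrounding a child element (mixed content) is lost. A sketch that keeps one buffer per open element on a stack avoids that; the `%text_of` hash and the sample document are illustrative, and only the Start, Char, and End handlers described here are used.

```perl
use strict;
use warnings;
use XML::Parser;

my @stack;     # one text buffer per currently open element
my %text_of;   # element name => its direct text (last occurrence wins)

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub { push @stack, '' },
        Char  => sub { $stack[-1] .= $_[1] if @stack },
        End   => sub { my ($expat, $el) = @_; $text_of{$el} = pop @stack },
    },
);

# A long text node may be delivered through several Char events.
my $long = 'x' x 100_000;
$parser->parse("<doc><title>Hello <b>bold</b> world</title><body>$long</body></doc>");

print "title: '$text_of{title}'\n";                # text directly inside <title>
print "body length: ", length($text_of{body}), "\n";
```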
## Proc (Expat, Target, Data)
This event is generated when a processing instruction is recognized.
## Comment (Expat, Data)
This event is generated when a comment is recognized.
## CdataStart (Expat)
This is called at the start of a CDATA section.
## CdataEnd (Expat)
This is called at the end of a CDATA section.
## Default (Expat, String)
This is called for any characters that don't have a registered handler.
This includes both characters that are part of markup for which no
events are generated (markup declarations) and characters that
could generate events, but for which no handler has been registered.
Whatever the encoding in the original document, the string is returned to
the handler in UTF-8.
## Unparsed (Expat, Entity, Base, Sysid, Pubid, Notation)
This is called for a declaration of an unparsed entity. Entity is the name
of the entity. Base is the base to be used for resolving a relative URI.
Sysid is the system id. Pubid is the public id. Notation is the notation
name. Base and Pubid may be undefined.
## Notation (Expat, Notation, Base, Sysid, Pubid)
This is called for a declaration of notation. Notation is the notation name.
Base is the base to be used for resolving a relative URI. Sysid is the system
id. Pubid is the public id. Base, Sysid, and Pubid may all be undefined.
## ExternEnt (Expat, Base, Sysid, Pubid)
This is called when an external entity is referenced. Base is the base to be
used for resolving a relative URI. Sysid is the system id. Pubid is the public
id. Base and Pubid may be undefined.
This handler should either return a string, which represents the contents of
the external entity, or return an open filehandle that can be read to obtain
the contents of the external entity, or return undef, which indicates the
external entity couldn't be found and will generate a parse error.
If an open filehandle is returned, it must be returned as either a glob
(\*FOO) or as a reference to a glob (e.g. an instance of IO::Handle).
A default handler is installed for this event. It is
XML::Parser::lwp\_ext\_ent\_handler unless the NoLWP option was given
a true value, in which case XML::Parser::file\_ext\_ent\_handler is used
instead. Even without the NoLWP option, if the URI or LWP modules are
missing, the file-based handler ends up being used, after a warning on
the first external entity reference.
The LWP external entity handler will use proxies defined in the environment
(http\_proxy, ftp\_proxy, etc.).
Please note that the LWP external entity handler reads the entire
entity into a string and returns it, whereas the file handler opens a
filehandle.
Also note that the file external entity handler will likely choke on
absolute URIs or file names that don't fit the conventions of the local
operating system.
The expat base method can be used to set a basename for
relative pathnames. If no basename is given, or if the basename is itself
a relative name, then it is relative to the current working directory.
## ExternEntFin (Expat)
This is called after parsing an external entity. It's not called unless
an ExternEnt handler is also set. There is a default handler installed
that pairs with the default ExternEnt handler.
If you're going to install your own ExternEnt handler, then you should
set (or unset) this handler too.
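As a sketch of a custom ExternEnt handler that returns strings, the example below resolves system ids from an in-memory hash. The `%catalog` hash and the `chapter1.xml` name are hypothetical; a real handler would typically open files or fetch URLs, and should return undef for anything it cannot resolve so that the parse fails as described above. Since string results need no cleanup, it installs a no-op ExternEntFin alongside it.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical in-memory catalog standing in for files or URLs.
my %catalog = ('chapter1.xml' => '<p>First chapter</p>');

my $xml = <<'XML';
<!DOCTYPE book [
  <!ENTITY ch1 SYSTEM "chapter1.xml">
]>
<book>&ch1;</book>
XML

my $text = '';
my $parser = XML::Parser->new(
    Handlers => {
        # Return the entity's contents as a string; returning undef for an
        # unknown system id makes the parse fail.
        ExternEnt => sub {
            my ($expat, $base, $sysid, $pubid) = @_;
            return $catalog{$sysid};
        },
        # Nothing to clean up for string results, so install a no-op.
        ExternEntFin => sub { },
        Char         => sub { $text .= $_[1] },
    },
);

$parser->parse($xml);
print "book text: $text\n";
```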
## Entity (Expat, Name, Val, Sysid, Pubid, Ndata, IsParam)
This is called when an entity is declared. For internal entities, the Val
parameter will contain the value and the Sysid, Pubid, and Ndata parameters
will be undefined. For external entities, the Val parameter will be
undefined, the Sysid parameter will have the system id, the Pubid parameter
will have the public id if it was provided (it will be undefined otherwise),
and the Ndata parameter will contain the notation for unparsed entities. If
this is a parameter entity declaration, then the IsParam parameter is true.
Note that this handler and the Unparsed handler above overlap. If both are
set, then this handler will not be called for unparsed entities.
## Element (Expat, Name, Model)
The element handler is called when an element declaration is found. Name
is the element name, and Model is the content model as an XML::Parser::ContentModel
object. See ["XML::Parser::ContentModel Methods" in XML::Parser::Expat](https://metacpan.org/pod/XML%3A%3AParser%3A%3AExpat#XML::Parser::ContentModel-Methods)
for methods available for this class.
## Attlist (Expat, Elname, Attname, Type, Default, Fixed)
This handler is called for each attribute in an ATTLIST declaration.
So an ATTLIST declaration that has multiple attributes will generate multiple
calls to this handler. The Elname parameter is the name of the element with
which the attribute is being associated. The Attname parameter is the name
of the attribute. Type is the attribute type, given as a string. Default is
the default value, which will either be "#REQUIRED", "#IMPLIED" or a quoted
string (i.e. the returned string will begin and end with a quote character).
If Fixed is true, then this is a fixed attribute.
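The following sketch registers Element, Attlist, and Entity handlers on a document with an internal subset and records what they receive; the document is made up for illustration. Because the exact quote character wrapped around a literal default is not pinned down above, code consuming Default should not assume one in particular. Registering these handlers also switches on parameter entity parsing, as described under ParseParamEnt.

```perl
use strict;
use warnings;
use XML::Parser;

my $xml = <<'XML';
<!DOCTYPE doc [
  <!ELEMENT doc (#PCDATA)>
  <!ATTLIST doc
      id      ID    #IMPLIED
      lang    CDATA "en"
      version CDATA #FIXED "1.0">
  <!ENTITY greeting "hello">
]>
<doc>&greeting;</doc>
XML

my (@elements, @attlist, @entities);
my $parser = XML::Parser->new(
    Handlers => {
        # $model is an XML::Parser::ContentModel object; only the name is kept here.
        Element => sub { my ($expat, $name, $model) = @_; push @elements, $name },
        # One call per attribute in the ATTLIST declaration.
        Attlist => sub {
            my ($expat, $elname, $attname, $type, $default, $fixed) = @_;
            push @attlist, [$attname, $type, $default, $fixed ? 1 : 0];
        },
        Entity => sub {
            my ($expat, $name, $val) = @_;
            push @entities, "$name=$val";
        },
    },
);
$parser->parse($xml);

printf "%-8s %-6s %-10s fixed=%d\n", @$_ for @attlist;
```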
## Doctype (Expat, Name, Sysid, Pubid, Internal)
This handler is called for DOCTYPE declarations. Name is the document type
name. Sysid is the system id of the document type, if it was provided,
otherwise it's undefined. Pubid is the public id of the document type,
which will be undefined if no public id was given. Internal will be
true or false, indicating whether or not the doctype declaration contains
an internal subset.
## DoctypeFin (Expat)
This handler is called after parsing of the DOCTYPE declaration has finished,
including any internal or external DTD declarations.
## XMLDecl (Expat, Version, Encoding, Standalone)
This handler is called for XML declarations. Version is a string containing
the version. Encoding is either undefined or contains an encoding string.
Standalone will be true, false, or undefined if the standalone attribute
is yes, no, or omitted, respectively.
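A small sketch of the XMLDecl handler, parsing one document with a full declaration and one with only a version, shows what the undefined Encoding and Standalone values look like in practice. The printing logic is illustrative.

```perl
use strict;
use warnings;
use XML::Parser;

my @decls;
my $parser = XML::Parser->new(
    Handlers => {
        XMLDecl => sub {
            my ($expat, $version, $encoding, $standalone) = @_;
            push @decls, [$version, $encoding, $standalone];
        },
    },
);

$parser->parse(q{<?xml version="1.0" encoding="UTF-8" standalone="yes"?><doc/>});
$parser->parse(q{<?xml version="1.0"?><doc/>});

for my $d (@decls) {
    my ($version, $encoding, $standalone) = @$d;
    printf "version=%s encoding=%s standalone=%s\n",
        $version,
        $encoding // '(none)',
        !defined $standalone ? '(not given)' : $standalone ? 'yes' : 'no';
}
```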
# STYLES
## Debug
This just prints out the document in outline form. Nothing special is
returned by parse.
## Subs
Each time an element starts, a sub by that name in the package specified
by the Pkg option is called with the same parameters that the Start
handler gets called with.
Each time an element ends, a sub with that name followed by an underscore
("\_") is called with the same parameters that the End handler gets called
with.
Nothing special is returned by parse.
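A minimal sketch of the Subs style, using a hypothetical `My::Subs` package that only handles `<item>` elements. Elements without a matching sub (here `<list>`) produce no calls.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical handler package for the Subs style.
package My::Subs;

our @log;

# Called for each <item> start tag with (Expat, Element, Attr, Val, ...).
sub item  { my ($expat, $el, %attr) = @_; push @log, "start item $attr{id}" }

# Called for each </item> end tag: the element name plus an underscore.
sub item_ { my ($expat, $el) = @_; push @log, "end item" }

package main;

my $parser = XML::Parser->new(Style => 'Subs', Pkg => 'My::Subs');
$parser->parse('<list><item id="1"/><item id="2"/></list>');
print "$_\n" for @My::Subs::log;
```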
## Tree
Parse will return a parse tree for the document. Each node in the tree
takes the form of a tag, content pair. Text nodes are represented with
a pseudo-tag of "0" and the string that is their content. For elements,
the content is an array reference. The first item in the array is a
(possibly empty) hash reference containing attributes. The remainder of
the array is a sequence of tag-content pairs representing the content
of the element.
So for example the result of parsing:
```xml
<foo><head id="a">Hello <em>there</em></head><bar>Howdy<ref/></bar>do</foo>
```
would be:
```
  Tag   Content
  ==================================================================
  [foo, [{}, head, [{id => "a"}, 0, "Hello ",  em, [{}, 0, "there"]],
              bar, [         {}, 0, "Howdy",  ref, [{}]],
                0, "do"
        ]
  ]
```
The root element "foo" has 3 children: a "head" element, a "bar"
element and the text "do". After the empty attribute hash, these are
represented in its contents by 3 tag-content pairs.
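The same example document can be parsed and walked in code. This sketch iterates over the root element's tag-content pairs and treats tag `0` as text; the `text(...)` and `element(...)` labels are illustrative.

```perl
use strict;
use warnings;
use XML::Parser;

my $tree = XML::Parser->new(Style => 'Tree')->parse(
    '<foo><head id="a">Hello <em>there</em></head><bar>Howdy<ref/></bar>do</foo>'
);

# Top level: one (tag, content) pair for the root element.
my ($root, $content) = @$tree;

# Content: attribute hash first, then (tag, content) pairs; tag 0 marks text.
my ($attrs, @pairs) = @$content;
my @children;
while (my ($tag, $value) = splice @pairs, 0, 2) {
    push @children, $tag eq '0' ? "text($value)" : "element($tag)";
}
print "$root: @children\n";
```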
## Objects
This is similar to the Tree style, except that a hash object is created for
each element. The corresponding object will be in the class whose name
is created by appending "::" and the element name to the package set with
the Pkg option. Non-markup text will be in the ::Characters class. The
contents of the corresponding object will be in an anonymous array that
is the value of the Kids property for that object.
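A sketch of the Objects style, using `MyDoc` as a hypothetical Pkg value. It assumes that parse returns an array reference holding the root object as its first element; check that detail against XML::Parser::Style::Objects before relying on it.

```perl
use strict;
use warnings;
use XML::Parser;

# Pkg => 'MyDoc' is hypothetical: elements become MyDoc::<name> objects,
# non-markup text becomes MyDoc::Characters objects.
my $tree = XML::Parser->new(Style => 'Objects', Pkg => 'MyDoc')
    ->parse('<foo>hi<bar/></foo>');

# Assumption: the result is an array reference whose first item is the root.
my $root = $tree->[0];
print ref($root), "\n";
print "  ", ref($_), "\n" for @{ $root->{Kids} };
```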
## Stream
This style also uses the Pkg package. If none of the subs that this
style looks for is there, then the effect of parsing with this style is
to print a canonical copy of the document without comments or declarations.
All the subs receive the Expat instance for the document being parsed
as their first parameter.
It looks for the following routines:
- StartDocument
Called at the start of the parse.
- StartTag
Called for every start tag with a second parameter of the element type. The $\_
variable will contain a copy of the tag and the %\_ variable will contain
attribute values supplied for that element.
- EndTag
Called for every end tag with a second parameter of the element type. The $\_
variable will contain a copy of the end tag.
- Text
Called just before start or end tags with accumulated non-markup text in
the $\_ variable.
- PI
Called for processing instructions. The $\_ variable will contain a copy of
the PI and the target and data are sent as 2nd and 3rd parameters
respectively.
- EndDocument
Called at conclusion of the parse.
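A sketch of the Stream style with a hypothetical `My::Stream` package that records events instead of printing them. It reads the tag text from `$_` and attributes from `%_`, as described above.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical package holding the Stream-style callbacks.
package My::Stream;

our @events;

sub StartTag {
    my ($expat, $type) = @_;
    # $_ holds the start tag text; %_ holds its attribute values.
    push @events, exists $_{lang} ? "start:$type($_{lang})" : "start:$type";
}
sub EndTag { my ($expat, $type) = @_; push @events, "end:$type" }
sub Text   { push @events, "text:$_" }    # $_ holds the accumulated text

package main;

XML::Parser->new(Style => 'Stream', Pkg => 'My::Stream')
    ->parse('<doc><a lang="en">hi</a></doc>');
print "$_\n" for @My::Stream::events;
```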
# ENCODINGS
XML documents may be encoded in character sets other than Unicode as
long as they may be mapped into the Unicode character set. Expat has
further restrictions on encodings. Read the xmlparse.h header file in
the expat distribution to see details on these restrictions.
Expat has built-in encodings for: `UTF-8`, `ISO-8859-1`, `UTF-16`, and
`US-ASCII`. Encodings are set either through the XML declaration
encoding attribute or through the ProtocolEncoding option to XML::Parser
or XML::Parser::Expat.
For encodings other than the built-ins, expat calls the function
load\_encoding in the Expat package with the encoding name. This function
looks in each directory of @XML::Parser::Expat::Encoding\_Path for a file
whose name is the lower-cased encoding name with a '.enc' extension, and
loads the first one it finds.
If you wish to build your own encoding maps, check out the XML::Encoding
module from CPAN.
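A sketch showing the two ways of selecting a built-in encoding: through the XML declaration and through the ProtocolEncoding option passed per call to parse. The sample bytes are illustrative; either way the handler receives the decoded text as a Perl character string.

```perl
use strict;
use warnings;
use XML::Parser;

my $text;
my $parser = XML::Parser->new(Handlers => { Char => sub { $text .= $_[1] } });

# Byte 0xE9 is "é" in ISO-8859-1, one of expat's built-in encodings.
my $bytes = "<word>caf\xE9</word>";

# 1. Encoding taken from the XML declaration.
$text = '';
$parser->parse(qq{<?xml version="1.0" encoding="ISO-8859-1"?>$bytes});
my $from_decl = $text;

# 2. Encoding supplied for this call only, via the ProtocolEncoding option.
$text = '';
$parser->parse($bytes, ProtocolEncoding => 'ISO-8859-1');
my $from_option = $text;

printf "%vX\n", $from_decl;   # code points of the decoded text, in hex
print "encoding map search path: @XML::Parser::Expat::Encoding_Path\n";
```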
# ERROR HANDLING
XML::Parser throws an exception (dies) when it encounters a parse error.
This includes malformed XML, encoding errors, and other problems detected
by the underlying expat library.
The `parse`, `parsefile`, and `parse_done` methods may all throw
exceptions. To handle parse errors gracefully in your application, wrap
the parse call in an `eval` block:
```perl
my $parser = XML::Parser->new(Style => 'Tree');
my $tree = eval { $parser->parsefile('data.xml') };
if ($@) {
    # Handle the parse error
    warn "Parse failed: $@";
}
```
The error message (in `$@`) will include the line number, column number,
and byte position where the error was detected. For additional context
around the error location, set the **ErrorContext** option when constructing
the parser:
```perl
my $parser = XML::Parser->new(
    Style        => 'Tree',
    ErrorContext => 2,
);
```
This will include 2 lines of context on either side of the error in the
error message.
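A self-contained sketch of trapping a parse error and extracting its position. Matching the message with a regular expression assumes the `line N, column N, byte N` wording stays stable; treat it as a convenience rather than an API.

```perl
use strict;
use warnings;
use XML::Parser;

my $parser = XML::Parser->new;

# Mismatched end tag: parse() dies, so trap it with eval.
my $ok  = eval { $parser->parse('<doc><a></doc>'); 1 };
my $err = $@;

if (!$ok) {
    # Pull the position out of the message text.
    my ($line, $column, $byte) = $err =~ /line (\d+), column (\d+), byte (\d+)/;
    print "parse failed at line $line, column $column (byte $byte)\n";
}
```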
# SECURITY
XML::Parser relies on the expat C library for parsing. Modern versions of
expat include several security mechanisms that can be tuned through
constructor options passed to `new()`. These options are forwarded directly
to [XML::Parser::Expat](https://metacpan.org/pod/XML%3A%3AParser%3A%3AExpat) and take effect for every subsequent `parse`,
`parsefile`, or `parse_start` call on the parser instance.
All of these options will `croak` at runtime if the underlying libexpat does
not support them.
## Billion Laughs Attack Protection
The Billion Laughs attack (also known as an XML bomb) uses deeply nested
entity definitions to cause exponential expansion, consuming memory and CPU.
Expat >= 2.4.0 (built with `XML_DTD` or `XML_GE`) includes built-in
protection controlled by two parameters:
- **BillionLaughsAttackProtectionMaximumAmplification**
The maximum ratio between the size of the expanded output and the size of
the input. For example, a value of `100.0` means the parser will abort if
entity expansion would produce output more than 100 times the size of the
input.
- **BillionLaughsAttackProtectionActivationThreshold**
The number of bytes of expanded output before the amplification limit takes
effect. This prevents false positives on small documents that happen to
have a high amplification ratio.
## Allocation Tracker
Expat >= 2.7.2 (built with `XML_DTD` or `XML_GE`) adds a second layer
of amplification tracking through the allocation tracker, which measures
memory allocation rather than output size:
- **AllocTrackerMaximumAmplification**
The maximum ratio of allocated memory to input size.
- **AllocTrackerActivationThreshold**
The number of bytes of allocation before the limit takes effect.
## Reparse Deferral
Expat >= 2.6.0 includes reparse deferral, which prevents attacks that
exploit token boundaries. Rather than reparsing incomplete tokens
immediately, the parser defers until more input arrives.
- **ReparseDeferralEnabled**
A boolean. Set to a true value to enable reparse deferral, or `0` to
disable it.
For full details on each option, see ["new" in XML::Parser::Expat](https://metacpan.org/pod/XML%3A%3AParser%3A%3AExpat#new).
```perl
# Example: tighten Billion Laughs limits
my $parser = XML::Parser->new(
    Style => 'Tree',
    BillionLaughsAttackProtectionMaximumAmplification => 50,
    BillionLaughsAttackProtectionActivationThreshold  => 1024,
);
```
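To see a limit trip, the sketch below parses a small entity-expansion document with deliberately strict, illustrative limits; the values are not recommendations. If the installed libexpat lacks support for these options, they croak instead, so either way the eval fails.

```perl
use strict;
use warnings;
use XML::Parser;

# A tiny "billion laughs" style document: each entity expands to ten copies
# of the previous one, so &c; yields 1,000 characters and <d> 10,000.
my $bomb = <<'XML';
<!DOCTYPE d [
  <!ENTITY a "xxxxxxxxxx">
  <!ENTITY b "&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;">
  <!ENTITY c "&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;">
]>
<d>&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;</d>
XML

my $ok = eval {
    my $parser = XML::Parser->new(
        BillionLaughsAttackProtectionMaximumAmplification => 2.0,
        BillionLaughsAttackProtectionActivationThreshold  => 1024,
    );
    $parser->parse($bomb);
    1;
};

# Either expat rejected the expansion, or (on an older or differently built
# libexpat) the options themselves croaked as unsupported.
print $ok ? "parsed\n" : "rejected: $@";
```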
# LICENSE
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
See [https://dev.perl.org/licenses/](https://dev.perl.org/licenses/) for more information.
# AUTHORS
Larry Wall <`larry@wall.org`> wrote version 1.0.
Clark Cooper <`coopercc@netheaven.com`> picked up support, changed the API
for this version (2.x), provided documentation,
and added some standard package features.
Matt Sergeant <`matt@sergeant.org`> maintained XML::Parser from 2003 to 2007.
Alexandr Ciornii <`alexchorny@gmail.com`> maintained XML::Parser from 2007 to 2013.
Todd Rinaldo <`toddr@cpan.org`> has been maintaining XML::Parser since 2013.
The project started making use of Claude Code <`https://claude.ai/code`> in January 2026.