PDF Parser

A PHP library that parses PDF files and extracts elements such as text.

Features

  • Load and parse objects and headers
  • Extract metadata (author, description, keywords, ...)
  • Extract text from ordered pages
  • Support for both compressed and uncompressed PDF files
  • Support for charset encodings (WinAnsi, MacRoman)
  • Handling of hexadecimal and octal content encoding
  • PSR-0 compliant (autoloader)
  • Compatible with Composer
  • PSR-1 compliant (code styling)
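
To illustrate what handling hexadecimal and octal content encoding involves, here is a minimal, self-contained sketch. It does not show this library's API: the function names decodeHexString and decodeOctalEscapes are hypothetical and exist only for this example. It assumes PHP 7.4 or later and relies on the PDF string syntax, where hexadecimal strings are written between angle brackets and literal strings may contain \ddd octal escapes.

```php
<?php
// Hypothetical helpers for illustration only; they are not part of the library.

// Decode a PDF hexadecimal string such as "<48656C6C6F>".
function decodeHexString(string $raw): string
{
    // Drop the surrounding angle brackets and any whitespace between digits.
    $hex = preg_replace('/\s+/', '', trim($raw, '<>'));

    // PDF treats a missing final digit as 0.
    if (strlen($hex) % 2 === 1) {
        $hex .= '0';
    }

    $decoded = hex2bin($hex);
    if ($decoded === false) {
        throw new InvalidArgumentException('Not a valid hexadecimal string.');
    }

    return $decoded;
}

// Replace \ddd octal escapes (one to three digits) in a literal string body.
// Other escapes, such as \n or \\, are deliberately left untouched here.
function decodeOctalEscapes(string $raw): string
{
    return preg_replace_callback(
        '/\\\\([0-7]{1,3})/',
        fn (array $m): string => chr(octdec($m[1]) & 0xFF),
        $raw
    );
}

echo decodeHexString('<48656C6C6F>'), "\n";    // prints "Hello"
echo decodeOctalEscapes('\\110\\151'), "\n";   // prints "Hi"
```

The decoded bytes still need to be interpreted in the font's charset (for example WinAnsi or MacRoman) before they can be treated as readable text; this sketch stops at the byte level.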

Todo list

  • Complete unit tests

News

  • The 0.9.10 release now supports metadata extraction
  • Documentation has been updated to match v0.9.5
  • A demo section is now available
  • Support for hexadecimal and octal encoding in properties
  • Support for content with mixed charset encodings
  • Under active development; any help is appreciated