Data Profiler
Analyze Data for Quality, Compliance
Pervasive's Data Profiler is a data analysis and testing tool targeted at data governance (compliance), data quality testing, and data integration projects. It provides a GUI in which users define data source connections, define the profiling metrics (how the data will be tested and analyzed), set the output locations, and run the analysis.
The product can examine more than 150 data source types, including databases and flat files, and generates both statistical and "pass/fail" reports on the data examined. The vendor highlights the product's "DataRush" analysis engine, which is built to exploit multi-core processors through parallel data processing. The engine can process the entire data set, though the user may instead choose to test a sample.
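DataRush's internals are not described here, but the general idea of profiling a column in parallel, over either the entire data set or a sample, can be sketched in Python. This is a minimal illustration under stated assumptions, not DataRush's API: `Stats`, `chunk_stats`, and `profile_column` are hypothetical names. Each partition is summarized independently and the partial summaries are merged afterwards, which is what allows the per-partition work to be spread across cores.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Stats:
    """Partial summary of one chunk; partials merge in any order (hypothetical type)."""
    count: int = 0
    total: float = 0.0
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def merge(self, other: "Stats") -> "Stats":
        # Combine two partial summaries, ignoring the None bounds of empty chunks.
        mins = [v for v in (self.minimum, other.minimum) if v is not None]
        maxs = [v for v in (self.maximum, other.maximum) if v is not None]
        return Stats(self.count + other.count, self.total + other.total,
                     min(mins) if mins else None, max(maxs) if maxs else None)

    @property
    def mean(self) -> Optional[float]:
        return self.total / self.count if self.count else None


def chunk_stats(chunk: Sequence[float]) -> Stats:
    # Summarize one partition; this is the work that can run on a separate core.
    if not chunk:
        return Stats()
    return Stats(len(chunk), float(sum(chunk)), min(chunk), max(chunk))


def profile_column(values: List[float], n_chunks: int = 4,
                   sample_size: Optional[int] = None,
                   mapper: Callable = map, seed: int = 0) -> Stats:
    # Optionally profile a reproducible random sample instead of the full data set.
    if sample_size is not None and sample_size < len(values):
        values = random.Random(seed).sample(values, sample_size)
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    result = Stats()
    for partial in mapper(chunk_stats, chunks):
        result = result.merge(partial)
    return result


full = profile_column([float(x) for x in range(1, 101)])
print(full.count, full.minimum, full.maximum, full.mean)  # 100 1.0 100.0 50.5
```

The default built-in `map` runs the chunks one after another, which keeps the sketch easy to test; passing `mapper=pool.map` from a `concurrent.futures.ProcessPoolExecutor` would dispatch each chunk to a separate process (under an `if __name__ == "__main__":` guard on platforms that spawn workers).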
Basic usage of the product is as follows:
- Create a container "Project" that will contain all the elements (definitions, connections, etc.) that will be needed to execute a profiling run.
- Define the metrics to apply to source fields and derived data fields, i.e., what needs to be measured within the data. Metrics can return boolean answers, perform data type conversions, create derived fields, or summarize data. The standard out-of-the-box metrics included with the product cover tests for blank or null fields, comparisons against a constant or another field, conversions to double or date values, averages, min/max, sums, and more. User-defined metrics are also supported.
- Generate reports detailing both passing and failing data, optionally writing "Clean" and "Dirty" files that hold the passing and failing records, and view statistics on the examined data such as success-rate percentages and counts.
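To make the metric, pass/fail, and Clean/Dirty ideas concrete, here is a minimal Python sketch of rule-based profiling over CSV records. It is not Pervasive's metric language or API; `Metric`, `not_blank`, `is_numeric`, and `profile` are hypothetical names chosen for illustration. A record lands in the clean list only if every metric passes, and each metric's success rate is reported as a percentage.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


@dataclass
class Metric:
    """A named pass/fail rule applied to one record (hypothetical, not Pervasive's API)."""
    name: str
    check: Callable[[Record], bool]


def not_blank(field: str) -> Metric:
    # Fails when the field is missing, empty, or whitespace only.
    return Metric(f"{field} not blank", lambda r: bool((r.get(field) or "").strip()))


def is_numeric(field: str) -> Metric:
    # Passes when the field converts to a float, similar to a "convert to double" test.
    def check(r: Record) -> bool:
        try:
            float(r.get(field) or "")
            return True
        except ValueError:
            return False
    return Metric(f"{field} is numeric", check)


def profile(rows: List[Record],
            metrics: List[Metric]) -> Tuple[List[Record], List[Record], Dict[str, float]]:
    """Split rows into clean/dirty lists and compute each metric's pass rate in percent."""
    clean: List[Record] = []
    dirty: List[Record] = []
    passes = {m.name: 0 for m in metrics}
    for row in rows:
        row_ok = True
        for m in metrics:
            if m.check(row):
                passes[m.name] += 1
            else:
                row_ok = False
        (clean if row_ok else dirty).append(row)
    total = len(rows)
    rates = {name: (100.0 * n / total if total else 0.0) for name, n in passes.items()}
    return clean, dirty, rates


# Example: one clean row, one with a blank name, one with a non-numeric amount.
data = "id,name,amount\n1,Alice,10.5\n2,,7\n3,Bob,abc\n"
rows = list(csv.DictReader(io.StringIO(data)))
clean, dirty, rates = profile(rows, [not_blank("name"), is_numeric("amount")])
print(len(clean), len(dirty))  # 1 2
```

The clean and dirty lists could then be written to separate files with `csv.DictWriter`, mirroring the optional Clean and Dirty outputs described above.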
Other features of the platform include 3-D color pie and bar chart output; drill-down into the data behind charts; PDF, HTML, and CSV output; automated data testing; and an "AutoGen" feature for one-step generation of standard metrics.
New features of the latest Pervasive Data Profiler release not already mentioned above include:
- An Eclipse Rich Client Platform interface, with improved error handling and reporting, and enhanced debugging and validation capabilities for created scripts
- Additional out-of-the-box metrics and business rules
- Support for reusing and sharing profile rules among users
- The ability to join multiple heterogeneous files in a single run
Data Profiler is available now. Contact Pervasive Software for further information.
product submission by DatabaseJournal Staff