BI & Analytics Blog: Data Virtualization

Tuesday, May 20, 2014

Denodo Architecture - Data Source Layer

Denodo offers point-and-click connectors for a wide range of structured, semi-structured and unstructured data sources. This post lists examples in each of these categories to illustrate what Denodo can connect to.

Structured Data:

RDBMS: Oracle, IBM DB2, Microsoft SQL Server, Sybase, MySQL, PostgreSQL, etc.
Data Warehouse appliances and Parallel Databases: Netezza, Teradata, Oracle Exadata, ParAccel, Sybase IQ
Multidimensional OLAP Engines: SAP BW, MS SSAS, Essbase & Mondrian
Enterprise Applications: Salesforce, Siebel, PeopleSoft, Oracle E-Business Suite, SAP R/3 / ECC
Mainframe / Legacy: Adabas, IMS, DB2, TN5250 / TN3270. A plug-in architecture is available for third-party mainframe / legacy adapters.

Semi-Structured Data:

SOAP / REST Web Services & data feeds: XML, RSS, JSON, Atom and CSV formats.
Directory Services: Can connect to and introspect LDAP and Active Directory services as data sources.

Unstructured Data:

Email, MS Word documents, etc.

Other supported data sources, which I have not yet placed in the categories above because the right category depends on the data stored in them, are listed below:

Semantic repositories in Triple Stores / RDF accessed through SPARQL endpoints.
Big Data & NoSQL databases: Hadoop, Hive, HBase, MongoDB, CouchDB, Neo4j, MarkLogic.
Cloud & SaaS sources:

Via APIs: Salesforce, Amazon, Google, LinkedIn, Facebook, Twitter
Via Browser Automation: Website, Form, WebApp

Custom Applications: Connector SDK available to access any custom application through API and procedural interfaces.
Sophisticated tools to expose Web, Semi & Unstructured data as virtual relational data/service.
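
To give a feel for what exposing semi-structured data as relational data means, here is a minimal Python sketch. It is not Denodo's API; the feed content, the field names and the `feed_as_rows` helper are all made up for illustration. It simply flattens a nested JSON feed into flat rows with fixed columns, which is roughly the shape a relational consumer expects.

```python
import json

# Hypothetical JSON feed; the structure and field names are assumptions.
FEED_JSON = """
{"feed": {"entries": [
  {"id": 1, "title": "Q1 results", "author": {"name": "Asha"}},
  {"id": 2, "title": "Roadmap", "author": {"name": "Ben"}}
]}}
"""


def feed_as_rows(feed_text):
    """Flatten the nested feed entries into rows with fixed column names."""
    entries = json.loads(feed_text)["feed"]["entries"]
    return [
        {
            "id": entry["id"],
            "title": entry["title"],
            # The nested author object becomes a plain column.
            "author_name": entry["author"]["name"],
        }
        for entry in entries
    ]


if __name__ == "__main__":
    for row in feed_as_rows(FEED_JSON):
        print(row)
```

Each row now has the same three columns, so it could be filtered or joined like a table.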

Tuesday, May 13, 2014

Data Virtualization Tool Hunt!!! - Landed@Denodo

Our hunt for a data virtualization tool became more interesting with Denodo. Denodo caught my eye when I first read an older research firm report (older relative to the date of this post) describing the product's evolution and its status at the time. It looked quite promising, and the road map has remained robust to date. What’s next? Yes, you guessed it right: we chose to evaluate Denodo for our needs.

The journey was rich in learning, so I am blogging about it to share it with a larger audience. These are purely my personal experiences and give you only a glimpse of the possibilities with this tool. No doubt there are other tools available in the market, and I am yet to evaluate them.

To start with, let’s have a quick overview of “Denodo”.

What’s Denodo?

Denodo offers a Data Integration and Data Virtualization software platform called “Denodo Platform”. It is primarily a middleware platform that enables virtual/real integration of data across a diverse number and variety of sources. It can also be considered a platform that enables an “Information-as-a-service” model within the enterprise.

A Few Suitable Use Cases:

Virtual Data Integration & Building a Data Federation Layer
Virtual MDM
Data Provisioning needs through an SOA model

Denodo Platform’s Architecture: It can be logically categorized into three layers:

Data Source / Connect Layer
Data Virtualization / Combine Layer
Data Consumption / Publish Layer
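
As a rough illustration of how the three layers relate, here is a minimal Python sketch. It does not use Denodo itself; the two sources (`connect_crm`, `connect_billing`), their rows and the function names are hypothetical stand-ins for the connect, combine and publish steps.

```python
import json


# Connect layer: each function wraps one (hypothetical) source and returns rows.
def connect_crm():
    return [{"customer_id": 1, "name": "Asha"}, {"customer_id": 2, "name": "Ben"}]


def connect_billing():
    return [{"customer_id": 1, "balance": 120.0}, {"customer_id": 2, "balance": 0.0}]


# Combine layer: build one derived view by matching rows on customer_id.
def combine(crm_rows, billing_rows):
    balance_by_id = {row["customer_id"]: row["balance"] for row in billing_rows}
    return [
        {**row, "balance": balance_by_id.get(row["customer_id"])}
        for row in crm_rows
    ]


# Publish layer: expose the combined view in a consumer-friendly format.
def publish_as_json(rows):
    return json.dumps(rows, sort_keys=True)


if __name__ == "__main__":
    print(publish_as_json(combine(connect_crm(), connect_billing())))
```

The consumer only sees the published JSON; which sources sit behind it, and how they were combined, stays inside the lower layers.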

It can be visualized using a simple example shown below:

I will try to cover each layer in more detail in the following posts. Looking forward to hearing about your experiences in this area.

Sunday, May 11, 2014

Data Virtualization Vs. Data Federation

Of late, a colleague posed a question to me during a brainstorming session:

Are Data Virtualization and Data Federation different?

Aren’t they synonymous?

This set me searching for good existing definitions that would not only verify my understanding but also make future demonstrations easier. Bang on! I found definitions by Rick van der Lans that I would like to agree with. Rick’s proposed definitions are:

Data Virtualization:

Data virtualization is the process of offering data consumers a data access interface that hides the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology.
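
To make the definition concrete, here is a minimal Python sketch, assuming two made-up storage technologies for the same customer data (an in-memory SQLite table and a CSV text). The class and function names are hypothetical. The consumer function `list_customer_names` works against one access interface and never sees the location, structure or access language of the stored data.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterator, List, Protocol


class CustomerSource(Protocol):
    """The single access interface offered to data consumers."""

    def customers(self) -> Iterator[Dict]: ...


class SqlCustomerSource:
    """Hypothetical source: customers stored in a relational table."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
        self._conn.executemany(
            "INSERT INTO customer VALUES (?, ?)", [(1, "Asha"), (2, "Ben")]
        )

    def customers(self) -> Iterator[Dict]:
        query = "SELECT id, name FROM customer ORDER BY id"
        for customer_id, name in self._conn.execute(query):
            yield {"id": customer_id, "name": name}


class CsvCustomerSource:
    """Hypothetical source: customers stored as CSV text."""

    def __init__(self, text: str) -> None:
        self._text = text

    def customers(self) -> Iterator[Dict]:
        for row in csv.DictReader(io.StringIO(self._text)):
            yield {"id": int(row["id"]), "name": row["name"]}


def list_customer_names(source: CustomerSource) -> List[str]:
    """Consumer code: unaware of SQL, CSV, or where the data lives."""
    return [customer["name"] for customer in source.customers()]


if __name__ == "__main__":
    print(list_customer_names(SqlCustomerSource()))
    print(list_customer_names(CsvCustomerSource("id,name\n1,Asha\n2,Ben\n")))
```

Swapping one source for the other requires no change in the consumer, which is the point of hiding the technical aspects behind the interface.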

Data Federation:

Data federation is a form of data virtualization where the data stored in a heterogeneous set of autonomous data stores is made accessible to data consumers as one integrated data store by using on-demand data integration.
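
The "on-demand data integration" part can be sketched in a few lines of Python. This is only an illustration under assumed data: orders live in an in-memory SQLite database, customers in a JSON document, and `federated_orders` is a made-up helper that reads both stores at call time and presents them as one integrated result, without copying anything into a new store.

```python
import json
import sqlite3

# Hypothetical source 1: an autonomous relational store holding orders.
orders_db = sqlite3.connect(":memory:")
orders_db.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 25.0), (11, 2, 40.0)]
)

# Hypothetical source 2: customer data kept as a JSON document.
CUSTOMERS_JSON = '[{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ben"}]'


def federated_orders(db, customers_json):
    """Read both sources at call time and join them on the customer id."""
    names = {c["id"]: c["name"] for c in json.loads(customers_json)}
    cursor = db.execute(
        "SELECT order_id, customer_id, amount FROM orders ORDER BY order_id"
    )
    return [
        {"order_id": order_id, "customer": names.get(customer_id), "amount": amount}
        for order_id, customer_id, amount in cursor
    ]


if __name__ == "__main__":
    for row in federated_orders(orders_db, CUSTOMERS_JSON):
        print(row)
```

Because the integration happens on each call, a new order inserted into the source shows up in the next result without any reload step.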

That said, the terms data virtualization and data federation are still used interchangeably today, and standard definitions are yet to be widely accepted.

Recently we were looking for a tool that can act as a workhorse for a virtual data warehouse implementation. It sounds simple but needs a deep dive before taking this architectural decision. A few scenarios in which we chose a virtual data warehouse as a viable solution that could ensure successful delivery and add value are:

As a strategic solution to showcase the benefits of data integration and a data mart / data warehouse implementation
As a mid-term solution where data volumes and complexity are not a concern
As a strategic solution to showcase the registry model implementation of a master data management solution

A few other scenarios have been identified for evaluation, and progress so far looks promising. I will continue writing on this topic, covering the tools we considered and so on.

Looking forward to hearing about your experiences in this area.