During a teleconference meeting, a developer at DoorDash was telling the company’s director of business intelligence, Marta Vovchenko, about a data model he was putting together. But he didn’t share his screen to show, say, a visualization in DataGrip or a diagram in erwin Data Modeler.
He just held up a piece of paper to the camera.
“Any data modeling starts with pen and paper,” Vovchenko told Built In. “If you’re talking about data modeling tools, pen and paper is a quintessential tool.”
Data Modeling Tools to Know
- SQuirreL SQL Client
- MySQL Workbench
- erwin Data Modeler
Of course, tools come into play at the next step of the process, but the low-tech approach speaks to the essence of data modeling: plotting and illustrating the relationships between various entities, whether that’s inside an application database or within a data warehouse architecture.
The question of which data modeling tools to consider is contextual. “It’s a question of the right tools for the right situation,” Vovchenko said. Key considerations revolve around an organization’s tech stack and data security. “You have to keep in mind that, if you’re connecting different solutions to your databases, that means you might be opening up your databases,” she added.
Also, there can be some pretty stark tech-cultural lines. If you’re on a data team at a startup with no dedicated data architect role, there’s a good chance you’ve never practically encountered, for instance, ER/Studio — an enterprise-friendly, Windows-only tool with a strong foothold in more traditional corporate realms.
What Is Data Modeling?
Data modeling is the process of creating a visual representation of a software system to illustrate the data it contains and how it flows. Data models will typically use text and symbols to represent how data is functioning within the software. Businesses use data modeling to understand how their data can be grouped and organized, as well as how it relates to larger business initiatives.
Data models can also be described as living documents intended to shift and change along with the needs of the business. For example, if a data model is used to help create a new database, the model lives on after the database has been deployed to explain why the database exists and how it functions. Data models matter because businesses can use them to assist with software development as well as analytics and business intelligence. Data modeling is also often a collaborative process between teams and stakeholders, and the resulting model can serve as a source of truth for a software system or database, giving people across an organization an accessible resource for answering their questions.
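To make the abstract definition concrete, here is a minimal sketch of a physical data model expressed as DDL, using Python’s standard-library sqlite3 module. The entities and columns are invented for illustration:

```python
import sqlite3

# A minimal physical data model: two entities and the relationship
# between them, expressed as DDL. Table and column names are
# illustrative, not from any real system.
DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total_cents INTEGER NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# The model lives on in the database catalog: list the entities it defined.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['customer', 'orders']
```

The foreign key from orders to customer is exactly the kind of relationship a diagram’s connector line represents.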
Top Data Modeling Tools to Know
Draw.io
Think of Draw.io as the bridge from pen and paper or whiteboarding to something a little more formalized. It’s a free, intuitive, browser-based flowchart builder in which users can drag and drop entity shapes (including the ellipses and parallelograms common to data models) onto a canvas, then join them with connector lines. It’s a handy way to get something down quickly, even for someone lacking access to high-powered software.
It might not make sense in contexts that deal with a lot of unstructured, schema-on-read data, like event-based data sets, but it’s smooth for things like finance models, which are of course highly structured, Vovchenko said.
“The very first model that was drawn for finance [at DoorDash] was actually drawn in Draw.io,” she said.
An SQL plugin, which automatically builds entity shapes and attributes from SQL code, is also available.
Lucidchart
Lucidchart is a diagram builder similar to Draw.io, but it promises more complex flows and stronger data protection. Unlike Draw.io, Lucidchart’s unlimited plans carry a cost. Lucidchart links to SQL, which means users can pipe their canvases into their actual database management system.
It’s part of DoorDash’s toolkit too. “Historically, we’ve had a lot of microservices based on PostgreSQL databases,” Vovchenko said. “Using Lucidchart was helpful because you can connect it directly to those databases, then read the schema and work with the already pre-documented diagram.”
SQuirreL SQL Client
There are three overarching considerations when working with data models, according to Vovchenko: agility, consistency and data integrity. Agility effectively translates to a willingness to accept the fact that no single data-modeling tool is going to suffice across all use cases.
DoorDash has utilized SQuirreL, a free, open-source graphical tool that’s supported by most major relational databases. Described by Vovchenko as easy to install and deploy, it lets users see database structures and draw entity-relationship diagrams from existing schemas, among other features.
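Drawing an entity-relationship diagram from an existing schema boils down to reading the database catalog. Here’s a minimal sketch of that first step in Python, using the standard-library sqlite3 module; the tables are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book (
    book_id   INTEGER PRIMARY KEY,
    author_id INTEGER REFERENCES author(author_id),
    title     TEXT
);
""")

# Reverse-engineer the schema: for each table, read its columns and
# foreign keys from the catalog -- the raw material an ER diagram
# is drawn from.
model = {}
for (table,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"):
    cols = [c[1] for c in conn.execute(f"PRAGMA table_info({table})")]
    fks = [(fk[2], fk[3], fk[4])  # referenced table, local col, remote col
           for fk in conn.execute(f"PRAGMA foreign_key_list({table})")]
    model[table] = {"columns": cols, "foreign_keys": fks}

print(model["book"]["foreign_keys"])  # [('author', 'author_id', 'author_id')]
```

A tool like SQuirreL does this introspection against whatever relational database it’s connected to, then renders the result visually.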
Open-source options may be particularly attractive to net-new businesses, since, as Ajay Gupta noted, connecting those applications to older data environments can sometimes require the added step of front-ending an open database connectivity (ODBC) API.
MySQL Workbench
This free, open-source and cross-platform data-modeling GUI layer for MySQL databases has a reputation for intuitiveness and ease of use. Vovchenko said she previously used MySQL Workbench for structured data alongside the MicroStrategy analytics platform.
“Having the ability to use Workbench to develop a model, then overlay it with this extensive logical model to use for reporting purposes, was very helpful for us,” she said. In fact, her team formerly had a regularly scheduled exercise where it would re-create the full model with Workbench and then examine how it fit into the environment. “That’s a really good tool,” she added.
Amundsen
Perhaps the most important trend in data in recent years is the rise of data catalogs, spurred in no small part by privacy regulations like GDPR and CCPA. That push has touched data modeling, too. Vovchenko said the line between data lineage tools and data modeling tools will only continue to blur.
Exemplifying the trend is Amundsen, the metadata discovery platform developed and made available by Lyft.
“Amundsen models metadata entities as a graph, which makes it easy to extend the model when more entities are introduced,” wrote Lyft engineer Tao Fang when the company made the open-source announcement in 2019. “All of the entities — such as tables, columns and schemas — are connected via edges that represent the relationships between them.”
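The graph idea is simple to sketch. Below is a toy metadata graph in Python; the entity and edge names are illustrative, not Amundsen’s actual model:

```python
# A toy metadata graph in the spirit Fang describes: entities are
# nodes, relationships are edges. Names here are invented.
edges = [
    ("schema:finance", "SCHEMA_OF", "table:finance.orders"),
    ("table:finance.orders", "HAS_COLUMN", "column:orders.order_id"),
    ("table:finance.orders", "HAS_COLUMN", "column:orders.total"),
]

# Extending the model with a new entity type is just more edges --
# no schema migration needed.
edges.append(("user:marta", "OWNS", "table:finance.orders"))

def neighbors(node, relation):
    """All nodes connected to `node` by `relation` edges."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]

print(neighbors("table:finance.orders", "HAS_COLUMN"))
# ['column:orders.order_id', 'column:orders.total']
```

That extensibility is the payoff of the graph approach: adding a new entity type (here, an owner) requires no change to the existing structure.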
erwin Data Modeler
Erwin runs on MySQL and PostgreSQL — the kinds of free databases toward which so much of the tech world has gravitated in recent years. But it also supports the likes of Teradata, Oracle and Netezza — which is why it’s perhaps more associated with enterprise legacy shops, where data models tend to be more thoroughly optimized than in some deliver-features-fast startups.
That said, organizations that accept the cost tradeoff get some very robust capabilities. Benefits include the ability to both forward and reverse engineer, generate DDL statements and “help you identify diffs, to keep the history of your schema evolution,” Vovchenko said.
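The schema-history idea (identifying diffs between versions of a model) can be sketched in a few lines of Python. The snapshots below are hypothetical, and real tools diff far richer metadata:

```python
# Two hypothetical snapshots of a model: table name -> set of columns.
old = {"customer": {"id", "name"}, "invoice": {"id", "total"}}
new = {"customer": {"id", "name", "email"}, "payment": {"id", "amount"}}

def diff(old, new):
    """Report what changed between two schema snapshots."""
    added_tables = sorted(set(new) - set(old))
    dropped_tables = sorted(set(old) - set(new))
    added_columns = {t: sorted(new[t] - old[t])
                     for t in set(old) & set(new) if new[t] - old[t]}
    return added_tables, dropped_tables, added_columns

print(diff(old, new))
# (['payment'], ['invoice'], {'customer': ['email']})
```

A production tool would go further, emitting the ALTER statements that migrate one version to the next, but the comparison step is the heart of it.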
ER/Studio
Along with erwin, ER/Studio is another powerful, long-running, enterprise-friendly market leader in data modeling. It supports forward and reverse engineering, and it runs the gamut in terms of databases supported. The tool — which is Windows-only — is available in four tiers, with the top tier supporting Jira integration, conceptual mapping and schema generation against a database, among other features.
Both erwin and ER/Studio have weathered ownership changes since arriving in the late ’90s. Those shakeups may have factored into occasional stability issues for both products, Gupta said, but the fact that both can connect to so many different data sources and database platforms should counteract lingering fears of bugginess. Both are quality options for enterprises that want to avoid vendor lock-in, he added.
InfoSphere Data Architect
Developed by IBM, InfoSphere Data Architect is a collaborative enterprise data modeling and design solution suited to business intelligence and data management. The tool’s features include querying capabilities, the ability to import and export constant mappings, and support for creating logical and physical data models. InfoSphere Data Architect is compatible with IBM Db2, IBM Informix, Oracle, Sybase, Microsoft SQL Server, MySQL and Teradata source systems.
SQL Server Management Studio
For those looking for an accessible, Windows-compatible tool, SQL Server Management Studio might be a good option, as it comes standard with Microsoft’s SQL Server. The tool allows users to configure, monitor and administer instances of SQL Server.
Data modelers can use SSMS to deploy, monitor and upgrade the data-tier components used by applications. SSMS can be used to manage databases as well as to design and query models, both on a local computer and in the cloud.
Oracle SQL Developer Data Modeler
Created in 2009, Oracle’s SQL Developer Data Modeler is a free data modeling tool that allows users to create relational, physical and multidimensional models. Users can also use the tool to forward- and reverse-engineer data models. The data modeler has had more than 20 iterations and releases, and planned future enhancements include compatibility with MySQL, TimesTen, Sybase and SQL Server 8.
Postico
Tech culture can create pretty sharp dividing lines when it comes to who reaches for which data-modeling tools. Mang-Git Ng is the founder of Anvil, a startup focused on paperwork automation, and a Flexport veteran. For him, enterprise tools like erwin and ER/Studio might as well exist on another planet.
“I didn’t really see a need for a more advanced data modeling tool, if you already understand the data model that you’ve built,” he said.
So what does someone like Ng use? Postico isn’t exactly a data modeling tool; it’s a GUI for PostgreSQL that lets users see the tables and views in their database in an intuitive way. Ng said it’s a handy way to look under the hood of your database without getting into the command-line weeds.
Navicat
Navicat is a comprehensive platform offering a variety of solutions for database development. Navicat’s data modeler allows users to build conceptual, logical and physical data models. It can visually design database structures, perform reverse and forward engineering, import models from ODBC data sources and print models to files.
The tool also strives to simplify data modeling by generating SQL scripts compatible with MySQL, MariaDB, Oracle, SQL Server, PostgreSQL and SQLite.
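Generating dialect-specific SQL from a single logical model is easy to illustrate. The sketch below is a toy: the type mappings are simplified assumptions, not Navicat’s actual output:

```python
# Simplified per-dialect type mappings (illustrative, not exhaustive).
TYPE_MAP = {
    "mysql":      {"pk": "INT AUTO_INCREMENT PRIMARY KEY", "text": "VARCHAR(255)"},
    "postgresql": {"pk": "SERIAL PRIMARY KEY",             "text": "TEXT"},
    "sqlite":     {"pk": "INTEGER PRIMARY KEY",            "text": "TEXT"},
}

def create_table(dialect, table, columns):
    """Render one logical model as CREATE TABLE DDL for a given dialect."""
    types = TYPE_MAP[dialect]
    cols = ", ".join(f"{name} {types[kind]}" for name, kind in columns)
    return f"CREATE TABLE {table} ({cols});"

# One logical model, many physical renderings.
model = [("id", "pk"), ("name", "text")]
print(create_table("postgresql", "customer", model))
# CREATE TABLE customer (id SERIAL PRIMARY KEY, name TEXT);
```

The point is the separation of concerns: the logical model stays the same, and only the rendering changes per target database.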
Metabase
The open-source Metabase is a graphical interface tool with some helpful analytics visualizations rather than an advanced modeling tool. Its main selling point is cross-team data access, but it also lets users join tables and define entity relationships.
Ng said of his experience with the tool: “I essentially could make a copy of our database, hook it up to Metabase, then slice and dice the data however I wanted. I could make SQL queries on top of the data and do table joins, all from within the UI — which is pretty nice.”
Along with the intuitive UI, Ng said he was impressed with Metabase’s ease of deployment. He got it up and running at full speed within half a day.
Toad Data Modeler
Quest’s Toad Data Modeler is compatible with more than 20 databases and can help users deploy accurate changes to data structures. The tool allows users to construct logical and physical data models, compare and synchronize models, quickly generate complex SQL/DDL, create and modify database schemas and scripts, and reverse- and forward-engineer both databases and data warehouse systems.
The goal of Toad Data Modeler is to give data modelers, DBAs and developers access to a tool that requires minimal training and allows for fast installation and integration.
dbt
Short for data build tool, dbt allows users to visualize data lineage and complete SQL-based data modeling. The tool lets users layer data models with the ref() function and supports models materialized as tables, incremental tables and views, or as a custom design.
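The ref() mechanic is worth a closer look: dbt scans each model’s SQL for references to other models and runs them in dependency order. Here is a rough sketch of that resolution in Python, with three hypothetical models:

```python
import re

# Three hypothetical dbt-style models: name -> SQL body. fct_revenue
# layers on top of the two staging models via ref().
models = {
    "stg_orders":   "SELECT * FROM raw.orders",
    "stg_payments": "SELECT * FROM raw.payments",
    "fct_revenue":  "SELECT * FROM {{ ref('stg_orders') }} "
                    "JOIN {{ ref('stg_payments') }} USING (order_id)",
}

def deps(sql):
    """Extract the model names referenced via ref()."""
    return re.findall(r"ref\('([^']+)'\)", sql)

def build_order(models):
    """Depth-first walk of ref() dependencies: parents before children."""
    order, seen = [], set()
    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for d in deps(models[name]):
            visit(d)
        order.append(name)
    for name in models:
        visit(name)
    return order

print(build_order(models))
# ['stg_orders', 'stg_payments', 'fct_revenue']
```

The same dependency information is what dbt uses to draw its lineage graph.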
DataGrip
DataGrip is popular among database administrators because it allows users to add, remove, edit and clone data rows, and to use text search to easily find displayed information. The tool also helps users write SQL code, with features like context-sensitive, schema-aware code completion and auto-generated code for changing objects like tables and columns from the UI. Queries run in query consoles, each of which supports schema switching and keeps a historical record of your activity and work.
Ab Initio
Ab Initio is an enterprise data platform with a variety of capabilities. The platform’s goal is to provide automation and self-service data solutions that can easily adapt and solve issues that arise in data processing and data management.
The platform’s cataloging capabilities are one aspect data modelers can look at as it can mine data from data lakes, the cloud, an operational system or a data warehouse. Data modelers can use Ab Initio to identify datasets and curate the information in them, all while using automation. Users can then search the data and explore it to form models and reviews.