Microsoft Access can be a great tool for database management. It is widely available as part of the Microsoft Office Suite, and it is simple enough to be used for the type of smaller scale or ad hoc projects that are created by end users themselves. However, Microsoft Access is not the best solution for every database project- depending on the requirements, other software like Excel or SQL Server may be a better choice.
This article will outline the major considerations for database software selection for a small or medium sized database, using the experience we’ve gained at Marlinspike Consulting from working on many such projects. Should you choose to use Access for your database, Part 2 of this article will then walk through the design process you should follow, from the design of the tables and other database objects to the automation and ongoing maintenance of the database. This article is intended for project managers and database designers, but nearly anyone who is a stakeholder in a data project could benefit from it.
Note that database development does require some specialized knowledge, and seemingly small design choices made early in a project can have a large impact later. Marlinspike Consulting has helped numerous businesses design and implement Access database solutions. Call us if you would like us to help with yours.
Choosing between Access and Excel
Many businesses use Excel as a database. It has a simple user interface, good formatting capability and it supports basic data types and filtering. But Excel databases often expand and outgrow that program’s limitations, and are then migrated to Access. Following are several business requirements that can help determine whether to store your data in Excel or Access.
Use Access To Track More Than One Entity
Excel spreadsheets can make excellent databases when only a single entity needs to be tracked. In Excel, data entry is intuitive: each record gets its own row and each field its own column, and navigation between records and fields is straightforward. But unlike Access, Excel does not have built-in support for tracking more than one entity using multiple tables. So while Excel would be a fine tool to manage a simple customer list containing names, addresses and phone numbers, it has no easy way to associate those customers with another entity, like the orders placed by each of them.
A customer database in Excel would store each customer in a separate row in the spreadsheet, and each piece of information about that customer, like Name, Street, City, State, Zip, and Phone Number, in a different column within that row. This type of data storage that tracks only a single set of rows and columns is called a flat file, and Excel can do a great job with this type of data as long as there is not an additional entity that also needs to be tracked.
However, if you wish to use Excel to track that list of customers as well as each order that those customers have placed, then you would need a second, similar spreadsheet to track the orders. In Excel this sheet might be placed in a separate worksheet, and would track information like Order Date, Ship Date, and Ship Method, as well as the fields indicating which items were ordered. However, unlike Access, Excel has no straightforward way to associate each Order with the row for the Customer who placed that order.
Enforcing a relationship between related records from two different tables like Customers and Orders is called Referential Integrity, and Relational Database Management Systems like Access support this as part of their core functionality. By having a field (or fields) such as CustomerID that are shared in common between the two tables, Access can easily associate each order with the customer who placed that order, so reports could show a list of orders grouped together by customer. This also allows updates or deletions of one of the records to be handled in a way that won’t negatively impact any related records, by preventing the deletion of a customer who has outstanding orders, for example.
This need to track additional entities beyond the first one is what differentiates a flat file from a relational database, and it is one of the most common reasons to choose Access over Excel.
Use Access To Track A Large Number Of Rows
For a large enough file, Access might still be a better choice than Excel even if you are tracking only a single entity with no relationships to other tables. An Excel worksheet has an upper limit of just over a million rows (a maximum of 1,048,576 rows and 16,384 columns). Exceeding that limit would require spreading the data across multiple files that would make querying and managing the data much more complicated. Access, on the other hand, has no built-in upper limit on the number of rows in a table, so long as the size of the database file does not exceed its upper limit of 2GB.
However, an Excel file of less than a million records could still benefit from a move to Access, because Access indexes its tables to speed access to the data. Indexing means presorting the rows in that table by the field (or fields) that contain the search criteria value so the records containing that value can be retrieved more quickly. This is analogous to a dictionary, where the words are listed in alphabetical order for much quicker searches. Because Access indexes its tables, searches run against a large Access table would usually run more quickly than a search against a similarly sized data set in Excel. Note, however, that there are a number of variables that affect retrieval time for searches in Excel and Access, including the nature of the data and the type and frequency of user access to that data.
Use Access For Many Concurrent Users.
Access is usually also a better choice if you have multiple users who will need access to the database at the same time. Both Excel and Access do allow for multi-user access to a file, but Access has much more robust multi-user support.
Excel supports multiple users with the Shared Workbook feature, which allows multiple people to open and modify an Excel file at the same time (though some Excel features are not supported in a shared document). If two users attempt to edit the same cell at the same time, Excel keeps the changes of the user who saves the workbook first, but also displays a resolve conflicts dialog box to allow users to override this behavior.
Also, if you are using Excel 2016 with Office 365, the co-authoring feature offers an even more granular level of collaboration than Shared Workbooks. By uploading the document to OneDrive or to a SharePoint Online Library and then sharing it with a list of other users who also sign in with their Office 365 accounts, users will then be able to see each other’s changes to the Excel document in real time, rather than just when the workbook is saved. However, both Shared Workbooks and Coauthoring are features that work only a small number of users collaborating on light edits to a workbook, rather than a heavily updated file in a multi-user data entry environment.
Access, on the other hand, supports locking at the more granular data page level, so a user editing one customer is less likely to interfere with a different user who is editing a different customer from the same table. This makes Access a much better choice than Excel for applications with concurrently-accessed, heavily-updated tables. Access has a limit of 255 concurrent users, above which Microsoft SQL Server would be a better choice.
Because Access is designed by default to be multi-user, a database can be shared with multiple users simply by placing it in a shared network folder. The data in the tables and all the database objects (such as queries, forms and reports) are then shared across the network for those users who access the database.
To increase its scalability, an Access database can also be shared by splitting the database file. In a split database architecture, the Access tables reside in a single location of the database on the server, called the Back-end Database, while all the other database objects that make use of those tables (the queries, forms and reports, etc.) are stored in a separate copy of the database on each of the users’ workstations, called the Front-end Database. By having multiple client copies of an Access database that contain the business logic pointing to a single copy of the server database that contains the tables, performance is improved because only the data from the tables is sent across the network. Also, the data in the tables (which changes frequently) is stored in a single location where changes to it can be tracked easily since there is only one copy. This is more reliable, since any file corruption affecting the Front-end objects would be limited to just an individual workstation and not the shared copy of the file on the server.
Use Access For Heavy Data Entry
Data entry can be tedious and is a common source for errors, so it’s important that database applications offer some features to improve speed and accuracy. Excel does offer some basic formatting and data validation rules, but Access has more features for data validation and much more flexible form design.
Data entry constraints are a way to implement business logic. For example, making the Customer Name a required field prevents it from being left blank, since having an address and phone number without a name would make the record useless. Or a validation rule might require that an order’s Ship Date must be sometime after the Order Date, since an order can’t be shipped before it is even placed. Or a referential integrity constraint on the relationship between the Customers and Orders tables could be used to prevent an order from being keyed in for a nonexistent customer.
Excel has some data validation features that can be used to constrain what values are entered into cells. For example, a numeric field whose value is required to be within a certain range could be constrained so that a value outside of that range cannot be entered, or alternatively, so that Excel would give a warning to the user but allow the data to be entered anyway. This same feature can be used to compare the value being entered to a value stored in a different cell, so that an Order Date field must be earlier than a Ship Date field, for example.
Access also has field level validation rules, input masks that can prevent invalid keystrokes from even being recognized, and record validation rules which check validity by comparing values in two different fields. But Access also allows validation rules to be placed in a form so they apply only to data that is entered into the table through that form, but not to data entered into the table through a query, a different form, or directly into the table. Access also allows a much greater degree of form customization so that fields can be arranged on the screen in a way that makes data entry the easiest. For example, when data is entered from a paper source, forms are usually designed so they mirror the layout of the printed source they are being entered from. They also may contain instructions or other information to assist the data entry staff in properly inputting the data.
Finally, Access also supports referential integrity constraints on two tables with related records as discussed earlier. For example, a Customers table and an Orders table might be related by sharing the CustomerID field in common: the CustomerID field would be stored in the customers table to uniquely identify each customer, and it would also be stored in the Orders table to identify which customer placed that particular order. Referential integrity constraints ensure that a customer record is not deleted if that customer had a least one outstanding order (which would orphan the Order records), or that a change in the CustomerID number in the Customers table would cascade down to the Orders table.
So while Excel does have some validation rules and data entry through forms, if you need a wider range of validation rules, need to enforce referential integrity, or need to have more feature-rich form design, then Access would be a better choice.
When to use Excel instead of Access
Although there are many good reasons to use Access, there are still some database projects that are best handled in Excel. Excel is much better at crunching numbers, so if your database contains numerical data that you will use for statistical, financial or other types of calculations, Excel is usually a better choice.
Excel also has superior charts and graphs, so if you want to summarize your data graphically then Excel will give you more options for formatting and chart types. Excel’s pivot tables are much better than those in Access, and are a great way for numerical data to be condensed and summarized very concisely. And Excel allows users of both charts and pivot tables in Excel to apply filters and other modifications to the way the data is displayed on the fly, after the chart or pivot table has been created.
With that said, however, Excel can provide all these features for data stored in other systems by connecting to data from outside sources. So data that is stored in Access, SQL Server or other sources could still be pulled into Excel and just to use it as input for charts, pivot tables and numerical analysis.
Choosing Between Access And SQL Server
The main advantages of SQL Server over Access are that it supports larger databases, more concurrent users, and it is more secure. But SQL Server is also more expensive and requires more specialized knowledge to administer.
One of the most common reasons to choose SQL Server over Access is to overcome the 2GB database size limit in Access. The full version of SQL Server has no built-in limit (though SQL Server Express 2016 databases are capped at 10 GB). Often a database that starts in Access grows enough that it is migrated to SQL Server.
Another reason for choosing SQL Server is for increased concurrency; Access allows a maximum of 255 concurrent database users, while SQL Server has a true client server model with no built-in connection limit. Note, however, that there are many variables that affect concurrency, including the nature of the data, how that data is accessed by the users, and how the application was designed, so the number of users supported is not always a straightforward indicator of database resource usage. And realistically most Access environments would probably see a large enough drop in performance to warrant an upgrade before the upper limit of 255 users is reached.
SQL Server is also more reliable than Access, and so is a better choice for high availability solutions. Access databases are more likely to become corrupt than SQL Server because individual users have a direct connection to the data, and a sudden loss of that connection can corrupt the database. The Access Compact and Repair utility should be run periodically for maintenance, but it requires all users to be disconnected before running. SQL Server, on the other hand, can automatically tune itself by rebuilding indexes and statistics and defragmenting files even if users are connected.
Note that it is possible to migrate the tables of an Access database to SQL Server back end while keeping the front end of the databases in Access in exactly the same way that was discussed earlier with a Front-End/Back-End architecture in Access. This can combine the high availability and concurrency of SQL Server with the business logic and ready software availability of the Access Front-end, and it can even work in a heterogeneous environment where users are using different versions of Access. This can work particularly well for legacy databases that have significant business logic that has already been developed in Access, but has outgrown the concurrency or size limitations of the software.
Finally, SQL Server has much better security than Access. The entire Access database is visible to the file system, and anyone who has permissions to the database itself could copy the entire file. Access does have built-in file encryption and password protection, but Microsoft Office passwords are weak and easily broken. SQL Server, on the other hand, stores its database files on the server where most users would likely not have permissions to access them directly, and only the data that is necessary to populate a results set is sent across the network.
Conclusion
A major part of a database design project is choosing the right software, and in this article we have outlined the most common arguments in favor and against using Access. Although the Access software is readily available and can empower users to manage their own data, it is likely not as good a choice for larger scale database solutions. Call Marlinspike Consulting if you need help with your database design project.
Recent Comments