Tuesday, February 11, 2020

B5 - Relational Database Theory Basics

A database is simply a series of tables that contain data. Within the series of tables, mathematical operations connect data to each other and record attributes and characteristics about that data. These attributes and characteristics are what connects data and help the database identify relationships. Databases are used in all disciplines: business, STEAM, agriculture, and more all utilize databases to perform more efficiently and store past data. Google is one of the most widely-known examples of a database, however the company does not publicly publish how large their database is.

Relational database theory states that these data tables are split up into sets of tuples (aka relations) and attributes. Each row in a matrix of data represents a tuple and each column represents the attributes. How a database organizes its tuples and attributes is called the database schema. Scripts can be written to only allow valid entries into certain tuple/attribute locations. For example, if the tuple was “animal” and the attribute was “mammal”, an entry of “banana” would not be valid in that data entry space. These scripts are called keys, and there are many different kinds of keys that can exist in a database that function slightly differently from each other.

In addition to keys, databases can have scripts that function between tables and perform operations that results in notifications. For example, if certain values of a database change to a given range, this may result in a warning output or update. These scripts are also what helps secure database editing and can make certain tables password protected, read only, or confidential.

Searches within a database are called queries. Queries use Standard Query Language (SQL) syntax to perform searches between tables and relations. Some of these operations include projection, selection, and join. Projection isolates a set of data entries based on specified attributes of the tuples. An example of this would be samples of concrete sorted by their mass, compressive strength, tensile strength, and water-to-cement ratios. Projection isolates only the necessary data called by the query. Selection is similar to projection, but it isolates certain tuples identified by a certain characteristic (for example, a list of US Senators that have served at least 1 previous term). Join is a function that combines two similar sets of tuples into another table. An example would be fast food restaurant locations and their list of managers.

Data farms are a fast-growing subsection of the construction industry because of their increasing demand. The ability to efficiently (and securely) search for data maximizes productivity and will continue to be the case in the future as we rely more and more upon historical data.

Reference: Rosenberg, B. (2007). Relational Databases. Retrieved from http://www.cs.miami.edu/home/burt/learning/Csc598.073/notes/reldb.html


Comments:

Isabel,
I am not a coder myself, so I found this brief synopsis of what SQL is and where it came from to be very useful. While I don’t understand the ins and outs of coding, I do recognize the importance of being able to store, retrieve, and analyze data efficiently and securely. Thanks for the info!
(Link to post)

Varsha,
Last co-op I worked for a General Contractor who mainly relied on those “old-school” record keeping techniques (I liked to call it “death by spreadsheet”). We had decades of information about contractors, owners, past projects, estimating tools, and others, but no truly efficient way to share that information among coworkers or new hires. This siloed many of the people into a certain job responsibility (ex. MEP estimates, casino jobs, $25mil+ jobs, etc) and made the operation very inefficient. I would like to be one of the advocates for company-wide databases of information of all kinds that can be used as a valuable training tool for employees. I agree wholeheartedly with when you talked about mutual benefits from shared information and hope to see more of that moving forward in my career.
(Link to post)

Andrew,
Thank you for your analogy for OODBs, it made understanding how they work incredibly easy for someone who does not have very much experience with coding databases. I wonder how difficult it is to repair incomplete or inaccurate databases; for example, if inheritances and extensibilities are used from a parent database that has wrong information, what would the process be to repair these databases? Is it automatic or would the data scientist have to start the new databases from scratch? Thanks for your post!
(Link to post)

No comments: