Table of Contents
In the case of a database that's in use for a long time and contains user-generated data, some columns might outgrow the data types that were used to store the data.
This blog post will describe a tool called fieldtop that checks for overflows and underflows in all tables and databases stored in a MySQL server.
Gnoosic.com has been running for over 10 years. A user first tells Gnoosic artists they like. Then the user is presented with other artists he might like and he can pick which ones he likes and which ones he doesn't. As time goes by, Gnoosic learns more about what users like and stores those preferences in a database.
Among other information, Gnoosic also stores the popularity of
different bands. This information is stored as an integer data type.
So, the popularity field for the entry about Pink Floyd reached 1.3
billion. The maximum allowed for the INT data type is 2.1 billion
2,147,483,647) . After discussion, and identifying
the motivation behind the project, a collaboration emerged with the
aim to develop a tool that would check for situations like these so
they can be avoided.
Running the tool uncovered a few other columns in different tables of the Gnoosic database that required attention, and since this tool was built to support decisions about the schema of MySQL database, it was opensourced under MIT license.
In addition to the main use-case, there is a dual use-case. If the tool shows that the maximum values stored in a column are a long way from reaching the maximum, and if the data type allows (for text data types), the length of these columns can be fine-tuned for more efficient use of disk space.
What data types are applicable
The inner-workings of fieldtop
The tool gets the data types of each column from information_schema, then it computes how close those values are to the maximum allowed values for their respective data types.
The following query is used to fetch information from information_schema:
SELECT b.COLUMN_NAME, b.COLUMN_TYPE, b.DATA_TYPE, b.signed, a.TABLE_NAME, a.TABLE_SCHEMA FROM ( -- get all tables SELECT TABLE_NAME, TABLE_SCHEMA FROM information_schema.tables WHERE TABLE_TYPE IN ('BASE TABLE', 'VIEW') AND TABLE_SCHEMA NOT IN ('mysql', 'performance_schema') ) a JOIN ( -- get information about columns types SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, TABLE_SCHEMA, DATA_TYPE, (!(LOWER(COLUMN_TYPE) REGEXP '.*unsigned.*')) AS signed FROM information_schema.columns ) b ON a.TABLE_NAME = b.TABLE_NAME AND a.TABLE_SCHEMA = b.TABLE_SCHEMA ORDER BY a.TABLE_SCHEMA DESC;
The results of this query are fetched inside a PHP program 1 that analyzes them and computes the percentage of the maximum value for the data type that was found in that each column.
Below is a short demo that showcases the tool described in the post:
We've covered the usage of a database tool that can be used to prevent overflowing columns in a proactive manner.