When I became the website administrator in November 2018, my first task was to produce an updated buyer list for our sponsors. You would think this should be a relatively easy task because all of the data we need is in the website database.
That’s what I thought. I was wrong.
The most recent buyer list was a CSV exported from WooCommerce Memberships. This was unsatisfactory for two reasons:
- The CSV included data that should not be shared publicly, like usernames, that could be used in a security exploit.
- It did not include data that should be shared with ZAGs sponsors, like contact information or which institution for which the member is a buyer.
I naively assumed that I should be able to write a simple SQL query to the database, something along the lines of:
SELECT * FROM customers, members WHERE customers.id = members.customer_id
Alas, ’twas not to be.
You may think it’s EAV, but it’s not…
I discovered that the WordPress user data is organized into an Entity – Attribute – Value scheme. Except that it isn’t a pure EAV model.
Ideally, an EAV database has three tables:
- An entity table consisting of a primary key and the entity value. In this case, I expected to see a user id number, a text username, and possibly the corresponding user email that also permits logging into the site.
- An attribute table consisting of a primary key, an attribute name, and possibly a format definition for the attribute or validation constraints.
- A value table consisting of its primary key, a foreign key to the entity table to identify the user, a foreign key to the attribute table to identify the data element, and finally, the actual value for that user/attribute combination.
EAV tables tend to be narrow, but long. They are perfect for cases where not every record has the same attributes. A traditional 2-dimensional table such as a spreadsheet would be very wide and have many empty fields. EAV models are compact, and flexible. Adding a new attribute just requires a new record in the attribute table. You don’t rewrite the schema or waste space adding empty fields to every preceding record.
Unfortunately, the WooCommerce contact information isn’t stored in a pure EAV scheme. There is no attribute table: instead the attribute name is stored in the value table itself.
This does have the advantage in that it greatly simplifies the query logic. EAV databases are complex to query because you need to join 3 tables. But working with 2 tables in an e-commerce environment, you can quickly code for each field in a form:
SELECT * from customers WHERE userid = n AND attribute = "billing_address_1"
On the other hand, excluding the attribute table wastes a lot of space: the attribute name is repeated every time it is used. (For example, the attribute name “billing_address_1” requires 16 characters, while a typical integer index requires only 4 bytes.)
(Granted, I’m showing my age here. I started programming when both RAM and disk space were limited. While you can argue that modern tools make my conservative approach to memory management obsolete, I still feel a great deal of personal satisfaction in efficient, lean code.)
TL;DR – Straight to the SQL queries
In the end, the quickest way to generate the buyer list from the website tables was to use two SQL queries, then manipulate the data offline as needed. Here are the queries, using generic table names for security:
SELECT ID, user_login, user_email, user_url, display_name FROM `wp_users`
SELECT user_id, MAX(CASE WHEN meta_key = 'billing_company' THEN meta_value END) AS company, MAX(CASE WHEN meta_key = 'billing_first_name' THEN meta_value END) AS first_name, MAX(CASE WHEN meta_key = 'billing_last_name' THEN meta_value END) AS last_name, MAX(CASE WHEN meta_key = 'billing_address_1' THEN meta_value END) AS address_1, MAX(CASE WHEN meta_key = 'billing_address_2' THEN meta_value END) AS address_2, MAX(CASE WHEN meta_key = 'billing_city' THEN meta_value END) AS city, MAX(CASE WHEN meta_key = 'billing_state' THEN meta_value END) AS state, MAX(CASE WHEN meta_key = 'billing_postcode' THEN meta_value END) AS postcode, MAX(CASE WHEN meta_key = 'billing_country' THEN meta_value END) AS country, MAX(CASE WHEN meta_key = 'billing_phone' THEN meta_value END) AS phone, MAX(CASE WHEN meta_key = 'billing_email' THEN meta_value END) AS email FROM `wp_usermeta` GROUP BY user_id
Full disclosure: my database chops were a bit rusty. Fortunately, we live in an age of search engines and I was able to find an example which let me quickly write the above query.
… and into a spreadsheet
With these simple 2-dimensional tables in hand, I was able to quickly create a combined table in a spreadsheet. This spreadsheet is now the master buyer and sponsor list for the ZAG website. There are of course, pros and cons to this process:
- I’ve sanitized and normalized the data elements.
- I can filter the table in real time as opposed to writing an ad hoc SQL query in the php admin panel.
- Exceptions can be handled without resorting to changing the underlying data tables (and possibly breaking the site).
- Every update must be manually transferred to the spreadsheet
- Changes made by members outside of the renewal process won’t be transferred to the spreadsheet
- Automatic website backups do not include the offline spreadsheet
While it would be easier to let our members update their contact information themselves, the presence of many hands without validation rules guarantees inconsistencies in the presentation of the data.
In the end, for flexibility and data integrity, we’re using an offline spreadsheet to maintain our member and sponsor data.
In the next post, I intend to document the order processing cycle.