Data Considerations for Businesses
Every modern business collects data. We live in an electronic world where data is a ubiquitous and valuable asset that can be collected from many sources. New and existing businesses can analyze and operationalize these datasets to create products and services that consumers value highly, or to gain insight into optimization strategies that improve profitability. Businesses can also leverage data to better understand their target markets, shape products and services, and grow faster in the marketplace. The most impactful datasets provide information about a business’ customer base, how customers perceive the business and its products or services, and how customers engage with the business online and within the sales process; they may also increase the value of a product or service.
This guide provides an actionable overview of how to enhance your business with data.
Before collecting or purchasing new datasets, businesses should:
- Identify and prioritize revenue drivers. Prioritizing data-centric initiatives which will make money for the business is key to ensuring the business is not chasing data projects with no clear value to business operations, products and services, and the end consumer or customer.
- Inventory the data currently being collected and perform an assessment of how that data is or is not being utilized. The typical business owner may be surprised by the amount of data already being collected by the business.
Internal datasets are often readily available within the processes implemented by a business, such as: direct customer interactions via the business’ web site or social media, product or service usage tracking mechanisms, product or service sales, or other business operations data sources. These datasets can be analyzed to improve operational efficiency and positively impact business profitability.
- Evaluate how other companies within the business’ sector are utilizing data and analytics to drive revenue and optimize business operations. The evaluation of how similar businesses utilize data will provide clues to companion datasets which can be combined to drive profitable outcomes.
- Identify external datasets which may be impactful to business operations, the delivery of existing products and services, or exploration of new markets.
To begin your exploration of external data, please see the list of external data sources provided in the appendix of this guide. External datasets are acquired from public, cooperative, and commercial sources:
Public data is available from a variety of sources, including but not limited to worldwide government organizations, financial and economic institutions, scientific institutions, academic entities, environmental organizations, media organizations, and social media. This data is published in many places on the internet and can be downloaded and used free of charge.
The state of Virginia maintains the Virginia Open Data Portal (data.virginia.gov) which provides access to many public datasets across categories such as: business, employment, finance, geographic information, public safety, etc.
Cooperative Data Exchanges are an emergent component of the data economy universe. Data exchange platforms can help companies access data assets and develop new revenue streams by seamlessly connecting data suppliers with data consumers. Some ‘data marketplaces’ are home to diverse datasets; others may focus on a particular type of industry or functional business area.
Commercial data can be used under a purchased license in a fee-for-use model. Pricing models and overall costs vary by vendor and dataset. Commercial datasets are available in a wide range of focus areas, including but not limited to consumer, social media, weather, financial markets, and real estate data.
- Determine if professional assistance is required. A business owner focused on the day-to-day running of the business may not have the time to become an expert in data and analytics.
Many businesses make the mistake of collecting or procuring data without having a clearly defined value proposition or use case. In developing a value proposition, we recommend that businesses utilize The Value Proposition Canvas, a tool which can be used to align proposed data-centric, business-enhancing strategies with a customer’s values and needs. The tool allows companies to create customer profiles and evaluate the business’ data-centric value proposition to ensure that a given approach will enhance a product or service and create value for the end customer.
If the intent is to enhance business operations, business owners can utilize the Value Proposition Canvas for internal planning. In this case the business itself would be the “customer”. Businesses need to create a detailed, data-centric value proposition before investing valuable resources in the pursuit of data-centric initiatives. Please refer to the appendix to find a link to the Value Proposition Canvas.
The following is a simple example of how data can provide value to a business:
Take the example of a local Virginia golf course. Let’s explore how it might use data to optimize its business operations. The key revenue drivers for a golf course are rounds of golf purchased and concessions. The profitability of a golf course depends on patrons purchasing rounds of golf. Additionally, a golf business has hefty overhead and operational costs in the form of wages paid to staff and the maintenance of the grounds and facilities.
Golf is an outdoor sport which is optimally played during times of mild weather. A reasonable assumption is that local courses will sell the most rounds of golf and the most products via concessions during times of mild weather because patrons value the experience of playing in comfortable conditions. During days of extreme weather, the golf course experiences a decrease in revenue because patrons do not value paying for golf during days of rain, cold, or extreme heat.
Given these factors, a local golf course may enhance its business by:
- Utilizing weather data and analytics tools to better understand the relationship between weather, sales, and operational costs.
- Analyzing patron information captured when rounds of golf are sold, such as names, addresses, and ages. Businesses often use customer relationship management (CRM) software that collects data like customer records or sales transactions. This data provides insight into customers’ spending, use of products or services, and other consumer behaviors, which may lead to cross-selling (selling additional products to current customers). The capture and retention of this information may introduce data privacy concerns, which are addressed later in this document.
- Leveraging complementary technology, such as a third-party mobile application and geospatial data, to deliver experience-enhancing information to the golfer via a smart device (for example, the golfer’s current distance to the green). An enhanced experience increases the value of the business’ product (golf), which can drive additional revenue and distinguish the business from its competitors.
Once a business has identified the target datasets it needs to implement data-centric business enhancements, it can develop a detailed plan. The plan should list the resources required to execute the initiative (e.g., funding, subject matter expertise, time, and tools) and describe the selected datasets and analytical approaches to be used. This is a critical point in the planning process. If the business does not have in-house expertise with data collection and analytics, the business may engage a consultant or outside firm with the requisite skills. A subject matter expert can help validate the data-centric value proposition, make recommendations on analytical approaches, and provide support in implementation.
The following are approaches to data analytics which are used to glean insights from data:
- Descriptive Analytics: Uses collected historical data to provide insights into what has happened in the past.
Example: Use a business intelligence software tool to generate a report showing how many rounds of golf were played each month during the prior twelve months. Descriptive analysis will allow the business to identify slow months in order to develop sales and marketing strategies to address the slowdown in business.
- Diagnostic Analytics: Uses collected data to identify why something is occurring within the business.
Example: The local golf course may compare daily sales with the high temperature, low temperature, and chance of precipitation for each day over the past year. This level of analysis will provide insight into the weather conditions that most affect revenue, both negatively and positively.
- Predictive Analytics: Uses collected data to predict what will happen in the future.
Example: The local golf course integrates a commercial weather prediction service with its sales management application so that forecast weather data can be used to predict future sales and revenue.
- Prescriptive Analytics: Prescribes action based on analysis of data resulting from past events.
Example: The local golf course may use analytics to determine when to scale up or scale down staffing based upon weather. The course may also choose to automate the pricing of a round of golf based upon weather conditions to capture business that would otherwise have been lost during non-ideal conditions.
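The four approaches above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended implementation: the daily records, the function names, the staffing ratio, and the discount rule are all hypothetical values invented for the golf course example, and a real analysis would use far more data and a properly validated model.

```python
import math
from collections import defaultdict

# Hypothetical daily records: (date, high temperature in F, rain flag, rounds sold).
# All values are invented for illustration only.
DAILY = [
    ("2023-04-01", 58, 1, 40),
    ("2023-04-02", 66, 0, 95),
    ("2023-04-03", 72, 0, 120),
    ("2023-05-01", 75, 0, 130),
    ("2023-05-02", 70, 1, 60),
    ("2023-05-03", 78, 0, 140),
]


def monthly_rounds(records):
    """Descriptive: total rounds sold per month (keyed by YYYY-MM)."""
    totals = defaultdict(int)
    for date, _high, _rain, rounds in records:
        totals[date[:7]] += rounds
    return dict(totals)


def average_rounds_by_rain(records):
    """Diagnostic: compare average rounds on dry days with rainy days."""
    groups = defaultdict(list)
    for _date, _high, rain, rounds in records:
        groups["rainy" if rain else "dry"].append(rounds)
    return {label: sum(values) / len(values) for label, values in groups.items()}


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_rounds(records, forecast_high, forecast_rain):
    """Predictive: estimate rounds for a forecast day.

    Rainy days use the historical rainy-day average; dry days use a
    straight-line fit of rounds against the high temperature.
    """
    if forecast_rain:
        return average_rounds_by_rain(records)["rainy"]
    dry = [(high, rounds) for _d, high, rain, rounds in records if not rain]
    intercept, slope = fit_line([h for h, _ in dry], [r for _, r in dry])
    return intercept + slope * forecast_high


def recommend(predicted_rounds, rounds_per_staff=30, min_staff=2):
    """Prescriptive: turn a prediction into a staffing and pricing action.

    The staffing ratio and the 20% discount threshold are assumptions.
    """
    staff = max(min_staff, math.ceil(predicted_rounds / rounds_per_staff))
    discount = 0.20 if predicted_rounds < 60 else 0.0
    return {"staff": staff, "discount": discount}
```

For example, `recommend(predict_rounds(DAILY, 74, False))` suggests a staffing level for a dry day with a forecast high of 74, while passing `True` for the rain flag yields a smaller crew and a discounted price.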
The collection of data about individuals may present various privacy, protection, legal, and regulatory concerns. Business owners will need to evaluate which sensitive data types they are collecting, how they are utilizing the data, and how they are managing collected data to ensure legal and regulatory compliance. Data collected about individuals will differ based upon the type of business and the products and services provided. A business such as a golf course may have minimal data privacy and data protection concerns, whereas a business operating within the healthcare industry might collect and maintain much more sensitive data.
The following sensitive data types are commonly collected by businesses:
- Confidential Business Information
- Personally Identifiable Information (PII)
- Payment Card Industry (PCI) data
- Protected Health Information (PHI)
Confidential Business Information
Confidential Business Information refers to information whose disclosure may harm the business or its business associates. It is intended for internal use only and may include information pertaining to: trade secrets, processes, operations, style of works, or apparatus, or to the production, sales, shipments, purchases, transfers, identification of customers, inventories, or amount or source of any income, profits, losses, or expenditures of any person, firm, partnership, corporation, or other organization, or other information of commercial value. Confidential Business Information should be protected under non-disclosure agreements. Businesses should engage legal counsel to review any agreements with companies and organizations that provide data under a non-disclosure or terms-of-use agreement to ensure compliance. A breach of these agreements may expose the business to lawsuits.
Personally Identifiable Information (PII)
PII is information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual. Some information that is considered to be PII is available in public sources such as telephone books, public Web sites, and university listings. This type of information is considered to be Public PII and includes, for example, first and last name, address, work telephone number, email address, home telephone number, and general educational credentials. The definition of PII is not anchored to any single category of information or technology. Rather, it requires a case-by-case assessment of the specific risk that an individual can be identified. Non-PII can become PII whenever additional information is made publicly available, in any medium and from any source, that, when combined with other available information, could be used to identify an individual.
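The way individually harmless fields can combine to identify a person can be illustrated with a short Python sketch. The records and field names below are hypothetical, and counting unique combinations is only one simple way to gauge re-identification risk; it is not a complete privacy assessment.

```python
from collections import Counter

# Hypothetical customer records with names already removed.
RECORDS = [
    {"zip": "23219", "birth_year": 1980, "gender": "F"},
    {"zip": "23219", "birth_year": 1980, "gender": "M"},
    {"zip": "23219", "birth_year": 1975, "gender": "F"},
    {"zip": "23220", "birth_year": 1980, "gender": "F"},
    {"zip": "23219", "birth_year": 1980, "gender": "F"},
]


def unique_record_count(records, fields):
    """Count records whose combination of `fields` appears exactly once.

    A record that is unique on the chosen fields could be matched to a
    named individual if another dataset contains the same fields.
    """
    combos = Counter(tuple(r[f] for f in fields) for r in records)
    return sum(1 for r in records if combos[tuple(r[f] for f in fields)] == 1)
```

In this sample, one record is unique on `zip` alone, two are unique on `zip` plus `birth_year`, and three are unique once `gender` is added, which shows how each additional field makes more individuals distinguishable.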
Much of the concern surrounding data privacy centers on whether and how personal data is shared with third parties. Large corporations such as Facebook, Amazon, and Google have focused their business models on selling and profiting off individuals’ data. Today, there are various state, federal, and international laws protecting PII. At this time, each of the 50 states enforces laws protecting customer PII collected by businesses. The following are two examples of laws protecting PII in the state of Virginia:
|Virginia Consumer Data Protection Act||● Signed into law in March 2021; effective January 1, 2023. ● The VCDPA applies to all entities “who conduct business in the commonwealth of Virginia or produce products or services that are targeted to residents of the Commonwealth” and, during a calendar year, either: (1) control or process personal data of at least 100,000 Virginia residents, or (2) derive over 50% of gross revenue from the sale of personal data (though the statute is unclear as to whether the revenue threshold applies to Virginia residents only) and control or process personal data of at least 25,000 Virginia residents.|
|Virginia Personal Information Privacy Act||The Virginia Personal Information Privacy Act imposes the following restrictions on businesses operating within the state: ○ Restriction on the sale of personal information by brick-and-mortar merchants ○ Prohibition on the collection of date of birth in connection with accepting a check as payment ○ Restriction on the use of Social Security numbers ○ Restriction on the purposes for which a merchant may scan a driver’s license or identification card ○ A personal information breach notification law, under Va. Code §18.2-186.6 (‘the Breach Notification Law’). Virginia’s Breach Notification Law applies to individuals, government, businesses, and any other legal entity, whether for profit or not for profit|
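The VCDPA applicability thresholds described above can be expressed as a simple check. The sketch below is illustrative only: the function and parameter names are hypothetical, the revenue share is assumed to be measured across all gross revenue, and the result is no substitute for a review by legal counsel.

```python
def vcdpa_thresholds_met(conducts_business_in_virginia,
                         virginia_residents_processed,
                         revenue_share_from_data_sales):
    """Return True if the thresholds summarized above appear to be met.

    conducts_business_in_virginia: True if the entity conducts business in
        Virginia or targets products or services to Virginia residents.
    virginia_residents_processed: number of Virginia residents whose personal
        data is controlled or processed during a calendar year.
    revenue_share_from_data_sales: fraction (0.0 to 1.0) of gross revenue
        derived from the sale of personal data.
    """
    if not conducts_business_in_virginia:
        return False
    # Threshold (1): at least 100,000 Virginia residents.
    meets_volume = virginia_residents_processed >= 100_000
    # Threshold (2): over 50% of gross revenue from data sales
    # and at least 25,000 Virginia residents.
    meets_sales = (revenue_share_from_data_sales > 0.50
                   and virginia_residents_processed >= 25_000)
    return meets_volume or meets_sales
```

A business processing data on 30,000 Virginia residents that earns 60% of its revenue from data sales would return `True`; the same business earning exactly 50% would return `False`, because the threshold is "over 50%".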
Examples of federal privacy laws:
|The Gramm-Leach-Bliley Act||The Gramm-Leach-Bliley Act requires financial institutions – companies that offer consumers financial products or services like loans, financial or investment advice, or insurance – to explain their information-sharing practices to their customers and to safeguard sensitive data.|
|The Children’s Online Privacy Protection Act||The Children’s Online Privacy Protection Act (15 U.S.C. §§ 6501-6506) allows parents to control what information is collected about their child (younger than 13 years old) online. Operators of websites that either target children or knowingly collect personal information from children are required to post privacy policies, obtain parental consent before collecting information from children, allow parents to determine how such information is used, and provide the option to parents to opt-out of future collection from their child.|
An example of an international privacy law:
|General Data Protection Regulation (GDPR)||The General Data Protection Regulation (GDPR) took effect May 25, 2018, and affects organizations worldwide, including universities. The GDPR replaces the Data Protection Directive 95/46/EC as the primary law regulating how companies and organizations protect the personal data of people located in the European Union (EU).|
Payment Card Industry (PCI) Data
PCI data is the private data of payment card holders. Any business that processes, stores, or transmits payment card data must consider PCI data privacy and protection concerns. Cardholder data refers to any information printed, processed, transmitted, or stored in any form on a payment card. Organizations accepting payment cards are expected to protect cardholder data and to prevent the unauthorized use of data. The payment card industry has developed a cybersecurity standard called the Payment Card Industry Data Security Standard (PCI DSS), which defines the data protection standards that businesses must agree to via contract with their payment card processor. If the business falls out of PCI DSS compliance, the business may be hit with hefty fines from its payment card processor or civil suits resulting from the loss of PCI data.
Protected Health Information (PHI)
PHI is protected by the Health Insurance Portability and Accountability Act of 1996 (HIPAA), a federal law that required the creation of national standards to protect sensitive patient health information from being disclosed without the patient’s consent or knowledge. HIPAA provides a list of 18 identifiers which are considered to be PHI:
- Names;
- All geographical subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code, if according to the current publicly available data from the Bureau of the Census: (1) The geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and (2) The initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000;
- All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older;
- Phone numbers;
- Fax numbers;
- Electronic mail addresses;
- Social Security numbers;
- Medical record numbers;
- Health plan beneficiary numbers;
- Account numbers;
- Certificate/license numbers;
- Vehicle identifiers and serial numbers, including license plate numbers;
- Device identifiers and serial numbers;
- Web Universal Resource Locators (URLs);
- Internet Protocol (IP) address numbers;
- Biometric identifiers, including finger and voice prints;
- Full face photographic images and any comparable images; and
- Any other unique identifying number, characteristic, or code (note this does not mean the unique code assigned by the investigator to code the data)
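The ZIP code, date, and age rules in the list above lend themselves to a short worked example. The following Python sketch is a simplified illustration under stated assumptions: the field names are hypothetical, the set of low-population ZIP prefixes is a placeholder (real values must come from current Census Bureau data), and only a few of the listed identifiers are handled. It is not a compliance tool.

```python
# Placeholder set of 3-digit ZIP prefixes whose combined population is
# 20,000 or fewer. The value here is invented; look up real prefixes
# from current Census Bureau data.
SMALL_POPULATION_ZIP3 = {"999"}

# Hypothetical field names for identifiers that are removed outright.
DIRECT_IDENTIFIERS = {"phone", "email", "ssn"}


def generalize_zip(zip_code, small_zip3=SMALL_POPULATION_ZIP3):
    """Keep only the first three ZIP digits, or 000 for low-population areas."""
    prefix = zip_code[:3]
    return "000" if prefix in small_zip3 else prefix


def generalize_age(age):
    """Report ages over 89 as a single '90+' category."""
    return "90+" if age > 89 else str(age)


def generalize_date(iso_date):
    """Keep only the year from a YYYY-MM-DD date string."""
    return iso_date[:4]


def deidentify(record):
    """Drop direct identifiers and generalize ZIP code, age, and dates."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["zip"] = generalize_zip(record["zip"])
    cleaned["age"] = generalize_age(record["age"])
    cleaned["admission_date"] = generalize_date(record["admission_date"])
    return cleaned
```

Given a record with ZIP code `23219`, age `93`, and admission date `2022-03-14`, `deidentify` keeps only the ZIP prefix `232`, the age category `90+`, and the year `2022`, and removes the phone, email, and Social Security number fields.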
As a business accumulates sensitive data, its legal liability grows. Businesses must understand the level of risk they are accepting by collecting and analyzing sensitive data. In order to achieve data privacy, the appropriate data protection mechanisms must be established. All data protection mechanisms and compliance strategies must meet the data protection requirements defined by state and federal laws. To meet these standards and protect against loss of data, businesses must adopt data security practices and utilize resilient technology to mitigate the impacts of a potential attack or security breach. Protected data can be compromised via theft, ransomware, or other malicious activities, and the penalties for not having security plans in place can be severe.
The consequences of information theft grow substantially when businesses store sensitive or confidential data such as PII, PHI, and PCI data. This underscores the importance of data protection. Please note, professional liability insurance may not cover loss of data when the data is in the custody of your business.
Here are initial steps that a business can take to implement data protection:
- Educate Your Team: The entire business needs to understand the laws and regulations governing the business’ use of data. Additionally, general cybersecurity training is recommended annually, and many cybersecurity training courses are available online. Attackers may target non-technical staff to gain access to data.
- Determine if Cybersecurity Expertise Is Required: Decide whether the business needs external cybersecurity expertise. A cybersecurity consultant can help the business build a cybersecurity posture that protects data in line with legal and regulatory compliance standards and can recommend an approach to securing data.
- Establish a Data Protection Program: Determine if the business requires an internal data protection program. The business may decide to establish a data protection program in-house or outsource data protection to a cybersecurity firm. If establishing a data protection program in-house, the business should identify a person or group who is responsible for data protection and an executive champion who is accountable. The data protection program should be led by an individual with direct experience and/or certifications in corporate cybersecurity, such as the Certified Information Systems Security Professional (CISSP) or Certified Information Privacy Professional (CIPP), and staffed by people who have data protection experience. Establishing the data protection program may require the business to hire cybersecurity talent. The data protection program will perform privacy impact assessments to help the business understand the data it collects, processes, and stores and any privacy concerns with those datasets. Additionally, the data protection program will be responsible for communicating risks and issues with datasets and ensuring appropriate operational data protection controls are established.
- Perform Data-Centric Cybersecurity Audits: At a minimum, businesses should audit their utilization and management of data on an annual basis. Such audits can help identify weaknesses in the business’ security posture and address them before they are exploited and loss of data occurs.
- Consider Cyber Liability and/or Data Breach Insurance: Given a business’ exposure to cyber risks and attacks, having insurance could make the difference in surviving a data breach or cybersecurity event and ensuring victims are compensated. The business should review its liability insurance coverage with its current insurance provider to determine whether data breaches or cybersecurity events are covered. If not, the business should research and consider options for appropriate coverage.
The benefits of data integration and data analytics to a business are hard to overstate, yet it is crucial to consider the costs, regulations, and business requirements before making the leap. With proper planning, businesses should feel empowered to begin exploring how data and analytics can enhance their products, services, operational efficiency, and ultimately the bottom line.
Appendix
- Virginia Open Data Portal
The Virginia Open Data Portal serves to extend access to Commonwealth data empowering our constituents to interpret, analyze, and transform our data into actionable intelligence. Secure and appropriate data sharing is fundamental to the success of our society because information supports engagement. Commonwealth data is a strategic asset that when leveraged, can drive innovation, increase quality of life, and promote economic growth.
The Virginia Open Data Portal provides more than just data access. Within the portal, you can view stories and dashboards, create visualizations, filter data, and access it via APIs (application programming interfaces) to build solutions in web and mobile applications.
- Data.gov: Includes over 180,000 publicly available datasets and allows the user to conduct research, develop web and mobile applications, and design data visualizations.
- World Bank Open Data: The world’s most comprehensive data regarding what’s happening in different countries across the world. World Bank Open Data has 3000 datasets and 14000 indicators encompassing microdata, time series statistics, and geospatial data.
- World Health Organization (WHO): includes health-specific statistics of its 194 Member States and contains 100 or more categories such as the Millennium Development Goals (child nutrition, child health, maternal and reproductive health, immunization, HIV/AIDS, tuberculosis, malaria, neglected diseases, water and sanitation), non-communicable diseases and risk factors, epidemic-prone diseases, health systems, environmental health, violence and injuries, equity etc.
- Google Public Data Explorer: Includes data from other sources listed in this document and allows the user to experiment with dynamic data visualization tools.
- Registry of Open Data on Amazon Web Services (AWS) (RODA): This repository contains public data from AWS resources that originates from different agencies, government organizations, researchers, businesses, and individuals.
- FiveThirtyEight: Data from a variety of sectors including politics, sports, science, economics, etc. This site allows the user to download the data and provides an explanation about the dataset, its source, and how to use it.
- US Census Bureau: The US Census Bureau is the biggest statistical agency of the federal government. It stores and provides reliable facts and data regarding the people, places, and economy of the United States.
- freeCodeCamp Open Data: This site is an open-source community that enables a user to code and build projects that can be used for free by nonprofits.
- United Nations International Children’s Emergency Fund (UNICEF) Dataset: Includes relevant data on education, child labor, child disability, child mortality, maternal mortality, water and sanitation, low birth-weight, antenatal care, pneumonia, malaria, iodine deficiency disorder, female genital mutilation/cutting, and adolescents. This data is updated regularly making it more comprehensive, reliable and accurate.
- Yelp Open Datasets: This dataset encompasses businesses, reviews, and user data for use in personal, educational and academic pursuits. There are about 5,996,996 reviews, 188,593 businesses, 280,991 pictures and 10 metropolitan areas included in Yelp Open Datasets.
- LODUM: Open Data initiative of the University of Münster. Under this initiative, the public can access information about the university in machine-readable formats.
- USA Facts: From the American standard of living to immigration statistics to government finances: explore the data for a broader understanding of how the government spends its time and your money. These data sets can be used to compare historical trends, dig deep into the numbers, and interact with visualizations designed to give a better idea of the government’s impact on the nation and its people.
- Data USA: Deloitte, Datawheel, and Cesar Hidalgo, Professor at the MIT Media Lab and Director of Collective Learning developed Data USA, the most comprehensive website and visualization engine of public US Government data. Data USA tells millions of stories about American life from towns, cities and states; occupations, from teachers to welders to web developers; industries–where they are thriving, where they are declining and their interconnectedness to each other.
- The Value Proposition Canvas:
 The New Oxford American Dictionary defines data as “facts and statistics together for analysis”
 Business Confidential definition source, Cornell Law School, https://www.law.cornell.edu/cfr/text/19/201.6
 PII definition source, Cornell Law School, Legal Information Institute https://www.law.cornell.edu/cfr/text/2/200.79