April 11, 2018
What You Can Learn from Our Machine Learning Challenge
How can a company use the seemingly endless data at its fingertips to make better decisions? One possibility to explore is machine learning. Building upon the momentum from the Natural Language Processing Innovation Challenge, CapTech invited all interested CapTechers to identify business use cases and create a machine learning solution.
Seven teams answered the call, each delivering a working solution that gave hands-on experience and a better appreciation for implementing machine learning. Here are the use cases the teams worked to solve:
- Employee Attrition: Attrition analysis and prediction based on a variety of job-related categories
- Inventory Storage: Analysis of the time it takes to sell product inventory and an algorithm to predict the ideal location for inventory
- Project Staffing: Approach to identifying the right individual for the right project to support a staffing process
- Call Volume: Prediction of call volume in a call center, allowing for more informed adjustments to call center staff
- Query Recommendations: Enhancement to voice-based intelligence, recommending personalized inquiries for a given user
- Customer Reviews: Text recognition for product reviews to understand customer feedback trends
- Social Media: Analysis of social media posts within an industry, providing visibility to a brand's social awareness
UNDERSTANDING MACHINE LEARNING CHALLENGES
Before proceeding with machine learning, it's important to understand the different issues you may encounter as you implement your solution. From struggling to find enough data to selecting the best algorithm for their use case, our teams had to find creative ways around the challenges they encountered. Here are some challenges you may encounter:
UNDERSTANDING THE PROBLEM
One of the most important aspects of machine learning has nothing to do with machine learning at all. You must have a complete and thorough understanding of the business case you are solving before ever beginning to think about implementation. This understanding will allow you to perform the necessary "sanity" checks throughout the process to ensure the result aligns with what you expected.
If you don't understand the problem, how can you choose the right inputs and analyze the outputs to ensure your solution is correct? Instead of feeding your algorithm everything, you should use your understanding to be selective and help eliminate data redundancy. Once you have your outputs, you must ensure they fit into the larger context of the problem.
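One way to spot redundant inputs is to look for pairs of columns that move almost in lockstep. The sketch below is only an illustration, not something the challenge teams are known to have used: the column names, the sample values, and the 0.95 threshold are all assumptions, and a high correlation is a prompt to apply your understanding of the problem, not an automatic reason to drop a column.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship to report
    return cov / sqrt(var_x * var_y)


def redundant_pairs(columns, threshold=0.95):
    """Return pairs of column names whose absolute correlation meets the threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        if abs(pearson(a, b)) >= threshold:
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical call-center inputs: minutes and seconds carry the same signal.
columns = {
    "call_minutes": [1.0, 2.0, 3.0, 4.0, 5.0],
    "call_seconds": [60.0, 120.0, 180.0, 240.0, 300.0],
    "agents_on_shift": [3.0, 1.0, 4.0, 1.0, 5.0],
}
flagged = redundant_pairs(columns)
print(flagged)
```

Here only the minutes/seconds pair is flagged, so one of the two could be dropped without losing information.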
DATA, DATA, DATA
Second to understanding your problem is data. The entire solution is dependent on the data you have for training your model and ultimately feeding into your system. Here are some of the main data challenges the CapTech teams saw when implementing their solutions:
- Availability: How readily available is your data? Is the right data available? A machine learning solution is driven by the data you can access. You want to ensure your solution is not only easily ingesting data but is ingesting the right data. If the data is not available, start building solutions to aggregate the data now; you can build new data streams, repurpose other data, or synthesize data.
- Size: You will need a large amount of data for training and testing your model. Each problem requires a different data set size, but in general, the more complex the problem, the more data is required to create an accurate model. If you don't have enough data now, continue collecting until you do, or think about incorporating other data streams.
- Quality: Your model will only be as good as the data it receives. A data set that contains duplicate, incomplete, inconsistent, or corrupt records will hamper your model's effectiveness. During the challenge, the teams spent most of their time sanitizing the data, as this was the most important factor affecting the model's quality and effectiveness.
- Bias: Biased data is skewed, often unintentionally, and can affect the accuracy of the model. When gathering your data, you must make every attempt to keep it unbiased (understanding your problem really helps). As you view the model's outputs, be aware of biases and look for them in the results.
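As a rough illustration of the quality and bias points above, the sketch below drops incomplete and duplicate rows and then reports how the remaining labels are distributed. Everything in it is an assumption made for the example: the field names, the sample rows, and the choice to treat two rows as duplicates when all required fields match. Label balance is also only one narrow kind of skew; it will not reveal bias in how the data was collected.

```python
from collections import Counter


def clean_records(records, required_fields):
    """Drop incomplete rows and duplicates; count how many of each were removed."""
    seen = set()
    cleaned = []
    report = {"incomplete": 0, "duplicate": 0}
    for row in records:
        # A missing key, None, or an empty string counts as incomplete.
        if any(row.get(field) in (None, "") for field in required_fields):
            report["incomplete"] += 1
            continue
        # Rows are duplicates when every required field matches.
        key = tuple(row[field] for field in required_fields)
        if key in seen:
            report["duplicate"] += 1
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned, report


def label_shares(records, label_field):
    """Fraction of rows carrying each label value."""
    counts = Counter(row[label_field] for row in records)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical attrition rows, made up for this example.
records = [
    {"employee_id": 1, "tenure_years": 2, "left": "no"},
    {"employee_id": 1, "tenure_years": 2, "left": "no"},      # duplicate
    {"employee_id": 2, "tenure_years": None, "left": "yes"},  # incomplete
    {"employee_id": 3, "tenure_years": 5, "left": "no"},
    {"employee_id": 4, "tenure_years": 1, "left": "yes"},
]
fields = ["employee_id", "tenure_years", "left"]
cleaned, report = clean_records(records, fields)
shares = label_shares(cleaned, "left")
print(report, shares)
```

A heavily lopsided share for one label is a signal to revisit how the data was gathered before trusting the model's accuracy.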
MODEL SELECTION & IMPLEMENTATION
Machine learning algorithms are capable of uncovering insights and trends within data sets that were previously unknown. Developing your own proprietary model from scratch is an option, but it requires a high level of expertise and time. Most of the time, you will choose from categories and model frameworks, such as logistic regression or neural networks, that have already been developed and vetted.
- Selection: Choosing a model requires a technical understanding of your options and patience. Each model has specific uses that may or may not apply to your business scenario. Once you have a list of potential models, you will have a trial and error period to see which algorithm gives the best results. Each model needs to be tuned to your data.
- Implementation: When implementing your solution, you can either take a "hands-on" approach, where you develop all aspects of the solution around the model, or you can use a software package, such as Microsoft Azure, that does most of the work for you. The first option provides more control but takes longer to implement and has more room for error, while the second option is faster and easier but provides less insight into the way the solution works.
- Training: You will want to ensure you have the right data for training and testing your model. You can use two independent data sets, or you can split one large data set. You then train your model on the training set and see how it performs on the test set. Beware of overfitting your model to the training data; a model with near 100% accuracy on its training data is often overfitted and predicts poorly on new data because it doesn't account for possibilities other than what it was trained on.
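The split-then-compare routine described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the approach any of the teams took: the data is synthetic with 20% of labels deliberately flipped, the model is a simple nearest-neighbor vote, and the two values of `k` stand in for the trial-and-error period of comparing candidate models. With `k=1` the model memorizes its training rows, so its training accuracy is perfect by construction, which is exactly why training accuracy alone tells you little.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def predict_knn(train, x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(1 for _, label in nearest if label)
    return votes * 2 > len(nearest)


def accuracy(train, rows, k):
    """Share of rows whose predicted label matches the recorded label."""
    hits = sum(1 for x, label in rows if predict_knn(train, x, k) == label)
    return hits / len(rows)


# Synthetic data: the label mostly follows x > 500, with 20% of labels flipped.
rng = random.Random(42)
data = []
for x in rng.sample(range(1000), 200):  # 200 distinct x values
    label = x > 500
    if rng.random() < 0.2:
        label = not label
    data.append((x, label))

train, test = train_test_split(data, test_fraction=0.25, seed=0)

# Compare two candidate settings on both the training and the test set.
results = {}
for k in (1, 15):
    results[k] = (accuracy(train, train, k), accuracy(train, test, k))
    print(f"k={k}: train={results[k][0]:.2f} test={results[k][1]:.2f}")
```

The number to watch is the gap between the two columns: a large drop from training to test accuracy suggests the model has fitted noise in the training data.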
Machine learning solutions can have intense data and computational requirements. Your infrastructure will need to account for this and will need to scale as your solution grows. Here are two areas to consider during your implementation:
- Storage: With machine learning, you may need TBs of data or more. It's important to consider how much storage space you are going to need now and in the future, as you continue to collect data.
- Processing: Machine learning is computationally intensive because your solution and its supporting libraries must process very large amounts of data quickly and efficiently.
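For the storage question, a back-of-the-envelope projection is often enough to start the conversation. The sketch below assumes linear growth and an optional replication factor; the function name and every number in it are hypothetical placeholders, so substitute your own measurements.

```python
def projected_storage_gb(current_gb, monthly_growth_gb, months, replication_factor=1):
    """Estimate total storage needed after a number of months of linear growth."""
    raw_gb = current_gb + monthly_growth_gb * months
    return raw_gb * replication_factor


# Hypothetical inputs: 500 GB today, 50 GB of new data per month, kept in 3 copies.
needed_gb = projected_storage_gb(
    current_gb=500, monthly_growth_gb=50, months=24, replication_factor=3
)
print(f"{needed_gb} GB (~{needed_gb / 1000:.1f} TB) after 24 months")
```

If your data volume grows faster than linearly, this estimate will be too low, so revisit it as you collect real growth figures.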
As businesses continue to capture more data and technology advances, machine learning should be considered as a potential analytic tool. A proper understanding of your business case makes it easier to navigate the issues you may encounter and results in a better overall solution. When used correctly, a machine learning solution can aggregate and analyze enormous data sets to find hidden patterns and ultimately help you make better business decisions.