By using the DataSet and Views API Endpoints, you can upload and combine data from different sources to feed into our automated pipeline. The data you send needs to be tidy and useful; as with most things in engineering, garbage in, garbage out holds true. It's important that you understand your data and choose the appropriate features (the factors that influence the outcome you care about) to get the best results.
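As a rough sketch of what "tidy" means here: one row per observation, one column per feature, with a shared key for combining sources. The column names and the `timestamp` merge key below are illustrative, not part of the API itself.

```python
from datetime import date

# Two hypothetical sources: daily sales counts and a promotion calendar.
sales = {date(2017, 1, 1): 120, date(2017, 1, 2): 95}
promos = {date(2017, 1, 1): True, date(2017, 1, 2): False}

def combine(sales, promos):
    """Combine both sources into one tidy record set keyed by timestamp."""
    return [
        {"timestamp": d.isoformat(), "sales": s, "promo": promos.get(d, False)}
        for d, s in sorted(sales.items())
    ]

rows = combine(sales, promos)
```

Each record in `rows` is now a self-contained observation, ready to be serialized and uploaded.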
Once your data is uploaded, the next step is to choose the type of machine learning you want to use to solve your problem.
Time-series forecasts and impact sessions are appropriate when the data you are trying to forecast has a naturally ordered timestamp. Regression algorithms work well on problems where you need to predict a continuous variable from several other known variables.
No matter which you choose, the Nexosis API will go through a multi-stage elimination process to narrow the appropriate algorithms down to a handful of candidates, which are then evaluated to find the best fit for your data.
Now that you've selected your ML type, our platform gets to work. Several tasks can kick off at this point, including data categorization, aggregation, and imputation.
By annotating your data, you help our platform return better predictions; in cases where that metadata is missing or unknown, the platform will determine which type of data is present in each column.
Whether the platform correctly identifies the type of data in each of your columns can drastically help or hurt your results.
Datasets often contain anomalies, and these must be identified and removed as outliers (unless, of course, you are solving an anomaly detection problem).
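One simple way to flag outliers is a z-score filter: drop points that sit too many standard deviations from the mean. This is only an illustration of the idea; the platform's actual outlier handling isn't specified in this doc, and the threshold below is arbitrary.

```python
import statistics

def remove_outliers(values, z_threshold=2.0):
    """Drop points more than z_threshold standard deviations from the mean.

    A basic z-score filter, shown for illustration only. Note that with
    small samples a single extreme point inflates the stdev, so a low
    threshold like 2.0 is used here.
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return list(values)
    return [v for v in values if abs(v - mean) / stdev <= z_threshold]

clean = remove_outliers([10, 11, 9, 10, 12, 500])
```

After filtering, the extreme point (500) is gone and the remaining values cluster tightly.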
Especially with time series data, a developer or data scientist might want to forecast at a granularity different from that of the raw data. While forecasting at a finer granularity than the data is impossible, a coarser granularity is easily achieved through aggregation. Depending on your data, there are four different aggregation strategies that can be utilized.
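The doc doesn't enumerate the four strategies, so the set below (sum, mean, median, mode) is illustrative. The sketch rolls hourly points up to one value per day:

```python
from collections import defaultdict
from datetime import datetime
import statistics

# Illustrative aggregation strategies; the platform's actual four
# strategies are not named in this section.
STRATEGIES = {
    "sum": sum,
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}

def aggregate_daily(points, strategy="sum"):
    """Roll hourly (timestamp, value) points up to one value per day."""
    by_day = defaultdict(list)
    for ts, value in points:
        by_day[ts.date()].append(value)
    agg = STRATEGIES[strategy]
    return {day: agg(values) for day, values in sorted(by_day.items())}

hourly = [
    (datetime(2017, 1, 1, 9), 5),
    (datetime(2017, 1, 1, 10), 7),
    (datetime(2017, 1, 2, 9), 4),
]
daily = aggregate_daily(hourly, "sum")
```

The right strategy depends on the meaning of the column: counts usually sum, while rates or temperatures usually average.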
In each case, unless you specify a strategy (which we recommend), our platform analyzes your data and determines the aggregation strategy that makes the most sense.
Submitted data sometimes has missing values, which can lead to worthless predictions. To combat this and deliver the most accurate predictions, our platform will fill in missing values with one of four imputation strategies.
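Again, the four strategies aren't named here, so the options below (mean fill, zero fill, carry-forward) are a sketch of common imputation approaches rather than the platform's actual list:

```python
import statistics

def impute(values, strategy="mean"):
    """Fill None gaps in a numeric series using a named strategy.

    Illustrative strategies only: "mean", "zero", "forward_fill".
    """
    present = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(present)
        return [fill if v is None else v for v in values]
    if strategy == "zero":
        return [0 if v is None else v for v in values]
    if strategy == "forward_fill":
        out, last = [], present[0]
        for v in values:
            last = v if v is not None else last  # carry last seen value
            out.append(last)
        return out
    raise ValueError(f"unknown strategy: {strategy}")

filled = impute([1, None, 3])  # mean of the present values fills the gap
```

Which strategy is appropriate depends on the column: zero fill makes sense for "no sales recorded", while carry-forward suits slowly changing readings.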
This strategy enables data that is neither numeric nor logical to become useful to the algorithm by transforming each table cell into a series of logical data columns.
Scaling makes sure that a predictor isn't overly favored by the algorithm simply because of its magnitude.
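One common approach is min-max scaling, which rescales every column to the same [0, 1] range; the platform's actual scaling method isn't specified here, so this is just one example of the technique:

```python
def min_max_scale(values):
    """Rescale values to [0, 1] so no column dominates by magnitude."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column carries no signal; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([10, 20, 30])
```

After scaling, a column measured in millions and a column measured in single digits contribute on equal footing.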
Tuning is the process of choosing the correct parameters for each algorithm. We use multiple techniques to tune each algorithm individually, selecting the best parameters according to the characteristics of your data and the performance of the algorithm.
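The simplest tuning technique is an exhaustive grid search over candidate parameters; the platform's actual tuning is described only as "multiple techniques", so treat this as a conceptual sketch with hypothetical `train_fn` and `score_fn` callables:

```python
from itertools import product

def grid_search(train_fn, score_fn, grid):
    """Try every parameter combination; keep the one that scores best."""
    best_params, best_score = None, float("-inf")
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        model = train_fn(**params)      # hypothetical training call
        score = score_fn(model)         # hypothetical evaluation call
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy example: the "model" is just its params; the score peaks at
# depth=3, rate=0.1, so grid search should recover those values.
toy_train = lambda **p: p
toy_score = lambda m: -abs(m["depth"] - 3) - abs(m["rate"] - 0.1)
best, score = grid_search(toy_train, toy_score,
                          {"depth": [1, 3, 5], "rate": [0.1, 0.5]})
```

Real tuning typically replaces brute force with smarter search, but the goal is the same: the parameter set that scores best on your data.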
Using the data processed from all the steps above, we build many models to find the one that best fits your data. This is the "learning" part of machine learning.
We split your data into Train and Test sets and cross-validate the results to make sure we get a good fit with the data provided.
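Cross-validation generalizes a single train/test split by rotating which slice of the data is held out. A minimal k-fold sketch (the platform's actual fold count and splitting scheme aren't stated here):

```python
def k_fold_splits(n_rows, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Each row lands in the test set exactly once across the k folds.
    """
    indices = list(range(n_rows))
    fold_size = n_rows // k
    for i in range(k):
        start = i * fold_size
        end = start + fold_size if i < k - 1 else n_rows
        test = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, test

splits = list(k_fold_splits(10, 5))
```

Averaging the model's score across all folds gives a much more reliable estimate of fit than any single split.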
Finally, our platform packages up the model, along with all the preprocessing steps above, and deploys it to an API endpoint just for you, so that calling the model quickly generates results tuned from, and for, your dataset.