

The intended audience for the tutorial is assumed to have a basic understanding of Exasol, in particular user-defined functions (UDFs), and basic Python programming knowledge. The demonstration does not require a deep understanding of data science or machine learning methods.

This tutorial serves the purpose of trying out user-defined functions (UDFs) on an available machine learning model with a test dataset. This tutorial is designed to work with the Free Trial System.

Ensure you have access to the public demo system. Otherwise, sign up here to get access to the demo system hosted by Exasol for testing the data and scripts in this tutorial.

Model Testing with UDFs

Developing a typical machine learning model consists of multiple steps. First, you load the data, normalize the measures if necessary, build a model, train it, refine the parameters, and evaluate the performance. To simplify the process for this demo, we assume that we have already developed an adequate model that predicts truck failures and trained it accordingly. We will focus on the last steps, namely running the model on the test data and evaluating its performance (steps 3 and 4 in the image below). If you are interested in a full end-to-end demonstration of how the machine learning techniques can be directly applied in Exasol, refer to our Data Science GitHub repository.

The result of the demo can be achieved with two different approaches: either by writing the UDFs in a SQL client or by creating them directly from a Python environment with the help of the pyexasol package. Both approaches give the same results in the end; the advantage of the latter is that it allows you to play around with dataframes inside Python and lets you visualize results more easily.

Both approaches will be displayed side by side in the tutorial and in the sections below so that you can choose your preferred one.

We use real-world data provided by the truck manufacturer Scania to predict whether truck failures are related to the failure of a specific component. The data is publicly available in the IDA 2016 Challenge dataset from the Industrial Challenge at the 15th International Symposium on Intelligent Data Analysis (IDA) in 2016. The purpose of the challenge was to predict which failures were related to a specific component of a truck's air pressure system (APS) as opposed to failures unrelated to the APS. In the process, we demonstrate that there is no need to export data from Exasol to a different system for training and testing machine learning models. Everything can be done using UDFs directly inside Exasol, where the data is stored.

Procedure

For this demo, we will use a dataset on the publicly available Exasol demo system. If you haven't already, sign up for an account on this page. The schema we are working with is called "IDA". It contains multiple tables filled with training and test data.

Step 1: Connect to the Exasol Demo Database

As a first step, we will create a connection to the system and get familiar with the table structures. The following table provides you with the details on connecting to a database (using an SQL client) and Python environment.

# Close Exasol connection, we will open a new one in the next step

Now that we've got an impression of the data, we will move on to the next step of preparing our environment for the creation of UDFs.

Step 3: Preparing the Environment

Exasol ships with a default language container for Python 3 that can be used to run Python UDFs out of the box in the database. This language container includes some commonly used data science modules. If you need further libraries not included in the default container or you want to use another version of Python, Exasol allows you to create your own script language containers including these. That way, you can create a custom UDF environment tailored to your needs.
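
As a rough illustration of the Python-based approach, the sketch below shows how a Python 3 UDF could be defined as a string and sent to Exasol through pyexasol. It is a minimal sketch under stated assumptions, not the tutorial's actual script: the schema name `MY_SCHEMA`, the script name `PREDICT_FAILURE`, the `score` parameter, the 0.5 threshold, and the `pos`/`neg` labels are all hypothetical placeholders standing in for a real trained model. The DSN and credentials must be replaced with the values of your own demo account. Only `pyexasol.connect`, `execute`, and `close` are taken from the pyexasol package.

```python
import textwrap

# Code that would run inside Exasol. `ctx` exposes the script's input
# parameters as attributes (here: the hypothetical `score` column).
UDF_BODY = textwrap.dedent("""
    def run(ctx):
        # Hypothetical threshold rule standing in for a trained model.
        return 'pos' if ctx.score >= 0.5 else 'neg'
""").strip()


def build_create_script(schema, name, body=UDF_BODY):
    """Return a CREATE SCRIPT statement for a scalar Python 3 UDF."""
    return (
        f"CREATE OR REPLACE PYTHON3 SCALAR SCRIPT {schema}.{name}(score DOUBLE)\n"
        f"RETURNS VARCHAR(3) AS\n"
        f"{body}\n"
    )


def deploy_udf(dsn, user, password, schema, name):
    """Create the UDF in the database; requires the pyexasol package."""
    import pyexasol  # third-party, imported lazily so the helpers above work without it

    conn = pyexasol.connect(dsn=dsn, user=user, password=password)
    try:
        conn.execute(build_create_script(schema, name))
    finally:
        # Always release the connection, even if the statement fails.
        conn.close()


if __name__ == "__main__":
    # Print the statement instead of deploying it, so this runs without a database.
    print(build_create_script("MY_SCHEMA", "PREDICT_FAILURE"))
```

In a SQL client, the same statement would typically be followed by a `/` on its own line to terminate the script definition; the sketch assumes this terminator is not needed when the statement is sent through pyexasol. Keeping the UDF body as a plain string also makes it possible to test the `run` function locally with a stand-in context object before deploying it.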
