Section overview

    • In the last section we learned about classification and regression models. These models observe the input data (features) together with the output data (labels), and their purpose is to model the relationship between inputs and outputs in order to make predictions. In this section we look at methods that focus on uncovering patterns in data. Their primary goal is not to make predictions but to help you gain a better understanding of the data and its structure.

    • Clustering and Dimensionality Reduction

      Task
      • Familiarize yourself with the k-means algorithm by using the Interactive k-Means widget. What are the two basic steps this algorithm performs? Note that the algorithm relies on calculating distances between points. What is the purpose of this algorithm?
      • Build a data pipeline in Orange to apply the k-means clustering algorithm to the Airbnb data. Try to uncover patterns within the data. Use the Data Sampler to reduce the size of the data.
        Start by looking for clusters using only two features. This makes it easy to verify the identified clusters visually.
      • Usually, the data has more than two or three dimensions (columns/features). This makes it challenging to visually show patterns. Make use of dimensionality reduction methods like principal component analysis (PCA) to project the data into two dimensions.
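      The distance-based idea behind k-means from the first task can be sketched in a few lines of plain Python. This is a minimal illustration, not what Orange runs internally: the function names, the toy points and the chosen starting centroids are all assumptions made for this example.

```python
import math
import random


def assign_points(points, centroids):
    """Step 1: assign every point to its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Step 2: move every centroid to the mean of the points assigned to it."""
    updated = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if members:
            updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            updated.append(old)  # an empty cluster keeps its previous centroid
    return updated


def k_means(points, k, initial_centroids=None, max_iter=100, seed=0):
    """Alternate both steps until the centroids stop moving."""
    if initial_centroids is None:
        # Assumption: start from k randomly chosen data points.
        initial_centroids = random.Random(seed).sample(points, k)
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = assign_points(points, centroids)
        new_centroids = update_centroids(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assign_points(points, centroids), centroids


# Hypothetical two-feature toy data with two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centroids = k_means(points, k=2, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centroids)
```

      Because the result depends on the starting centroids, different random starts can lead to different clusterings, which is worth keeping in mind when comparing runs.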
    • Exemplary solution to the above task

    • Exemplary solution to the above task

    • Embedding Images from Airbnb

      Task
      • Did you notice that the Airbnb data set contains URLs to images (see column picture_url)? Sample 20% of these images and download them using the Save Images widget from the Image Analytics add-on.
      • Use a (convolutional) neural network in Orange to embed the Airbnb accommodation images (Image Embedding). This basically turns any image into a point in a high-dimensional space. Then you can apply clustering methods and dimensionality reduction to gain insights about the images. You can view images using the Image Viewer widget.
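      To make the idea of "an image as a point in a high-dimensional space" more concrete, the sketch below compares embedding vectors with cosine similarity. The file names and the four-dimensional vectors are made up for illustration; real embeddings typically have hundreds or thousands of dimensions, and the distance measure is an assumption, not necessarily the one you select in Orange.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(name, embeddings):
    """Return the name of the other image whose embedding is closest to `name`."""
    others = (n for n in embeddings if n != name)
    return max(others, key=lambda n: cosine_similarity(embeddings[name], embeddings[n]))


# Hypothetical embeddings; in practice they come from the embedding network.
embeddings = {
    "bedroom_1.jpg": [0.9, 0.1, 0.0, 0.2],
    "bedroom_2.jpg": [0.8, 0.2, 0.1, 0.3],
    "kitchen_1.jpg": [0.1, 0.9, 0.7, 0.0],
}
print(most_similar("bedroom_1.jpg", embeddings))
```

      Images with similar content end up close together in this space, which is why clustering and dimensionality reduction can be applied to the embeddings just as to any other numeric table.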
    • Text Analysis or Natural Language Processing (NLP)

      Task
      • The Airbnb data also provides unstructured text such as reviews and descriptions. Examine the relevant columns to see what kind of text they contain.
      • To get the widgets for text analysis, you need to install the Text add-on. Similar to the Form TimeSeries, you have to turn the data into a Corpus before doing text analysis in Orange. Make sure to take only a small sample, otherwise you might overload your computer. Build a word cloud from the review texts and use Preprocess Text to clean up the text where needed.
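      A word cloud is essentially a picture of word frequencies after preprocessing. The following sketch shows that idea in plain Python; the stop-word list and the three review texts are invented for this example, and the preprocessing is far simpler than what the Text add-on offers.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not a complete one.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "to", "of", "in", "it", "very"}


def preprocess(text):
    """Lowercase, split into word tokens, and drop stop words and single characters."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def word_frequencies(documents):
    """Count how often each remaining token occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(preprocess(doc))
    return counts


# Hypothetical review texts standing in for a small sample of the corpus.
reviews = [
    "The apartment was very clean and the host was friendly.",
    "Clean room, great location!",
    "Great host and a great location.",
]
freq = word_frequencies(reviews)
print(freq.most_common(5))
```

      Without the stop-word filter, words like "the" and "was" would dominate the counts, which is the main reason preprocessing matters before drawing the cloud.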
    • Word Cloud of Reviews

    • Exemplary solution to the above task

    • Task
      • Compare the review ratings with the review texts by running a sentiment analysis. Try out different sentiment analysis methods.
      • Try out other widgets from the Text Mining section like Topic Modeling.
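      One simple family of sentiment methods is lexicon-based: count positive and negative words and compare the result with the star rating. The sketch below illustrates this under strong assumptions; the word lists, ratings and review texts are hypothetical, and real lexicon methods use much larger, weighted vocabularies.

```python
import re

# Assumption: toy lexicons invented for this example.
POSITIVE = {"great", "clean", "friendly", "lovely", "perfect"}
NEGATIVE = {"dirty", "noisy", "rude", "broken", "bad"}


def sentiment_score(text):
    """Return (positive - negative word count) / token count, or 0.0 for empty text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    positive = sum(t in POSITIVE for t in tokens)
    negative = sum(t in NEGATIVE for t in tokens)
    return (positive - negative) / len(tokens)


# Hypothetical (rating, review text) pairs to compare score and rating side by side.
rated_reviews = [
    (5, "Great location, clean room and a friendly host."),
    (2, "The room was dirty and the street was noisy."),
    (4, "Nice stay overall."),
]
for rating, text in rated_reviews:
    print(rating, round(sentiment_score(text), 2), text)
```

      Reviews where the score and the rating disagree are often the most interesting ones to read, since they show where a simple method misses context such as negation or irony.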
    • Exemplary solution to the above task