Research Highlights

 

Pantheon: Efficiently Creating and Maintaining Semantically Meaningful Entity Rankings at Large-Scale: Our efforts within the DFG-funded Pantheon project (MI 1794/1-1) aim at developing methods for generating and maintaining semantically meaningful entity rankings from information sources such as knowledge bases, for instance, the top-10 tallest buildings in Europe or the top-20 universities by number of Nobel laureates. Since the beginning of the project, we have achieved notable research results, published at premier international conferences and workshops, as well as several Bachelor's and Master's theses at Saarland University and at TU Kaiserslautern. Read more: Pantheon (DFG Project)

 

Mining Interesting Categorical Attributes:

Given a table that contains data objects and their attributes, this research direction aims at determining categorical attributes (i.e., columns of the table) that can be used to categorize the objects, thus providing more focused and comprehensible information to users. A sample table is shown below on the left. Not all attributes are equally likely to be useful. While country might be a good category to group skyscrapers, the name of the architect (not shown in the sample below) might be less interesting, depending on the number and distribution of architects and their buildings. To decide this automatically, sufficient training data and appropriate features need to be derived. We started from the hypothesis that training data can be derived from Wikipedia, based on the presence or absence of specific tables, and extended the set of available features, such as entropy, by three additional ones that capture detailed characteristics of our main objective. Below on the right-hand side, three example tables are shown, represented by the frequency distribution of their attribute values. Below the tables, we see the features used to describe the distributions (details are in the SSDBM'18 paper). Using an SVM, we trained a classifier that, according to a user study, is able to distinguish interesting from non-interesting attributes. Training data and extracted features are available under dbis.informatik.uni-kl.de/catmining. This research is part of the DFG-funded PANTHEON project, where finding useful categories to group (ranked) entities is an essential ingredient for automatically constructing interesting entity rankings.
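As a rough illustration of the entropy feature mentioned above, the following Python sketch computes the Shannon entropy of the frequency distribution of a column's values. The function names and the sample column are hypothetical, and the three additional features from the SSDBM'18 paper are not reproduced here; this only shows how a value distribution can be summarized by a single number.

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy (in bits) of the frequency distribution of a column."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropy(values):
    """Entropy divided by its maximum, log2(number of distinct values)."""
    distinct = len(set(values))
    if distinct <= 1:
        return 0.0  # a single value (or no value) carries no grouping information
    return attribute_entropy(values) / math.log2(distinct)


# Hypothetical sample column, not taken from the project's data.
countries = ["UAE", "China", "China", "USA", "USA", "USA", "China", "UAE"]
print(attribute_entropy(countries), normalized_entropy(countries))
```

A column whose values are spread evenly over a few groups yields a normalized entropy close to 1, while a column dominated by a single value yields a value close to 0. Whether either extreme makes an attribute interesting is exactly what the trained classifier has to decide.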

 

PALEO: Reverse Engineering of OLAP Queries: PALEO is a novel framework to reverse engineer top-k OLAP queries. The underlying research challenges are immense, given the various dimensions of the search space, the potentially very large base relation, and the small input snippet in the form of a top-k list. PALEO mainly operates on a subset of the base relation, held in memory, and further uses data samples, histograms, and simple descriptive statistics to identify potentially valid queries, i.e., queries that generate the input list. PALEO comprises a probabilistic model that evaluates the suitability of a query discovered over a subset of R'. This methodology is directly applicable to handling variations of R and to partial-match queries, i.e., queries that only approximately match the input list. PALEO is part of the DFG-funded PANTHEON project, for its use of top-k entity rankings to foster data exploration.
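To make the problem statement concrete, here is a minimal Python sketch that reverse engineers a top-k list by exhaustive enumeration over a tiny in-memory relation. It is not PALEO's method: there is no sampling, no histograms, and no probabilistic model. The query template (one equality predicate, one aggregated measure, descending order), the function names, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product

# Assumed set of aggregation functions for the illustrative query template.
AGGREGATES = {"sum": sum, "max": max, "min": min}


def run_topk(rows, entity, pred, measure, agg, k):
    """Evaluate: SELECT entity FROM rows WHERE attr = val
    GROUP BY entity ORDER BY agg(measure) DESC LIMIT k."""
    attr, val = pred
    groups = defaultdict(list)
    for r in rows:
        if r[attr] == val:
            groups[r[entity]].append(r[measure])
    scored = [(AGGREGATES[agg](vs), e) for e, vs in groups.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))  # ties broken by entity name
    return [e for _, e in scored[:k]]


def reverse_engineer(rows, entity, cat_attrs, measures, target):
    """Return every (attr, value, measure, agg) whose result equals target."""
    k = len(target)
    found = []
    for attr in cat_attrs:
        for val in sorted({r[attr] for r in rows}):
            for measure, agg in product(measures, AGGREGATES):
                if run_topk(rows, entity, (attr, val), measure, agg, k) == target:
                    found.append((attr, val, measure, agg))
    return found


# Hypothetical toy relation.
rows = [
    {"player": "A", "team": "X", "points": 30, "assists": 5},
    {"player": "A", "team": "X", "points": 10, "assists": 9},
    {"player": "B", "team": "X", "points": 25, "assists": 7},
    {"player": "B", "team": "X", "points": 25, "assists": 1},
    {"player": "C", "team": "Y", "points": 40, "assists": 2},
    {"player": "C", "team": "Y", "points": 5, "assists": 3},
]
candidates = reverse_engineer(rows, "player", ["team"], ["points", "assists"], ["B", "A"])
print(candidates)
```

Even on this toy input, more than one query reproduces the list, which hints at why ranking candidate queries by suitability matters once the relation and the search space grow.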

 

CLASH: A high-level abstraction for enabling optimized multi-way joins over Apache Storm: CLASH is a high-level abstraction on top of common scale-out stream processors for enabling declarative query formulation and optimization. CLASH treats a wide range of multi-way joins as first-class citizens, enabled through a novel stream-join operator that allows for massive flexibility in tuple routing at any point in the query plan. This flexibility allows trading off bandwidth consumption against the overhead of materializing intermediate results. CLASH optimizes and translates (join) queries into topologies that run natively in Apache Storm, using Storm's routing primitives and benefiting from desirable properties such as fault tolerance, efficiency, and maturity.
Query optimization in CLASH is based on a versatile cost model and join-plan enumeration using dynamic programming.
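Join-plan enumeration by dynamic programming can be sketched in a few lines of Python. The sketch below is a textbook subset-based enumeration over bushy plans with an assumed cost model (the sum of estimated intermediate result sizes). It is not CLASH's cost model, it ignores tuple routing and streaming aspects, and all names, cardinalities, and selectivities are hypothetical.

```python
from itertools import combinations


def best_join_plan(cards, sel):
    """Return (cost, plan) for joining all relations in `cards`.

    cards: relation name -> estimated cardinality
    sel:   (a, b) with a < b -> join selectivity; missing pairs count as 1.0
    Assumed cost: sum of the estimated sizes of all intermediate results.
    """
    rels = sorted(cards)

    def size(subset):
        s = 1.0
        for r in sorted(subset):
            s *= cards[r]
        for a, b in combinations(sorted(subset), 2):
            s *= sel.get((a, b), 1.0)
        return s

    # Base case: a single relation costs nothing to "join".
    best = {frozenset([r]): (0.0, r) for r in rels}
    for n in range(2, len(rels) + 1):
        for combo in combinations(rels, n):
            subset = frozenset(combo)
            out = size(subset)
            candidates = []
            # Try every split of the subset into two non-empty parts.
            for m in range(1, n):
                for left in combinations(combo, m):
                    l = frozenset(left)
                    r = subset - l
                    cost = best[l][0] + best[r][0] + out
                    candidates.append((cost, (best[l][1], best[r][1])))
            best[subset] = min(candidates, key=lambda c: c[0])
    return best[frozenset(rels)]


# Hypothetical statistics for three streams R, S, T.
cards = {"R": 1000, "S": 100, "T": 10}
sel = {("R", "S"): 0.01, ("S", "T"): 0.1}
cost, plan = best_join_plan(cards, sel)
print(cost, plan)
```

With these numbers, joining S and T first keeps the intermediate result small, so the enumeration prefers that order over starting with the larger R and S. Swapping in a different cost function only requires changing how `cost` is computed for each split.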

 

 

 

 

Publications

 

2018

  1. Koninika Pal and Sebastian Michel: Learning Interesting Attributes for Automated Data Categorization. 30th International Conference on Scientific and Statistical Database Management (SSDBM). Bolzano-Bozen, Italy, July 9-11, 2018.

  2. Evica Milchevski, Fabian Neffgen, Sebastian Michel. Processing Class-Constraint K-NN Queries with MISP. 21st International Workshop on the Web and Databases (WebDB 2018), Houston, TX, USA, co-located with SIGMOD. pdf

  3. Kiril Panev and Sebastian Michel: Exploring Pros and Cons of Ranked Entities with COMPETE. 5th International Workshop on Exploratory Search in Databases and the Web (ExploreDB), Co-located with SIGMOD/PODS 2018, Houston, TX, USA, June 15th, 2018. pdf

  4. Jessica A. de Souza, Agma J. M. Traina, Sebastian Michel. Class-Constraint Similarity Queries. The 33rd ACM/SIGAPP Symposium On Applied Computing (SAC), Pau, France, April, 2018. pdf (copyright ACM)

2017

  1. Koninika Pal and Sebastian Michel. Learning Interesting Categorical Attributes for Refined Data Exploration. CoRR, abs/1711.10933, 2017. link

  2. Koninika Pal and Sebastian Michel. LSH-Based Probabilistic Pruning of Inverted Indices for Sets and Ranked Lists.
    20th International Workshop on the Web and Databases (WebDB 2017), Chicago, IL, USA, 2017, co-located with SIGMOD. pdf

  3. Manuel Hoffmann and Sebastian Michel. Scaling Out Continuous Multi-Way Theta Joins.
    Workshop on Algorithms and Systems for MapReduce and Beyond (BeyondMR), Chicago, IL, USA, 2017, co-located with SIGMOD/PODS.  pdf

  4. Kiril Panev, Sebastian Michel, Evica Milchevski, Koninika Pal. Exploring Databases via Reverse Engineering Ranking Queries with PALEO. Datenbanksysteme für Business, Technologie und Web (BTW 2017), Stuttgart, Germany. Invited Demonstration.

  5. Kiril Panev, Nico Weisenauer, Sebastian Michel.
    Reverse Engineering Top-k Join Queries. Datenbanksysteme für Business, Technologie und Web (BTW 2017), Stuttgart, Germany.

2016

  1. Koninika Pal and Sebastian Michel. Efficient Similarity Search across Top-k Lists under the Kendall's Tau Distance. Conference on Scientific and Statistical Database Management (SSDBM),  Budapest, Hungary, 2016. pdf

  2. Manuel Hoffmann, Evica Milchevski, Sebastian Michel. Playing LEGO with JSON: Probabilistic Joins over Attribute-Value Fragments. 4th International Workshop on Keyword Search and Data Exploration on Structured Data (KEYS), co-located with ICDE, Helsinki, Finland, 2016. pdf (copyright IEEE)

  3. Koninika Pal, Sebastian Michel. A Data Mining Approach to Choosing Categorical Attributes for Ranked Lists. 19th International Conference on Extending Database Technology (EDBT),  Bordeaux, France, March 2016. Poster track.

  4. Evica Milchevski, Sebastian Michel. Quantifying Likelihood of Change through Update Propagation across Top-k Rankings. 19th International Conference on Extending Database Technology (EDBT),  Bordeaux, France, March 2016. Poster track.

  5. Kiril Panev, Sebastian Michel. Reverse Engineering Top-k Database Queries with PALEO. 19th International Conference on Extending Database Technology (EDBT),  Bordeaux, France, March 2016.  [pdf]

  6. Kiril Panev, Evica Milchevski, Sebastian Michel. Computing Similar Entity Rankings via Reverse Engineering of Top-k Database Queries. 4th International Workshop on Keyword Search and Data Exploration on Structured Data (KEYS), co-located with ICDE, Helsinki, Finland, 2016. pdf (copyright IEEE)

  7. Fabian Reinartz, Koninika Pal, Sebastian Michel. Mining Entity Rankings. Datenbankspektrum, 2016. DOI:10.1007/s13222-015-0205-2 

2015

  1. Evica Milchevski, Avishek Anand, Sebastian Michel. The Sweet Spot between Inverted Indices and Metric-Space Indexing for Top-K-List Similarity Search. 18th International Conference on Extending Database Technology (EDBT), Brussels, Belgium, 2015. [slides]

  2. Evica Milchevski, Sebastian Michel. ligDB - Online Query Processing Without (almost) any Storage. 18th International Conference on Extending Database Technology (EDBT), Brussels, Belgium, 2015. [slides]

2014

  1. Kiril Panev, Klaus Berberich. Phrase Queries with Inverted + Direct Indexes. 15th International Conference on Web Information Systems Engineering (WISE), Thessaloniki, Greece, October 12-14, 2014, Proceedings, Part I, pages 156–169.

  2. Foteini Alvanaki, Sebastian Michel. Tracking Set Correlations at Large Scale. SIGMOD 2014. Snowbird, Utah, USA

  3. Koninika Pal, Sebastian Michel. An LSH Index for Computing Kendall's Tau over Top-k Lists. 17th International Workshop on the Web and Databases (WebDB 2014), co-located with ACM SIGMOD 2014, Utah, USA, 2014.

  4. Johannes Schildgen, Thomas Jörg, Manuel Hoffmann, Stefan Dessloch: Marimba: A Framework for Making MapReduce Jobs Incremental. IEEE International Congress on Big Data, Anchorage, AK, USA, June 27 - July 2, 2014.

 

2013

  1. Evica Ilieva, Aleksandar Stupar, Sebastian Michel. The Essence of Knowledge (Bases) through Entity Rankings. 22nd ACM International Conference on Information and Knowledge Management (CIKM 2013), San Francisco, USA, 2013. Poster track.

  2. Foteini Alvanaki, Evica Ilieva, Sebastian Michel, Aleksandar Stupar. Interesting Event Detection through Hall of Fame Rankings. Third ACM SIGMOD Workshop on Databases and Social Networks (DBSOCIAL 2013), in conjunction with SIGMOD 2013, New York, NY, USA.

  3. Foteini Alvanaki, Sebastian Michel. A Thin Monitoring Layer for Top-k Aggregation Queries over a Database. International Workshop on Ranking in Databases (DBRank 2013), in conjunction with VLDB 2013, Riva del Garda, Trento, Italy. pdf

  4. Aleksandar Stupar, Sebastian Michel. SRbench-A Benchmark for Soundtrack Recommendation Systems. 22nd ACM International Conference on Information and Knowledge Management (CIKM 2013).

  5. Foteini Alvanaki, Sebastian Michel. Scalable, Continuous Tracking of Tag Co-Occurrences between Short Sets using (Almost) Disjoint Tag Partitions. Third ACM SIGMOD Workshop on Databases and Social Networks (DBSOCIAL 2013), in conjunction with SIGMOD 2013, New York, NY, USA. Best Student Paper Award.

  6. Aleksandar Stupar, Sebastian Michel. Automated Educated Guessing. 4th International Workshop on Data Engineering meets the Semantic Web (DESWEB 2013), in conjunction with ICDE 2013, Brisbane, Australia.

 

 

 

 

AG DBIS, TU Kaiserslautern
