Quotation Eigenschink, Peter, Vamosi, Stefan, Vamosi, Ralf, Sun, Chang, Reutterer, Thomas, Kalcher, Klaudius. 2021. Deep Generative Models for Synthetic Data.




Growing interest in synthetic data has stimulated the development and advancement of a large variety of deep generative models for a wide range of applications. However, as this research has progressed, its streams have become more specialized and disconnected from each other. For example, models for synthesizing text data for natural language processing cannot readily be compared to models for synthesizing health records. To mitigate this isolation, we propose a data-driven evaluation framework for generative models for synthetic data based on five high-level criteria: representativeness, novelty, realism, diversity and coherence of a synthetic data sample relative to the original dataset, regardless of the models' internal structures. The criteria reflect requirements different domains impose on synthetic data and allow model users to assess the quality of synthetic data across models. In a critical review of generative models for sequential data, we examine and compare the importance of each performance criterion in numerous domains. For example, we find that realism and coherence are more important for synthetic data for natural language, speech and audio processing, while novelty and representativeness are more important for healthcare and mobility data. We also find that representativeness is often measured using statistical metrics, realism using human judgement, and novelty using privacy tests.



Publication's profile

Status of publication Published
Affiliation WU
Type of publication Working/discussion paper, preprint
Language English
Title Deep Generative Models for Synthetic Data
Year 2021
URL https://epub.wu.ac.at/id/eprint/8394


AI-Based Privacy-Preserving Big Data Sharing for Market Research (ANITA-ANonymous bIg daTA)
Eigenschink, Peter (Details)
Vamosi, Stefan (Details)
Vamosi, Ralf (Former researcher)
Reutterer, Thomas (Details)
Kalcher, Klaudius (Mostly AI GmbH, Austria)
Sun, Chang (Maastricht University, Netherlands)
Institute for Marketing and Customer Analytics IN (Details)
Marketing DP (Details)
Research areas (ÖSTAT Classification 'Statistik Austria')
5255 Data security and data privacy (Details)
5320 Marketing (Details)