Guide To German Online Job Ad Data

Overview - Challenges

This page gives an overview of common challenges you may encounter when analyzing online job ads (OJAs). Each challenge is listed alongside the relevant section of the guide.


Last updated 1 year ago

If you have identified any challenges not listed here, please let us know by opening a pull request in our GitHub repository or by contacting us directly.

Data Collection

  • Where to find OJA data? See: Data Sources and OJA Landscaping

  • How to collect OJA data? See: Web Scraping

  • How to store OJA data? See: Job Posting Data Schema

Data Enrichment

  • How to segment a job ad? See: Text Segmentation

  • How to identify duplicates? See: Identifying Duplicates

  • How to extract occupations? See: #normalising-job-titles

  • How to extract skills and competences? See: Extracting Skills

Evaluation and Quality Control

  • How to evaluate extraction and classification algorithms? See: Evaluation Metrics

  • How to create a gold standard for evaluation? See: Gold Standard Annotation and Quality

Taxonomies and Ontologies

  • Which taxonomies are there (ISCO, ESCO, KLDB, etc.)? See: #taxonomies-for-online-job-ad-analysis

  • How to develop a taxonomy? See: Developing a Taxonomy

  • How to evaluate a taxonomy? See: Data Standards for Taxonomies and Ontologies

Dataset Curation and Representativity Analysis

  • How to deal with duplicates? See: Deduplication

  • How to validate an OJA dataset/sample? See: Representativity Analysis

  • What to report? See: Reporting Results