# Overview - Methods

## Data Collection

* Landscaping [Data Collection](/steps/data-collection.md#data-sources-and-oja-landscaping)
* Web Scraping [Data Collection](/steps/data-collection.md#web-scraping)
* API Data Collection [Data Collection](/steps/data-collection.md#api-data-collection-data-providers)
* Data Formats [Data Collection](/steps/data-collection.md#job-posting-data-schema)

## Data Enrichment and Methods

* Text Segmentation [Data Enrichment](/steps/data-enrichment.md#text-segmentation)
* Duplicate Identification [Data Enrichment](/steps/data-enrichment.md#identifying-duplicates)
* Natural Language Processing and Data Pre-Processing [Extraction Methods](/steps/extraction-methods.md#pre-processing-and-embeddings)
* Rule-based Matching [Extraction Methods](/steps/extraction-methods.md#rule-based-matching)
* Supervised Document Classification [Extraction Methods](/steps/extraction-methods.md#supervised-classification)
* Named-Entity Recognition [Extraction Methods](/steps/extraction-methods.md#statistical-named-entity-recognition-token-classification)
* Disambiguation / Entity Linking [Extraction Methods](/steps/extraction-methods.md#semantic-similarity)

## Evaluation and Quality Control

* Gold Standard Annotation [Evaluation and Quality Control](/steps/evaluation-and-quality-control.md#gold-standard-annotation-and-quality)
* Machine Learning Evaluation [Evaluation and Quality Control](/steps/evaluation-and-quality-control.md#evaluation)

## Taxonomies and Ontologies

* Taxonomy Development [Taxonomies and Ontologies](/steps/taxonomies-and-ontologies.md#developing-a-taxonomy)
* Taxonomy Evaluation [Taxonomies and Ontologies](/steps/taxonomies-and-ontologies.md#data-standards-for-taxonomies-and-ontologies)

## Dataset Curation and Representativity Analysis

* Sampling [Dataset Curation and Representativity Analysis](/steps/dataset-curation-and-representativity-analysis.md#filtering-data)
* Deduplication [Dataset Curation and Representativity Analysis](/steps/dataset-curation-and-representativity-analysis.md#deduplication)
* Representativity Analysis [Dataset Curation and Representativity Analysis](/steps/dataset-curation-and-representativity-analysis.md#representativity-analysis)
