
[FIX]: Remove unused CSV files from vocab_csv/

Open bact opened this issue 8 months ago • 2 comments

In code/vocab_csv, there are a number of CSV files that are not mentioned in the RDF/HTML generation code.

  1. Some of them are probably part of development in progress and will soon be included
  2. Some of them are part of outdated workflows/concepts that have moved to other CSV files

We would like to keep (1) and may want to remove (2) to tidy up the codebase and avoid possible confusion.

  • If needed, files in (2) can still be accessed from the tagged releases (e.g. dpv-2.1)
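Recovering a removed file from a tagged release could look like the following. This is only a minimal sketch: the helper name `show_from_tag` is hypothetical, and the tag and path in the usage comment are taken from the examples in this issue on the assumption that they exist in the repository.

```shell
#!/bin/bash
# Hypothetical helper (not part of the DPV repo): print a file as it
# existed at a given tag or other revision.
# Usage: show_from_tag <tag> <path-from-repo-root>
show_from_tag() {
    local tag="$1" path="$2"
    # "git show <rev>:<path>" writes the file stored at that revision to stdout.
    git show "${tag}:${path}"
}

# Example, assuming the tag and file exist:
# show_from_tag dpv-2.1 code/vocab_csv/legal_Laws.csv > legal_Laws.csv
```

Because the file contents are written to stdout, the working tree is left untouched unless the output is redirected somewhere.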

Files that need to be reviewed

Listed below are the CSV files that are not mentioned in the RDF/HTML generation code, with notes:

| Filename | Notes |
| --- | --- |
| Bias.csv | Last commit 6 months ago |
| DE_glossary.csv | Last commit last year; Has only a header; Maybe used for future translations |
| EntityControl.csv | Last commit 10 months ago |
| Mapping_ODRL.csv | Last commit last month |
| Requirement.csv | Last commit last year |
| RiskSource.csv | Last commit last month; Has comment "Proposed for v2.2" |
| Standards_ISO.csv | Last commit last year |
| UseCase.csv | Last commit 9 months ago |
| concepts.csv | Last commit 9 months ago |
| legal-memberships.csv | Empty; Replaced by location_memberships.csv? |
| legal-uk.csv | Replaced by legal-gb.csv? |
| legal_Authorities.csv | Replaced by legal-(countrycode).csv? |
| legal_EU_Adequacy.csv | Replaced by legal-eu.csv? |
| legal_EU_EEA.csv | Replaced by location_memberships.csv? |
| legal_Laws.csv | Replaced by legal-(countrycode).csv? |
| legal_Locations.csv | Replaced by location.csv? |
| legal_properties.csv | Replaced by location_properties? |
| tech-data.csv | Last commit last year |
| tech-ops.csv | Last commit last year |
| tech-provision-properties.csv | Last commit last year |
| tech-security.csv | Last commit 11 months ago |
| tech-surveillance.csv | Last commit last year |

  • legal* files have likely been replaced by other files and could be removed
  • The rest could still be in use

bact, Mar 22 '25 12:03

A script to find CSV files that are not mentioned in the codebase:

#!/bin/bash
# List CSV files in vocab_csv/ that no .py or .sh file mentions by name.
# Run this script from the "code" directory.

CSV_DIR="vocab_csv"

unused_files=()

# Read NUL-delimited paths so file names containing spaces are not split.
while IFS= read -r -d '' csv_file; do
    file_name=$(basename "$csv_file")

    # -F matches the name literally (so "." is not a wildcard); -q exits on the first match.
    if ! grep -rqF --include="*.py" --include="*.sh" -- "$file_name" .; then
        unused_files+=("$file_name")
    fi
done < <(find "$CSV_DIR" -type f -name "*.csv" -print0)

if [ ${#unused_files[@]} -eq 0 ]; then
    echo "All .csv files in '$CSV_DIR' are mentioned in the codebase."
else
    # Sort the names, keeping one array element per line.
    sorted_unused_files=()
    while IFS= read -r name; do
        sorted_unused_files+=("$name")
    done < <(printf "%s\n" "${unused_files[@]}" | sort)

    echo "The following .csv files in '$CSV_DIR' are not mentioned in the codebase:"
    for file in "${sorted_unused_files[@]}"; do
        echo "$file"
    done

    echo
    echo "From the list above, these files are empty or have only one line:"
    for file in "${sorted_unused_files[@]}"; do
        file_path="$CSV_DIR/$file"
        if [ ! -s "$file_path" ] || [ "$(wc -l < "$file_path")" -le 1 ]; then
            echo "$file"
        fi
    done
fi

bact, Mar 22 '25 12:03

Hi @bact, thanks. Some of these files are present because they are part of DPV 1.0 or 2.0, so we usually keep them around in case fixes are needed or we want to see the source of changed extensions. Others are proposed work items, so they are not included in the RDF/HTML generation scripts. Below I've noted how to resolve each file, but there is no issue with keeping them in the folder, as they are helpful for looking things up now and then. In the future, once we have resolved the proposed items, deleting all files in vocab_csv and downloading and extracting all CSVs again should fix this.

  • Bias.csv -- can be deleted
  • DE_glossary.csv -- needed for multilingual translations
  • EntityControl.csv -- can be deleted
  • Mapping_ODRL.csv -- needed for ODRL-DPV mappings
  • Requirement.csv -- source for requirements
  • RiskSource.csv -- from RISK extension
  • Standards_ISO.csv -- proposed modelling of ISO standards
  • UseCase.csv -- source for use-cases
  • concepts.csv -- can be deleted
  • legal-memberships.csv -- can be deleted
  • legal-uk.csv -- can be deleted
  • legal_Authorities.csv -- can be deleted
  • legal_EU_Adequacy.csv -- can be deleted
  • legal_EU_EEA.csv -- can be deleted
  • legal_Laws.csv -- can be deleted
  • legal_Locations.csv -- can be deleted
  • legal_properties.csv -- can be deleted
  • tech-data.csv -- can be deleted
  • tech-ops.csv -- can be deleted
  • tech-provision-properties.csv -- can be deleted
  • tech-security.csv -- can be deleted
  • tech-surveillance.csv -- can be deleted
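
The removals listed above could be applied with something like the following. This is a hedged sketch, not a script from the repository: the function name `remove_deletable` and the `--apply` flag are hypothetical, the file list is copied from the entries marked for deletion in this comment, and it assumes the files are tracked by git and that the function is run from the "code" directory.

```shell
#!/bin/bash
# Hypothetical helper: remove the CSV files marked for deletion in this thread.
# Run from the "code" directory. Dry run by default; pass "--apply" to delete.

CSV_DIR="vocab_csv"

# File names copied from the list in the comment above.
deletable=(
    Bias.csv EntityControl.csv concepts.csv
    legal-memberships.csv legal-uk.csv legal_Authorities.csv
    legal_EU_Adequacy.csv legal_EU_EEA.csv legal_Laws.csv
    legal_Locations.csv legal_properties.csv
    tech-data.csv tech-ops.csv tech-provision-properties.csv
    tech-security.csv tech-surveillance.csv
)

remove_deletable() {
    local mode="${1:-}"
    local name path
    for name in "${deletable[@]}"; do
        path="$CSV_DIR/$name"
        if [ ! -e "$path" ]; then
            echo "skip (not found): $path"
        elif [ "$mode" = "--apply" ]; then
            # Assumes the file is tracked; git rm also stages the deletion.
            git rm --quiet -- "$path"
        else
            echo "would remove: $path"
        fi
    done
}

# remove_deletable            # dry run: only prints what would be removed
# remove_deletable --apply    # actually removes the files via git rm
```

Defaulting to a dry run makes it possible to compare the output against the list before anything is deleted.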

coolharsh55, Mar 22 '25 16:03