bcftools --write-index sometimes creates indexes older than the data file

Open • ghuls opened this issue 1 year ago • 1 comment

bcftools --write-index sometimes creates an index whose modification time is older than that of the data file, which triggers a warning when the file is read back.

# Create BCF files from VCF files and create index on the fly.
for i in $(seq 1 200); do
    sample=$(printf '%03d' "$i")
    echo "${sample}"
    bcftools view --threads 2 --write-index -O b -o "${sample}.bcf" "${sample}.vcf"
done

# Merge all BCF files.
$ bcftools merge --threads 8 -O b --write-index -o merged.bcf *.bcf
[W::hts_idx_load3] The index file is older than the data file: 051.bcf.csi
[W::hts_idx_load3] The index file is older than the data file: 107.bcf.csi
[W::hts_idx_load3] The index file is older than the data file: 125.bcf.csi
[W::hts_idx_load3] The index file is older than the data file: 130.bcf.csi
[W::hts_idx_load3] The index file is older than the data file: 134.bcf.csi
[W::hts_idx_load3] The index file is older than the data file: 149.bcf.csi
[W::hts_idx_load3] The index file is older than the data file: 155.bcf.csi
[W::hts_idx_load3] The index file is older than the data file: 160.bcf.csi
[W::hts_idx_load3] The index file is older than the data file: 187.bcf.csi


# Modification times:
$ stat 051.bcf
  File: 051.bcf
  Size: 51212037        Blocks: 100032     IO Block: 4194304 regular file
Device: 25c66b0ah/633760522d    Inode: 144120644294937758  Links: 1
Access: (0644/-rw-r--r--)  Uid: (2530366/ghuls)   Gid: (2631836/   group)
Access: 2024-01-23 13:20:56.000000000 +0100
Modify: 2024-01-23 10:02:50.000000000 +0100
Change: 2024-01-23 10:02:50.000000000 +0100
 Birth: 2024-01-22 17:12:36.000000000 +0100

$ stat 051.bcf.csi
  File: 051.bcf.csi
  Size: 1379257         Blocks: 2696       IO Block: 4194304 regular file
Device: 25c66b0ah/633760522d    Inode: 144120644294937767  Links: 1
Access: (0644/-rw-r--r--)  Uid: (2530366/ghuls)   Gid: (2631836/   group)
Access: 2024-01-23 13:19:23.000000000 +0100
Modify: 2024-01-23 10:02:49.000000000 +0100
Change: 2024-01-23 10:02:49.000000000 +0100
 Birth: 2024-01-22 17:12:46.000000000 +0100

Could the index be "touched" (so last modification time is updated) after the BCF file is closed?
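
Until that is handled inside the tools, a user-side workaround could look like the sketch below. It is not a fix for bcftools or HTSlib: it only bumps the modification time of any `.csi` index that is older than its BCF file, so it is safe only when the index really was written together with that data file. The function name `refresh_stale_indexes` is a hypothetical helper introduced here for illustration, and the `-nt` test assumes a bash-like shell.

```shell
#!/usr/bin/env bash
# Workaround sketch: refresh the mtime of indexes that look older than their data file.
# refresh_stale_indexes is a hypothetical helper, not part of bcftools or HTSlib.
refresh_stale_indexes() {
    local dir="${1:-.}" bcf idx
    for bcf in "$dir"/*.bcf; do
        idx="${bcf}.csi"
        # Skip an unmatched glob and BCF files that have no index.
        [ -e "$bcf" ] && [ -e "$idx" ] || continue
        # -nt is true when the data file's mtime is newer than the index's.
        if [ "$bcf" -nt "$idx" ]; then
            echo "touching ${idx}"
            touch "$idx"
        fi
    done
}
```

Running `refresh_stale_indexes .` in the directory holding the per-sample BCF files, before `bcftools merge`, should silence the warning for files such as 051.bcf.csi, where the index was written one second before the data file was closed.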

https://github.com/samtools/bcftools/blob/develop/vcfview.c#L825-L835

ghuls, Jan 23 '24 13:01

This is probably a problem with HTSlib's API, so transferred here.

daviesrob, Jan 23 '24 14:01