
idr0001-graml-sysgro to NGFF

Open will-moore opened this issue 2 years ago • 37 comments

idr0001 has 192 x 96-Well Plates, 6 acquisitions each. 1 Plate converted below is 47 GB. Approx 4.5 TB in total. bioformats2raw took ~30 mins to convert 1 Plate => approx 4 days in total.

NB: Multi-acquisition Flex data (idr0001) needs to be converted because support for it hasn't been ported from IDR to mainline Bio-Formats: https://github.com/ome/bioformats/pull/3537

will-moore avatar Jan 08 '24 12:01 will-moore

convert first plate for testing... on pilot-zarr1-dev

~/bioformats2raw-0.6.0-24/bin/bioformats2raw "/uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/JL_120731_S6A/Meas_01(2012-07-31_10-41-12)/001001001.flex" JL_120731_S6A
...
2024-01-08 13:36:06,658 [main] WARN  loci.formats.FormatHandler - mismatch between image count, names and factors (count=36864, names=32, factors=32)
2024-01-08 13:36:06,661 [main] WARN  loci.formats.FormatHandler - mismatch between image count, names and factors (count=36864, names=32, factors=32)
2024-01-08 13:36:06,664 [main] WARN  loci.formats.FormatHandler - mismatch between image count, names and factors (count=36864, names=32, factors=32)
2024-01-08 13:36:06,668 [main] WARN  loci.formats.FormatHandler - mismatch between image count, names and factors (count=36864, names=32, factors=32)
2024-01-08 13:36:06,671 [main] WARN  loci.formats.FormatHandler - mismatch between image count, names and factors (count=36864, names=32, factors=32)
...

(at 2pm)...

Done about 14:15 (approx 45 mins).

will-moore avatar Jan 08 '24 13:01 will-moore

make bucket etc... then:

(base) [wmoore@pilot-zarr1-dev ~]$ ./mc cp -r /data/idr0001/JL_120731_S6A.ome.zarr uk1s3/idr0001/zarr
...0731_S6A.ome.zarr/OME/METADATA.ome.xml: 96.57 GiB / 96.57 GiB ━━━━━━━━━━━━━━ 48.81 MiB/s 33m45s

will-moore avatar Jan 08 '24 14:01 will-moore

Looks like we are getting duplicates of each of the 6 acquisitions for each Plate. We would expect each Well to have 6 Fields but we get 12 (6 pairs of duplicates): https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/idr0001/zarr/JL_120731_S6A.ome.zarr/A/1/

Image

will-moore avatar Jan 08 '24 14:01 will-moore

There are a series of issues we might need to review here. For reference, the source data corresponding to the example above is https://ftp.ebi.ac.uk/pub/databases/IDR/idr0001-graml-sysgro/20151116-verified/JL_120731_S6A/.

As can be seen above, there are 6 measurement folders, which are interpreted as plate acquisitions using the IDR Flex reader. Looking at the file listing for each measurement, there are 2 fields of view per well and plate acquisition.

001001001.flex
001001002.flex
001002001.flex
001002002.flex
...

In IDR, each well of each plate acquisition contains only 1 field of view and the fileset only includes the Flex files ending with 001.flex. I cannot comment on whether this is to be treated as an issue or as an active decision made during the loading of this historical study /cc @joshmoore

If the intent is to create a Zarr dataset matching the current IDR representation, I can think of two options:

  • either point bioformats2raw at a secondary structure where only the ...001.flex files are present
  • or use omero-cli-zarr against live IDR
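
A minimal sketch of the first option, in Python: mirror a plate folder with symlinks and keep only the first-field files, so no pixel data is copied. It assumes the Flex files are named RRRCCCFFF.flex with the last three digits being the field index (inferred from the listing above); `link_first_fields` is a hypothetical helper name, not part of any OME tool.

```python
from pathlib import Path


def link_first_fields(src_plate: Path, dst_plate: Path) -> int:
    """Mirror src_plate into dst_plate with symlinks, keeping only ...001.flex files.

    Returns the number of symlinks created.
    """
    created = 0
    for meas_dir in sorted(p for p in src_plate.iterdir() if p.is_dir()):
        target_dir = dst_plate / meas_dir.name
        target_dir.mkdir(parents=True, exist_ok=True)
        for flex in sorted(meas_dir.glob("*.flex")):
            # Assumption: the last three digits of the stem are the field index.
            if flex.stem[-3:] != "001":
                continue
            (target_dir / flex.name).symlink_to(flex.resolve())
            created += 1
    return created
```

bioformats2raw could then be pointed at any .flex file inside the mirrored folder instead of the original one.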

sbesson avatar Jan 09 '24 09:01 sbesson

Let's try omero-cli-zarr export for comparison... On idr-ftp... Update to latest https://github.com/ome/omero-cli-zarr/pull/147

conda activate omero_zarr_export
pip uninstall omero-cli-zarr

# cloned https://github.com/ome/omero-cli-zarr/pull/147 and merged https://github.com/ome/omero-cli-zarr/pull/156
$ cd /data/idr0001/omero_cli_zarr/omero-cli-zarr
$ pip install -e .

$ omero zarr export Plate:2552 --name_by name

...export status... 5 out of 96 Wells done in 35 mins. 7 mins per Well is 11 hours per Plate!

... completed at 22:08 - ~11.5 hours.

Upload...

./mc cp -r /data/idr0001/omero_cli_zarr/JL_120731_S6B.ome.zarr uk1s3/idr0001/zarr

Looks good at https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0001/zarr/JL_120731_S6B.ome.zarr/

will-moore avatar Jan 09 '24 10:01 will-moore

Tried exporting Polygons as labels... but this fails as the Polygons are overlapping:

(omero_zarr_export) [wmoore@pilot-zarr1-dev omero_cli_zarr]$ omero zarr polygons Plate:2552 --source-image=JL_120731_S6B.ome.zarr

Found 51 mask shapes in 51 ROIs
Unique dimensions: {'T': {None}, 'C': {None}, 'Z': {None}}
source_image JL_120731_S6B.ome.zarr/D/7/0
Ignoring dimensions {'C', 'T', 'Z'}
Traceback (most recent call last):
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/bin/omero", line 11, in <module>
    sys.exit(main())
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/main.py", line 125, in main
    rv = omero.cli.argv()
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1784, in argv
    cli.invoke(args[1:])
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1222, in invoke
    stop = self.onecmd(line, previous_args)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1299, in onecmd
    self.execute(line, previous_args)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1381, in execute
    args.func(args)
  File "/data/idr0001/omero_cli_zarr/omero-cli-zarr/src/omero_zarr/cli.py", line 125, in _wrapper
    return func(self, *args, **kwargs)
  File "/data/idr0001/omero_cli_zarr/omero-cli-zarr/src/omero_zarr/cli.py", line 333, in polygons
    plate_shapes_to_zarr(plate, ["Polygon"], args)
  File "/data/idr0001/omero_cli_zarr/omero-cli-zarr/src/omero_zarr/masks.py", line 113, in plate_shapes_to_zarr
    saver.save(list(masks.values()), args.label_name)
  File "/data/idr0001/omero_cli_zarr/omero-cli-zarr/src/omero_zarr/masks.py", line 334, in save
    labels, fill_colors, properties = self.masks_to_labels(
  File "/data/idr0001/omero_cli_zarr/omero-cli-zarr/src/omero_zarr/masks.py", line 565, in masks_to_labels
    raise Exception(
Exception: Shape 624314 overlaps with existing labels
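
For context, a label image stores one integer per pixel, so two shapes that cover the same pixel cannot both be recorded. The toy function below illustrates that constraint only; it is not the omero-cli-zarr implementation, and `paint_labels` and its input format are made up for this example.

```python
def paint_labels(masks, height, width):
    """Paint shapes into a single label image, refusing overlapping pixels.

    masks: list of (label_value, set of (y, x) pixel coordinates).
    """
    labels = [[0] * width for _ in range(height)]
    for value, pixels in masks:
        for y, x in pixels:
            if labels[y][x] != 0:
                # The pixel already belongs to another shape.
                raise ValueError(f"Shape {value} overlaps with existing labels")
            labels[y][x] = value
    return labels
```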

will-moore avatar Jan 10 '24 06:01 will-moore

Run mkngff on idr0125-pilot with

$ sudo mkdir /idr0001 && sudo /opt/goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other idr0001 /idr0001
$ ls /idr0001/zarr/
JL_120731_S6A.ome.zarr  JL_120731_S6B.ome.zarr

as omero-server

omero mkngff sql 16452 --clientpath="https://uk1s3.embassy.ebi.ac.uk/idr0001/zarr/JL_120731_S6B.ome.zarr" "/idr0001/zarr/JL_120731_S6B.ome.zarr" > "idr0001/16452.sql"
$ psql -U omero -d idr -h $DBHOST -f idr0001/16452.sql 
UPDATE 576
BEGIN
 mkngff_fileset 
----------------
        5289241
(1 row)

COMMIT


$ omero mkngff symlink /data/OMERO/ManagedRepository 16452 "/idr0001/zarr/JL_120731_S6B.ome.zarr" --bfoptions
...
Creating symlink /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-19.109_mkngff/JL_120731_S6B.ome.zarr -> /idr0001/zarr/JL_120731_S6B.ome.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-19.109
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-19.109_mkngff/JL_120731_S6B.ome.zarr.bfoptions

http://localhost:1040/webclient/?show=image-1230224 ...

Fails with:

2024-01-10 09:00:57,168 INFO  [                      omero.cmd.SessionI] (.Server-11) Unregistered servant:078c4a67-66a5-46f8-822f-8e196ea78f82/a88d46c1-8860-4398-b072-50d9939ab731omero.api.IQuery(omero.api._IQueryTie@cbaf0b18)
2024-01-10 09:00:57,168 INFO  [                      omero.cmd.SessionI] (.Server-11) Removed servant from adapter: a88d46c1-8860-4398-b072-50d9939ab731omero.api.IQuery
2024-01-10 09:01:05,954 DEBUG [                   loci.formats.Memoizer] (l.Server-0) start[1704877049382] time[216572] tag[loci.formats.Memoizer.setId]
2024-01-10 09:01:05,954 DEBUG [                   loci.formats.Memoizer] (.Server-10) start[1704876989274] time[276679] tag[loci.formats.Memoizer.setId]
2024-01-10 09:01:05,956 ERROR [         ome.io.bioformats.BfPixelBuffer] (l.Server-0) Failed to instantiate BfPixelsWrapper with /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-19.109_mkngff/JL_120731_S6B.ome.zarr/.zattrs
2024-01-10 09:01:05,956 ERROR [         ome.io.bioformats.BfPixelBuffer] (.Server-10) Failed to instantiate BfPixelsWrapper with /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-19.109_mkngff/JL_120731_S6B.ome.zarr/.zattrs
2024-01-10 09:01:05,957 ERROR [                ome.io.nio.PixelsService] (.Server-10) Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-19.109_mkngff/JL_120731_S6B.ome.zarr/.zattrs
java.lang.RuntimeException: java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.String (java.lang.Long and java.lang.String are in module java.base of loader 'bootstrap')
        at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:79)
        at ome.io.bioformats.BfPixelBuffer.setSeries(BfPixelBuffer.java:124)
        at ome.io.nio.PixelsService.createBfPixelBuffer(PixelsService.java:898)
        at ome.io.nio.PixelsService._getPixelBuffer(PixelsService.java:653)
        at ome.io.nio.PixelsService.getPixelBuffer(PixelsService.java:571)
        at ome.services.RenderingBean$12.doWork(RenderingBean.java:2205)
        at jdk.internal.reflect.GeneratedMethodAccessor295.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)

will-moore avatar Jan 10 '24 08:01 will-moore

Error is raised at https://github.com/ome/omero-romio/blob/1d30fafc4e06c5511cfbb24c25a753925ffb2eb4/src/main/java/ome/io/bioformats/BfPixelBuffer.java#L79 but that's not where the ClassCastException comes from.

will-moore avatar Jan 10 '24 09:01 will-moore

I suspect the error comes from the underlying reader and is rethrown from the Bio-Formats pixel buffer. Is there more information in the following lines of the stack trace?

sbesson avatar Jan 10 '24 11:01 sbesson

Ah, yes...

Caused by: java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.String (java.lang.Long and java.lang.String are in module java.base of loader 'bootstrap')
        at loci.formats.in.ZarrReader.parsePlate(ZarrReader.java:755)
        at loci.formats.in.ZarrReader.initFile(ZarrReader.java:353)
        at loci.formats.FormatReader.setId(FormatReader.java:1443)
        at loci.formats.ImageReader.setId(ImageReader.java:849)
        at ome.io.nio.PixelsService$3.setId(PixelsService.java:869)
        at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:650)
        at loci.formats.ChannelFiller.setId(ChannelFiller.java:234)
        at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:650)
        at loci.formats.ChannelSeparator.setId(ChannelSeparator.java:293)
        at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:650)
        at loci.formats.Memoizer.setId(Memoizer.java:690)
        at ome.io.bioformats.BfPixelsWrapper.<init>(BfPixelsWrapper.java:52)
        at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:73)
        ... 82 common frames omitted

will-moore avatar Jan 10 '24 11:01 will-moore

That line is https://github.com/ome/ZarrReader/blob/main/src/loci/formats/in/ZarrReader.java#L755

String acqStartTime = (String) acquistion.get("starttime");

The schema states that this is an integer: https://github.com/ome/ngff/blob/main/0.4/schemas/plate.schema#L35

cc @dgault
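
A small Python sketch of the mismatch: a plate .zattrs that follows the schema holds starttime as a JSON number, which a JSON parser returns as a numeric type, so a cast to a string fails. The function name and the sample timestamp are hypothetical; only the "plate" / "acquisitions" / "starttime" / "endtime" keys come from the NGFF 0.4 plate metadata.

```python
import json


def non_integer_times(plate_attrs: dict) -> list:
    """Return (acquisition id, key, value) for start/end times that are not integers."""
    problems = []
    for acq in plate_attrs.get("plate", {}).get("acquisitions", []):
        for key in ("starttime", "endtime"):
            value = acq.get(key)
            # bool is a subclass of int in Python, so exclude it explicitly.
            if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
                problems.append((acq.get("id"), key, value))
    return problems


# Hypothetical sample: an integer starttime, as the schema requires.
attrs = json.loads('{"plate": {"acquisitions": [{"id": 0, "starttime": 1343731272}]}}')
```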

will-moore avatar Jan 10 '24 11:01 will-moore

We want to create a copy of the original data without ...2.flex files but without duplicating all the data!

On pilot-zarr1-dev

export PLATE=JL_120801_S7A
cd /data/idr0001/raw

mkdir $PLATE
cd $PLATE
for i in $(ls /uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/); do
  mkdir "$i"
  for f in $(ls "/uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/$i"); do
    ln -s "/uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/$i/$f" "$i/$f"
  done
  rm "$i"/*2.flex
done

$ ~/bioformats2raw-0.6.0-24/bin/bioformats2raw "JL_120801_S7A/Meas_01(2012-08-01_10-57-32)/001001001.flex"  ngff/JL_120801_S7A.ome.zarr
...

Conversion took approx 30 mins.

(bioformats2raw) [wmoore@pilot-zarr1-dev ngff]$ ~/mc cp -r /data/idr0001/raw/ngff/JL_120801_S7A.ome.zarr uk1s3/idr0001/zarr
...7A.ome.zarr/OME/METADATA.ome.xml: 47.15 GiB / 47.15 GiB ━━━━━━━━━━━━━━━━ 49.82 MiB/s 16m9s

will-moore avatar Jan 11 '24 11:01 will-moore

omero mkngff sql 16454 --clientpath="https://uk1s3.embassy.ebi.ac.uk/idr0001/zarr/JL_120801_S7A.ome.zarr" "/idr0001/zarr/JL_120801_S7A.ome.zarr" > "idr0001/16454.sql"
$ psql -U omero -d idr -h $DBHOST -f idr0001/16454.sql 
UPDATE 576
BEGIN
 mkngff_fileset 
----------------
        5289242
(1 row)

COMMIT

$ omero mkngff symlink /data/OMERO/ManagedRepository 16454 "/idr0001/zarr/JL_120801_S7A.ome.zarr" --bfoptions
...
Creating dir at /data/OMERO/ManagedRepository/demo_2/2015-11/23/17-31-08.324_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2015-11/23/17-31-08.324_mkngff/JL_120801_S7A.ome.zarr -> /idr0001/zarr/JL_120801_S7A.ome.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2015-11/23/17-31-08.324
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2015-11/23/17-31-08.324_mkngff/JL_120801_S7A.ome.zarr.bfoptions

This Plate renders OK in webclient 👍 NB: it avoids the ZarrReader bug above since it doesn't have acquisition starttime or endtime metadata.
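
Since the same three mkngff steps are repeated for each plate, they could be generated rather than typed. This sketch only builds the command strings seen above (it does not run them) and uses no flags beyond those already shown; `mkngff_steps` and its parameters are hypothetical names.

```python
import shlex


def mkngff_steps(fileset_id, plate, bucket_url, mount_dir,
                 repo="/data/OMERO/ManagedRepository"):
    """Return the sql, psql and symlink command lines for one plate."""
    zarr = f"{plate}.ome.zarr"
    client = f"{bucket_url}/{zarr}"      # public S3 URL of the plate
    local = f"{mount_dir}/{zarr}"        # same plate via the goofys mount
    sql_file = f"idr0001/{fileset_id}.sql"
    return [
        f"omero mkngff sql {fileset_id} --clientpath={shlex.quote(client)} "
        f"{shlex.quote(local)} > {shlex.quote(sql_file)}",
        f'psql -U omero -d idr -h "$DBHOST" -f {shlex.quote(sql_file)}',
        f"omero mkngff symlink {shlex.quote(repo)} {fileset_id} "
        f"{shlex.quote(local)} --bfoptions",
    ]
```

For example, `mkngff_steps(16454, "JL_120801_S7A", "https://uk1s3.embassy.ebi.ac.uk/idr0001/zarr", "/idr0001/zarr")` reproduces the three commands used for this plate.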

will-moore avatar Jan 11 '24 13:01 will-moore

Create symlinks...

cd /data/idr0001/raw

for PLATE in $(ls /uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/); do
echo $PLATE
mkdir $PLATE
for i in $(ls /uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/); do
  mkdir "$PLATE/$i"
  for f in $(ls "/uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/$i"); do
    ln -s "/uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/$i/$f" "$PLATE/$i/$f"
  done
  rm -f "$PLATE/$i"/*2.flex
done
done

Count flex files for each Plate/Acquisition in plates.csv (in IDR):

for PLATE in $(cat plates.csv); do
echo $PLATE >> counts3.log
for i in $(ls /uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/); do
 ls "$PLATE/$i" | wc >> counts3.log
done
done

cat counts3.log | grep -v 96
# ignored Plates where all acquisitions have 96 files. That leaves...
JL_120801_S7B
     95      95    1425
JL_120814_S17A
     95      95    1425
JL_120816_S19B
     95      95    1425
JL_120907_S2A
     95      95    1425
JL_121214_J2_1
     95      95    1425
JL_121217_J7_1
     95      95    1425
JL_121219_J4_3
     95      95    1425
JL_121220_J5_3
     95      95    1425
JL_130126_J10_3
     93      93    1395
JL_130126_J10_4
     91      91    1365
JL_130126_J9_3
     94      94    1410
JL_130126_J9_4
     93      93    1395
JL_130127_J10_5
     94      94    1410
JL_130127_J9_5
     93      93    1395
JL_130127_J9_6
     94      94    1410
JL_130128_J9_7
     95      95    1425
JL_130128_J9_8
     95      95    1425
     95      95    1425
     95      95    1425
     95      95    1425
     95      95    1425
JL_130304_R1_3
     95      95    1425
X_110213_S2_Blue1000
     95      95    1425
X_110331_S13
     95      95    1425
X_110425_S6
     95      95    1425
X_110429_S24
     95      95    1425

will-moore avatar Jan 11 '24 14:01 will-moore

Checking for counts of ...2.flex files which I've "deleted" (removed symlinks) above. Most plates/acquisitions have 0 or 96 but some have other counts. Getting complex!

for PLATE in $(ls /uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/); do
echo $PLATE >> counts2.log
for i in $(ls /uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/); do
 ls "/uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/$i" | grep "2.flex" | wc >> counts2.log
done
done

will-moore avatar Jan 11 '24 15:01 will-moore

On pilot-zarr1-dev

$ cd /data/idr0001/raw
$ cat plates.csv
JL_120731_S6A
JL_120731_S6B
JL_120801_S7A
...
$ cat plates.csv | wc
    192     192    2828

for i in $(cat plates.csv); do
  # quote the pattern so the shell does not expand *.flex before find sees it
  first_flex=$(find "$i" -name "*.flex" -print -quit)
  echo "$first_flex"
  ~/bioformats2raw-0.6.0-24/bin/bioformats2raw "$first_flex" "ngff/$i.ome.zarr"
done
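
Note that `find ... -print -quit` returns whichever file the filesystem lists first, which is not necessarily 001001001.flex in the first measurement. If a reproducible choice matters, a sorted lookup is one option. This is a sketch only; `first_flex` is a hypothetical helper, and whether the choice of entry file affects the conversion has not been checked here.

```python
from pathlib import Path
from typing import Optional


def first_flex(plate_dir: Path) -> Optional[Path]:
    """Return the lexicographically first .flex file under plate_dir, or None."""
    candidates = sorted(plate_dir.rglob("*.flex"))
    return candidates[0] if candidates else None
```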

will-moore avatar Jan 11 '24 16:01 will-moore

Try viewing some of the 22 Plates above which have fewer than 96 .flex files... Are these OK on IDR??

  • Plate JL_120801_S7B only has 3 Acquisitions in IDR and these images are viewable. It is the 4th Acquisition that has only 95 flex files

  • JL_120814_S17A appears to be all OK.

  • Plate JL_120816_S19B - The 2nd batch of 50 thumbnails doesn't load - No images viewable message = Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-11/24/03-19-34.711/Meas_01(2012-08-16_10-38-18)/001001001.flex

  • JL_120907_S2A - Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-11/24/18-09-54.256/Meas_01(2012-09-07_03-04-56)/001001001.flex

  • E.g. plate JL_130126_J10_3 https://idr.openmicroscopy.org/webclient/?show=well-522426 Try to view Preview tab:

    serverExceptionClass = ome.conditions.ResourceError
    message = Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-10/01/19-56-57.181/Meas_01(2013-01-26_16-42-53)/005004001.flex
}

will-moore avatar Jan 12 '24 09:01 will-moore

Since none of the Plates with missing .flex files above have successfully been converted by the script running above, let's test one on its own (NB: this is the symlinked data with the *2.flex files removed):

screen -S idr0001_test:

$ ~/bioformats2raw-0.6.0-24/bin/bioformats2raw JL_120801_S7B/Meas_01\(2012-08-01_18-40-47\)/001001001.flex ngff/JL_120801_S7B.ome.zarr
...
2024-01-12 13:51:27,012 [main] WARN  loci.formats.FormatHandler - parsing /data/idr0001/raw/JL_120801_S7B/Meas_06(2012-08-02_01-08-20)/008010001.flex
2024-01-12 13:51:27,669 [main] WARN  loci.formats.FormatHandler - parsing /data/idr0001/raw/JL_120801_S7B/Meas_06(2012-08-02_01-08-20)/008011001.flex
2024-01-12 13:51:28,343 [main] WARN  loci.formats.FormatHandler - parsing /data/idr0001/raw/JL_120801_S7B/Meas_06(2012-08-02_01-08-20)/008012001.flex
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.bioformats2raw.Converter@16150369): java.lang.ArithmeticException: / by zero
        at picocli.CommandLine.executeUserObject(CommandLine.java:1962)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
        at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
        at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
        at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
        at picocli.CommandLine.call(CommandLine.java:2761)
        at com.glencoesoftware.bioformats2raw.Converter.main(Converter.java:2192)
Caused by: java.lang.ArithmeticException: / by zero
        at loci.formats.in.FlexReader.groupFiles(FlexReader.java:1402)
        at loci.formats.in.FlexReader.initFlexFile(FlexReader.java:581)
        at loci.formats.in.FlexReader.initFile(FlexReader.java:390)
        at loci.formats.FormatReader.setId(FormatReader.java:1389)
        at loci.formats.ImageReader.setId(ImageReader.java:849)
        at com.glencoesoftware.bioformats2raw.Converter.getBaseReaderClass(Converter.java:2032)
        at com.glencoesoftware.bioformats2raw.Converter.convert(Converter.java:540)
        at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:516)
        at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:107)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
        ... 9 more

Also try it on the original data, where the number of *2.flex files for each acquisition matches the *1.flex file counts.

(bioformats2raw) [wmoore@pilot-zarr1-dev raw]$ ~/bioformats2raw-0.6.0-24/bin/bioformats2raw /uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/JL_120801_S7B/Meas_01\(2012-08-01_18-40-47\)/001001001.flex ngff/JL_120801_S7B.ome.zarr

This actually worked and has the correct number of Fields!

(base) [wmoore@pilot-zarr1-dev ngff]$ ls -alh !$
ls -alh JL_120801_S7B.ome.zarr/A/1
total 0
drwxrwxr-x.  8 wmoore wmoore  88 Jan 12 14:28 .
drwxrwxr-x. 14 wmoore wmoore 169 Jan 12 14:13 ..
drwxrwxr-x.  6 wmoore wmoore 100 Jan 12 14:12 0
drwxrwxr-x.  6 wmoore wmoore 100 Jan 12 14:15 1
drwxrwxr-x.  6 wmoore wmoore 100 Jan 12 14:18 2
drwxrwxr-x.  6 wmoore wmoore 100 Jan 12 14:21 3
drwxrwxr-x.  6 wmoore wmoore 100 Jan 12 14:24 4
drwxrwxr-x.  6 wmoore wmoore 100 Jan 12 14:28 5

Plate has the same number of images (.zattrs) as previous S7A plate!

$ find JL_120801_S7A.ome.zarr/ -name ".zattrs" | wc
    674     674   24906
$ find JL_120801_S7B.ome.zarr/ -name ".zattrs" | wc
    674     674   24906
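
The figure of 674 is consistent with a complete 96-well plate with 6 images per well. The breakdown below is an assumption about the layout (one .zattrs per image, one per well, one at the plate root and one in the OME group), not something verified against these files; `expected_zattrs` is a hypothetical helper.

```python
def expected_zattrs(wells: int, fields_per_well: int) -> dict:
    """Expected image and .zattrs counts for a complete plate (assumed layout)."""
    images = wells * fields_per_well
    # Assumed: one .zattrs per image and per well, plus plate root and OME group.
    return {"images": images, "zattrs": images + wells + 2}
```

With 96 wells and 6 fields this gives 576 images, which matches the "UPDATE 576" reported by psql for the other plates, and 674 .zattrs files.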

will-moore avatar Jan 12 '24 13:01 will-moore

Another Plate that failed with symlinked data (without *2.flex files) - run against original data...

(bioformats2raw) [wmoore@pilot-zarr1-dev raw]$ ~/bioformats2raw-0.6.0-24/bin/bioformats2raw /uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/JL_120814_S17A/Meas_01\(2012-08-14_10-49-32\)/001001001.flex ngff/JL_120814_S17A.ome.zarr
...

will-moore avatar Jan 12 '24 16:01 will-moore

5 months later

Investigation of a possible Bio-Formats fix at https://github.com/ome/bioformats/pull/3537 finds that this won't be an easy/viable solution. So we need to revive the NGFF conversion work...

Summary from first read of history above:

  • using bioformats2raw-0.6.0-24 to convert requires creating plate structure that lacks ...2.flex files to avoid duplicating fields. But this has variable success depending on plate structure
  • using omero-cli-zarr takes 11 hours per plate (192 plates will take 3 months!!)
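
The estimates above can be checked with a quick calculation, assuming plates are converted one at a time and using the per-plate timings reported in this thread (about 30 mins for bioformats2raw, about 11 hours for omero-cli-zarr). `total_days` is just an illustrative helper.

```python
def total_days(plates: int, hours_per_plate: float) -> float:
    """Wall-clock days to convert all plates sequentially."""
    return plates * hours_per_plate / 24


bioformats2raw_days = total_days(192, 0.5)   # 4.0 days
omero_cli_zarr_days = total_days(192, 11)    # 88.0 days, roughly 3 months
```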

will-moore avatar Jun 24 '24 11:06 will-moore

If we need to use omero-cli-zarr, old idr is still available at ssh 45.88.81.175 etc...

will-moore avatar Jul 03 '24 15:07 will-moore

Would be good to convert all plates as above with bioformats2raw, and check which ones work with mkngff. Then possibly use omero-cli-zarr to export the ones that fail (hopefully smaller number) which is kinda slow.

Plates converted are ~47 GB so we only have space to convert 1 plate on each of pilot-zarr1-dev and pilot-zarr2-dev.

(base) [wmoore@pilot-zarr1-dev ~]$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       100G   25G   76G  25% /
(base) [wmoore@pilot-zarr2-dev ~]$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       100G   20G   81G  20% /

will-moore avatar Jul 07 '24 12:07 will-moore

Converted data should never end up in the root partition. There is a dedicated /data partition on each of these nodes.

(base) [sbesson@pilot-zarr1-dev ~]$ df -h | grep /data
/dev/vdb                    4.9T  4.8T  112G  98% /data
(base) [sbesson@pilot-zarr2-dev ~]$ df -h | grep /data
/dev/vdb                    750G   89G  661G  12% /data

If 5TB is sufficient, let's review and discuss the cleanup of the existing /data volume on these resources. Otherwise, let's define the ideal requirements in terms of compute and storage and create a new pilot.
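
As a rough guide to how many converted plates fit in the available space, here is a sketch that assumes ~47 GiB per converted plate (the figure quoted above) and keeps 10% of the free space as headroom; both the headroom value and the helper names are arbitrary choices for illustration.

```python
import shutil


def free_bytes(path: str) -> int:
    """Free space on the filesystem holding path, in bytes."""
    return shutil.disk_usage(path).free


def plates_that_fit(free: int, plate_gib: float = 47.0, headroom: float = 0.1) -> int:
    """Number of converted plates that fit in `free` bytes, leaving some headroom."""
    usable = free * (1 - headroom)
    return int(usable // (plate_gib * 1024**3))
```

With the 112G and 661G shown as available on the two /data volumes, this gives 2 and 12 plates respectively, so converting and then moving each plate off to S3 would still be needed.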

sbesson avatar Jul 07 '24 13:07 sbesson

Thanks, @sbesson. @will-moore, were you looking to convert e.g. all of idr0001 at once rather than, e.g., moving each plate off to S3?

joshmoore avatar Jul 08 '24 14:07 joshmoore

Need env for omero cli zarr... Want to install conda etc... First need to install wget!

$ ssh -A 45.88.81.175 -L 1080:localhost:80
Last login: Mon Aug 19 10:45:01 2024 from 82.132.231.200

NB: webclient is broken e.g. http://localhost:1080/webclient/

  File "/opt/omero/web/venv3/lib64/python3.9/site-packages/omeroweb/webclient/views.py", line 490, in _load_template
    active_group = request.session.get("active_group") or conn.getEventContext().groupId

AttributeError: 'NoneType' object has no attribute 'getEventContext'

Try to use omero-cli-zarr...

[wmoore@prod121-proxy ~]$ sudo yum install wget

[wmoore@prod121-proxy ~]$ mkdir -p ~/miniconda3
[wmoore@prod121-proxy ~]$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
$ bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3

$ ~/miniconda3/bin/conda init bash
$ source .bashrc

$ conda create -n omeropy python=3.9 conda-forge::zeroc-ice==3.6.5 omero-py

We want to use https://github.com/ome/omero-cli-zarr/pull/147 (not merged yet) so we need to check out that branch etc. Need to install git!

(omeropy) [wmoore@prod121-proxy idr0001]$ sudo yum install git

Merged https://github.com/ome/omero-cli-zarr/pull/147 with origin to fix scm version issue...

14:01...

omero zarr export Plate:2551
...

Completed ~ 10:30 pm - 8.5 hours for a Plate...

Rename plate, since I didn't use --name_by name during export... and upload to s3...

$ mv 2551.ome.zarr JL_120731_S6A.ome.zarr

$ /home/wmoore/mc cp -r JL_120731_S6A.ome.zarr uk1s3/idr/zarr/v0.4/idr0101A/

(base) [wmoore@prod121-proxy idr0001]$ /home/wmoore/mc cp -r JL_120731_S6A.ome.zarr uk1s3/idr/zarr/v0.4/idr0101A/
.../JL_120731_S6A.ome.zarr/H/9/5/4/1/9/0/0: 32.53 GiB / 32.53 GiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 37.98 MiB/s 14m36s

EDIT... 29th August... Ooops! typo in the upload "idr0101A" -> "idr0001A".

(base) [wmoore@prod121-proxy idr0001]$ /home/wmoore/mc cp -r JL_120731_S6A.ome.zarr uk1s3/idr/zarr/v0.4/idr0001A/
...0001/JL_120731_S6A.ome.zarr.zip: 24.53 GiB / 24.53 GiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 134.05 MiB/s 3m7s

Ahhh! - typo again!!! JL_120731_S6A.ome.zarr.zip because I have deleted local unzipped data! Unzip... Upload AGAIN!

$ unzip JL_120731_S6A.ome.zarr.zip

$  /home/wmoore/mc cp -r JL_120731_S6A.ome.zarr uk1s3/idr/zarr/v0.4/idr0001A/
...31_S6A.ome.zarr/H/9/5/4/1/9/0/0: 57.06 GiB / 57.06 GiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 89.84 MiB/s 10m50s
$ df -h ./
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda5        79G   64G   16G  81% /

Cleanup:

time /home/wmoore/mc rm --recursive --force uk1s3/idr/zarr/v0.4/idr0101A/JL_120731_S6A.ome.zarr
/home/wmoore/mc rm --force uk1s3/idr/zarr/v0.4/idr0001A/JL_120731_S6A.ome.zarr.zip

will-moore avatar Aug 19 '24 13:08 will-moore

Install goofys and mount bia-integrator-data and idr buckets...

$ wget https://github.com/kahing/goofys/releases/latest/download/goofys
$ chmod +x goofys

# didn't need this yet!
$ sudo mkdir /bia-integrator-data && sudo ./goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other bia-integrator-data /bia-integrator-data

$ sudo mkdir /uk1s3_idr && sudo ./goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other idr /uk1s3_idr

(base) [wmoore@prod121-proxy ~]$ ls /uk1s3_idr/zarr/v0.4/idr0001A
2551.zarr  JL_120731_S6A.ome.zarr 

All good!

will-moore avatar Aug 29 '24 11:08 will-moore

mkngff...

conda activate omeropy
pip install 'omero-mkngff @ git+https://github.com/joshmoore/omero-mkngff@main'

First plate: Fileset 16451

$ omero login
$ time omero mkngff sql 16451 --clientpath="https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0001A/JL_120731_S6A.ome.zarr" "/uk1s3_idr/zarr/v0.4/idr0001A/JL_120731_S6A.ome.zarr" > "idr0001/16451.sql"

On pilot-idrngff... Following https://github.com/IDR/mkngff_upgrade_scripts

(venv3) (base) [wmoore@pilot-idrngff-omeroreadwrite ~]$ sudo mkdir /uk1s3_idr && sudo /opt/goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other idr /uk1s3_idr

$ psql -U omero -d idr -h $DBHOST -f 16451.sql 
UPDATE 576
BEGIN
 mkngff_fileset 
----------------
        6320797
(1 row)

COMMIT


$ omero login   (as demo user)
$ omero mkngff symlink /data/OMERO/ManagedRepository 16451 "/uk1s3_idr/zarr/v0.4/idr0001A/JL_120731_S6A.ome.zarr" --bfoptions --clientpath="https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0001A/JL_120731_S6A.ome.zarr"

Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579
Creating dir at /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr -> /uk1s3_idr/zarr/v0.4/idr0001A/JL_120731_S6A.ome.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr.bfoptions

View image... 10-15 mins...

Traceback (most recent call last):

  File "/opt/omero/web/venv3/lib64/python3.9/site-packages/omero_api_RenderingEngine_ice.py", line 1192, in load
    return _M_omero.api.RenderingEngine._op_load.invoke(self, ((), _ctx))

omero.ResourceError: exception ::omero::ResourceError
{
    serverStackTrace = ome.conditions.ResourceError: Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr/.zattrs
	at ome.io.nio.PixelsService.createBfPixelBuffer(PixelsService.java:907)
	at ome.io.nio.PixelsService._getPixelBuffer(PixelsService.java:653)

    serverExceptionClass = ome.conditions.ResourceError
    message = Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr/.zattrs
}

<WSGIRequest: GET '/webclient/metadata_preview/well/590809/?_=1724937458855'>

Blitz.log

2024-08-29 13:28:52,846 DEBUG [                   loci.formats.Memoizer] (.Server-10) start[1724937495656] time[637189] tag[loci.formats.Memoizer.setId]
2024-08-29 13:28:52,847 ERROR [         ome.io.bioformats.BfPixelBuffer] (.Server-10) Failed to instantiate BfPixelsWrapper with /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr/.zattrs
2024-08-29 13:28:52,848 ERROR [                ome.io.nio.PixelsService] (.Server-10) Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr/.zattrs
java.lang.RuntimeException: java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')
        at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:79)
        at ome.io.bioformats.BfPixelBuffer.setSeries(BfPixelBuffer.java:124)
        at ome.io.nio.PixelsService.createBfPixelBuffer(PixelsService.java:898)
        at ome.io.nio.PixelsService._getPixelBuffer(PixelsService.java:653)
        at ome.io.nio.PixelsService.getPixelBuffer(PixelsService.java:571)
        at ome.services.RenderingBean$12.doWork(RenderingBean.java:2205)
        at jdk.internal.reflect.GeneratedMethodAccessor317.invoke(Unknown Source)
...
Caused by: java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')
        at loci.formats.in.ZarrReader.parsePlate(ZarrReader.java:764)
        at loci.formats.in.ZarrReader.initFile(ZarrReader.java:361)
        at loci.formats.FormatReader.setId(FormatReader.java:1480)
        at loci.formats.ImageReader.setId(ImageReader.java:864)
        at ome.io.nio.PixelsService$3.setId(PixelsService.java:869)
        at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
        at loci.formats.ChannelFiller.setId(ChannelFiller.java:258)
        at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
        at loci.formats.ChannelSeparator.setId(ChannelSeparator.java:317)
        at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
        at loci.formats.Memoizer.setId(Memoizer.java:726)
        at ome.io.bioformats.BfPixelsWrapper.<init>(BfPixelsWrapper.java:52)
        at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:73)

This line of ZarrReader:

          Integer acqStartTime = (Integer) acquistion.get("starttime");

will-moore avatar Aug 29 '24 13:08 will-moore

Edited the plate /.zattrs file above to remove starttime attrs and replaced on s3:

$ /home/wmoore/mc cp JL_120731_S6A.ome.zarr/.zattrs uk1s3/idr/zarr/v0.4/idr0001A/JL_120731_S6A.ome.zarr/
...e/idr0001/JL_120731_S6A.ome.zarr/.zattrs: 13.99 KiB / 13.99 KiB ┃▓▓▓▓▓▓▓▓▓▓▓▓┃ 32.17 KiB/s 0s
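
The manual edit could also be scripted. The sketch below drops the time attributes from every acquisition in a plate .zattrs; removing endtime as well as starttime is an assumption made here (the edit above mentions starttime only), and `strip_acquisition_times` is a hypothetical helper.

```python
import json


def strip_acquisition_times(zattrs_text: str) -> str:
    """Return plate .zattrs JSON with starttime/endtime removed from acquisitions."""
    attrs = json.loads(zattrs_text)
    for acq in attrs.get("plate", {}).get("acquisitions", []):
        acq.pop("starttime", None)
        acq.pop("endtime", None)   # assumption: endtime is dropped too
    return json.dumps(attrs, indent=4)
```

The rewritten text would then be saved back to the plate's .zattrs and uploaded with mc cp as above.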

Try to delete memo file on pilot-idrngff and re-create... Don't see any memo file at:

bash-5.1$ ls -alh !$
ls -alh /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr
total 0
2024-08-30 10:28:11,565 DEBUG [                   loci.formats.Memoizer] (.Server-20) saved to temp file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr/..zattrs.bfmemo14210388543534098629
2024-08-30 10:28:11,566 DEBUG [                   loci.formats.Memoizer] (.Server-20) start[1725013691240] time[326] tag[loci.formats.Memoizer.saveMemo]
2024-08-30 10:28:11,566 DEBUG [                   loci.formats.Memoizer] (.Server-20) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr/..zattrs.bfmemo (1318254 bytes)
2024-08-30 10:28:11,566 DEBUG [                   loci.formats.Memoizer] (.Server-20) start[1725012959020] time[732546] tag[loci.formats.Memoizer.setId]

732546 ms is 12 minutes.
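
To pull these timings out of Blitz.log for several plates, a small parser along these lines could help. It assumes the time[...] field is in milliseconds, as the conversion above does, and only matches the Memoizer.setId lines; `set_id_minutes` is a hypothetical helper.

```python
import re

# Matches e.g. "time[732546] tag[loci.formats.Memoizer.setId]"
SET_ID = re.compile(r"time\[(\d+)\] tag\[loci\.formats\.Memoizer\.setId\]")


def set_id_minutes(log_text: str) -> list:
    """Return the Memoizer.setId durations found in log_text, in minutes."""
    return [round(int(ms) / 60000, 1) for ms in SET_ID.findall(log_text)]
```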

We can now view images for all acquisitions... BUT Wells are in the wrong place:

With new thumbnails generated by saving rendering settings, we see: The first 6 images (A1 - A6) are actually the 6 Fields from A1. The next 6 images (A7 - A12) are the 6 Fields from A2.

Screenshot 2024-08-30 at 12 04 03

The next 6 images (B1 - B6) correspond to the 6 Fields from A3 (as seen in vizarr):

https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0001A/JL_120731_S6A.ome.zarr/A/3

Screenshot 2024-08-30 at 12 08 22
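
The pattern described above can be written down as an index mapping: the image shown at a given grid position comes from well (position // 6), field (position % 6), counting positions row by row. This sketch only reproduces the observation for a 12-column plate; it does not diagnose the cause, and `displayed_source` is a hypothetical helper.

```python
import string


def displayed_source(position: int, fields: int = 6, columns: int = 12):
    """For a row-major grid position, return (grid well, source well, source field)."""
    def name(index: int) -> str:
        row = string.ascii_uppercase[index // columns]
        return f"{row}{index % columns + 1}"

    return name(position), name(position // fields), position % fields
```

For example, position 12 (grid well B1) maps to field 0 of well A3, matching the vizarr comparison above.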

will-moore avatar Aug 30 '24 11:08 will-moore

NB: a few plates were uploaded above:

$ aws --profile embassy s3 ls idr0001/zarr/
                           PRE JL_120731_S6A.ome.zarr/
                           PRE JL_120731_S6B.ome.zarr/
                           PRE JL_120801_S7A.ome.zarr/

See https://github.com/IDR/idr-metadata/issues/683#issuecomment-1887129269 "Plate renders OK in webclient" - but did it have the Well/acquisition layout issues described in the previous comment?? https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0001/zarr/JL_120801_S7A.ome.zarr/

TODO: try mkngff with that plate

will-moore avatar Sep 02 '24 10:09 will-moore

@sbesson pointed out that pilot-idrngff doesn't have the latest ZarrReader. Try to update and check again... (delete memo etc)

will-moore avatar Sep 02 '24 10:09 will-moore