idr0001-graml-sysgro to NGFF
idr0001 has 192 x 96-Well Plates, 6 acquisitions each. 1 Plate converted below is 47 GB, so approx 9 TB in total (192 x 47 GB). bioformats2raw took ~30 mins to convert 1 Plate => approx 4 days in total (192 x 30 mins = 96 hours).
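The totals above are simple per-plate scaling. A minimal sketch of that arithmetic, assuming plates are converted one after another and each is roughly the size of the 47 GB plate; `estimate_totals` is a hypothetical helper, not part of any tool:

```python
def estimate_totals(n_plates, gb_per_plate, hours_per_plate):
    """Return (total size in TB, total time in days) for a sequential conversion."""
    total_tb = n_plates * gb_per_plate / 1000  # decimal units: 1 TB = 1000 GB
    total_days = n_plates * hours_per_plate / 24
    return total_tb, total_days


if __name__ == "__main__":
    # bioformats2raw: ~47 GB and ~30 mins per Plate
    tb, days = estimate_totals(192, 47, 0.5)
    print(f"bioformats2raw: ~{tb:.1f} TB, ~{days:.1f} days")
    # omero-cli-zarr: ~11 hours per Plate
    _, days = estimate_totals(192, 47, 11)
    print(f"omero-cli-zarr: ~{days:.0f} days")
```

The same function gives about 88 days (roughly 3 months) for an omero-cli-zarr export at ~11 hours per Plate.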
NB: We need to convert the multi-acquisition Flex data (idr0001) because support for it hasn't been ported from IDR to mainline Bio-Formats: https://github.com/ome/bioformats/pull/3537
convert first plate for testing... on pilot-zarr1-dev
~/bioformats2raw-0.6.0-24/bin/bioformats2raw "/uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/JL_120731_S6A/Meas_01(2012-07-31_10-41-12)/001001001.flex" JL_120731_S6A
...
2024-01-08 13:36:06,658 [main] WARN loci.formats.FormatHandler - mismatch between image count, names and factors (count=36864, names=32, factors=32)
2024-01-08 13:36:06,661 [main] WARN loci.formats.FormatHandler - mismatch between image count, names and factors (count=36864, names=32, factors=32)
2024-01-08 13:36:06,664 [main] WARN loci.formats.FormatHandler - mismatch between image count, names and factors (count=36864, names=32, factors=32)
2024-01-08 13:36:06,668 [main] WARN loci.formats.FormatHandler - mismatch between image count, names and factors (count=36864, names=32, factors=32)
2024-01-08 13:36:06,671 [main] WARN loci.formats.FormatHandler - mismatch between image count, names and factors (count=36864, names=32, factors=32)
...
(at 2pm)...
Done about 14:15 (approx 45 mins).
make bucket etc... then:
(base) [wmoore@pilot-zarr1-dev ~]$ ./mc cp -r /data/idr0001/JL_120731_S6A.ome.zarr uk1s3/idr0001/zarr
...0731_S6A.ome.zarr/OME/METADATA.ome.xml: 96.57 GiB / 96.57 GiB ━━━━━━━━━━━━━━ 48.81 MiB/s 33m45s
Looks like we are getting duplicates of each of the 6 acquisitions for each Plate. We would expect each Well to have 6 Fields but we get 12 (6 pairs of duplicates): https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/idr0001/zarr/JL_120731_S6A.ome.zarr/A/1/
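To check for duplicated fields without opening a viewer, one can read each Well's .zattrs and count its images per acquisition. A sketch, assuming the NGFF 0.4 well layout (a `well.images` list whose entries carry a `path` and an optional `acquisition` id) and that the bucket serves Zarr v2 `.zattrs` files over plain HTTPS; the helper names are hypothetical:

```python
import json
import urllib.request
from collections import Counter


def load_zattrs(url):
    """Fetch and parse a .zattrs JSON document over HTTP(S)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def fields_per_acquisition(well_zattrs):
    """Count the images (fields) listed for each acquisition in a Well."""
    images = well_zattrs["well"]["images"]
    # "acquisition" is optional in the spec; images without it are counted under None
    return Counter(image.get("acquisition") for image in images)
```

For example, `fields_per_acquisition(load_zattrs("https://uk1s3.embassy.ebi.ac.uk/idr0001/zarr/JL_120731_S6A.ome.zarr/A/1/.zattrs"))` (if that data is still in the bucket) would show 12 images in total for a duplicated Well instead of 6.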
There are a series of issues we might need to review here. For reference, the source data corresponding to the example above is https://ftp.ebi.ac.uk/pub/databases/IDR/idr0001-graml-sysgro/20151116-verified/JL_120731_S6A/.
As can be seen above, there are 6 measurement folders, which are interpreted as plate acquisitions by the IDR Flex reader. Looking at the file listing for each measurement, there are 2 fields of view per well and plate acquisition.
001001001.flex
001001002.flex
001002001.flex
001002002.flex
...
In IDR, each well of each plate acquisition contains only 1 field of view, and the fileset only includes the Flex files ending with 001.flex. I cannot tell whether this should be treated as an issue or was an active decision made during the loading of this historical study /cc @joshmoore
If the intent is to create a Zarr dataset matching the current IDR representation, I can think of two options:
- either point bioformats2raw at a secondary structure where only the ...001.flex files are present
- or use omero-cli-zarr against live IDR
Let's try omero-cli-zarr export for comparison...
On idr-ftp...
Update to latest https://github.com/ome/omero-cli-zarr/pull/147
conda activate omero_zarr_export
pip uninstall omero-cli-zarr
# cloned https://github.com/ome/omero-cli-zarr/pull/147 and merged https://github.com/ome/omero-cli-zarr/pull/156
$ cd /data/idr0001/omero_cli_zarr/omero-cli-zarr
$ pip install -e .
$ omero zarr export Plate:2552 --name_by name
...export status... 5 out of 96 Wells done in 35 mins. 7 mins per Well is 11 hours per Plate!
... completed at 22:08 - ~11.5 hours.
Upload...
./mc cp -r /data/idr0001/omero_cli_zarr/JL_120731_S6B.ome.zarr uk1s3/idr0001/zarr
Looks good at https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0001/zarr/JL_120731_S6B.ome.zarr/
Tried exporting Polygons as labels... but this fails as the Polygons are overlapping:
(omero_zarr_export) [wmoore@pilot-zarr1-dev omero_cli_zarr]$ omero zarr polygons Plate:2552 --source-image=JL_120731_S6B.ome.zarr
Found 51 mask shapes in 51 ROIs
Unique dimensions: {'T': {None}, 'C': {None}, 'Z': {None}}
source_image JL_120731_S6B.ome.zarr/D/7/0
Ignoring dimensions {'C', 'T', 'Z'}
Traceback (most recent call last):
File "/home/wmoore/miniconda3/envs/omero_zarr_export/bin/omero", line 11, in <module>
sys.exit(main())
File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/main.py", line 125, in main
rv = omero.cli.argv()
File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1784, in argv
cli.invoke(args[1:])
File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1222, in invoke
stop = self.onecmd(line, previous_args)
File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1299, in onecmd
self.execute(line, previous_args)
File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1381, in execute
args.func(args)
File "/data/idr0001/omero_cli_zarr/omero-cli-zarr/src/omero_zarr/cli.py", line 125, in _wrapper
return func(self, *args, **kwargs)
File "/data/idr0001/omero_cli_zarr/omero-cli-zarr/src/omero_zarr/cli.py", line 333, in polygons
plate_shapes_to_zarr(plate, ["Polygon"], args)
File "/data/idr0001/omero_cli_zarr/omero-cli-zarr/src/omero_zarr/masks.py", line 113, in plate_shapes_to_zarr
saver.save(list(masks.values()), args.label_name)
File "/data/idr0001/omero_cli_zarr/omero-cli-zarr/src/omero_zarr/masks.py", line 334, in save
labels, fill_colors, properties = self.masks_to_labels(
File "/data/idr0001/omero_cli_zarr/omero-cli-zarr/src/omero_zarr/masks.py", line 565, in masks_to_labels
raise Exception(
Exception: Shape 624314 overlaps with existing labels
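A label image stores one integer per pixel, so two shapes covering the same pixel cannot both be written to it, which is what the exception reports. The sketch below illustrates that constraint with shapes represented as hypothetical sets of (y, x) pixels; it is not omero-cli-zarr's implementation, and label values are simply assigned 1, 2, 3, ... here:

```python
from itertools import combinations


def find_overlaps(masks):
    """Return pairs of shape ids whose pixel sets intersect.

    masks maps a shape id to a set of (y, x) pixel coordinates.
    """
    return [(a, b) for (a, pa), (b, pb) in combinations(masks.items(), 2) if pa & pb]


def paint_labels(masks, height, width):
    """Paint shapes into one label grid, refusing to overwrite an existing label."""
    labels = [[0] * width for _ in range(height)]
    for value, (shape_id, pixels) in enumerate(masks.items(), start=1):
        for y, x in pixels:
            if labels[y][x]:
                raise ValueError(f"Shape {shape_id} overlaps with existing labels")
            labels[y][x] = value
    return labels
```

Running `find_overlaps` first lists every conflicting pair at once, rather than stopping at the first one as the export does.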
Run mkngff on idr0125-pilot with
$ sudo mkdir /idr0001 && sudo /opt/goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other idr0001 /idr0001
$ ls /idr0001/zarr/
JL_120731_S6A.ome.zarr JL_120731_S6B.ome.zarr
as omero-server
omero mkngff sql 16452 --clientpath="https://uk1s3.embassy.ebi.ac.uk/idr0001/zarr/JL_120731_S6B.ome.zarr" "/idr0001/zarr/JL_120731_S6B.ome.zarr" > "idr0001/16452.sql"
$ psql -U omero -d idr -h $DBHOST -f idr0001/16452.sql
UPDATE 576
BEGIN
mkngff_fileset
----------------
5289241
(1 row)
COMMIT
$ omero mkngff symlink /data/OMERO/ManagedRepository 16452 "/idr0001/zarr/JL_120731_S6B.ome.zarr" --bfoptions
...
Creating symlink /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-19.109_mkngff/JL_120731_S6B.ome.zarr -> /idr0001/zarr/JL_120731_S6B.ome.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-19.109
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-19.109_mkngff/JL_120731_S6B.ome.zarr.bfoptions
http://localhost:1040/webclient/?show=image-1230224 ...
Fails with:
2024-01-10 09:00:57,168 INFO [ omero.cmd.SessionI] (.Server-11) Unregistered servant:078c4a67-66a5-46f8-822f-8e196ea78f82/a88d46c1-8860-4398-b072-50d9939ab731omero.api.IQuery(omero.api._IQueryTie@cbaf0b18)
2024-01-10 09:00:57,168 INFO [ omero.cmd.SessionI] (.Server-11) Removed servant from adapter: a88d46c1-8860-4398-b072-50d9939ab731omero.api.IQuery
2024-01-10 09:01:05,954 DEBUG [ loci.formats.Memoizer] (l.Server-0) start[1704877049382] time[216572] tag[loci.formats.Memoizer.setId]
2024-01-10 09:01:05,954 DEBUG [ loci.formats.Memoizer] (.Server-10) start[1704876989274] time[276679] tag[loci.formats.Memoizer.setId]
2024-01-10 09:01:05,956 ERROR [ ome.io.bioformats.BfPixelBuffer] (l.Server-0) Failed to instantiate BfPixelsWrapper with /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-19.109_mkngff/JL_120731_S6B.ome.zarr/.zattrs
2024-01-10 09:01:05,956 ERROR [ ome.io.bioformats.BfPixelBuffer] (.Server-10) Failed to instantiate BfPixelsWrapper with /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-19.109_mkngff/JL_120731_S6B.ome.zarr/.zattrs
2024-01-10 09:01:05,957 ERROR [ ome.io.nio.PixelsService] (.Server-10) Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-19.109_mkngff/JL_120731_S6B.ome.zarr/.zattrs
java.lang.RuntimeException: java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.String (java.lang.Long and java.lang.String are in module java.base of loader 'bootstrap')
at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:79)
at ome.io.bioformats.BfPixelBuffer.setSeries(BfPixelBuffer.java:124)
at ome.io.nio.PixelsService.createBfPixelBuffer(PixelsService.java:898)
at ome.io.nio.PixelsService._getPixelBuffer(PixelsService.java:653)
at ome.io.nio.PixelsService.getPixelBuffer(PixelsService.java:571)
at ome.services.RenderingBean$12.doWork(RenderingBean.java:2205)
at jdk.internal.reflect.GeneratedMethodAccessor295.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
Error is raised at https://github.com/ome/omero-romio/blob/1d30fafc4e06c5511cfbb24c25a753925ffb2eb4/src/main/java/ome/io/bioformats/BfPixelBuffer.java#L79
but that's not where the ClassCastException comes from.
I suspect the error comes from the underlying reader and is rethrown from the Bio-Formats pixel buffer. Is there more information in the following lines of the stack trace?
Ah, yes...
Caused by: java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.String (java.lang.Long and java.lang.String are in module java.base of loader 'bootstrap')
at loci.formats.in.ZarrReader.parsePlate(ZarrReader.java:755)
at loci.formats.in.ZarrReader.initFile(ZarrReader.java:353)
at loci.formats.FormatReader.setId(FormatReader.java:1443)
at loci.formats.ImageReader.setId(ImageReader.java:849)
at ome.io.nio.PixelsService$3.setId(PixelsService.java:869)
at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:650)
at loci.formats.ChannelFiller.setId(ChannelFiller.java:234)
at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:650)
at loci.formats.ChannelSeparator.setId(ChannelSeparator.java:293)
at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:650)
at loci.formats.Memoizer.setId(Memoizer.java:690)
at ome.io.bioformats.BfPixelsWrapper.<init>(BfPixelsWrapper.java:52)
at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:73)
... 82 common frames omitted
That line is https://github.com/ome/ZarrReader/blob/main/src/loci/formats/in/ZarrReader.java#L755
String acqStartTime = (String) acquistion.get("starttime");
The schema states that this is an integer: https://github.com/ome/ngff/blob/main/0.4/schemas/plate.schema#L35
cc @dgault
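The ClassCastException suggests the JSON parser hands back the integer starttime as a java.lang.Long, so a cast to String fails at runtime; in Java, treating the value as a java.lang.Number would presumably avoid this. The idea of a tolerant reader is sketched here in Python rather than Java; `parse_acquisition_time` is hypothetical and not ZarrReader's code, and accepting numeric strings is an extra assumption:

```python
def parse_acquisition_time(value):
    """Return an acquisition starttime/endtime as an int, or None if absent.

    NGFF 0.4 declares these as integers; numeric strings are also accepted here.
    """
    if value is None:
        return None
    if isinstance(value, bool):  # bool is a subclass of int in Python
        raise TypeError(f"not a valid acquisition time: {value!r}")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            pass
    raise TypeError(f"not a valid acquisition time: {value!r}")
```

Usage: `[parse_acquisition_time(a.get("starttime")) for a in plate_zattrs["plate"]["acquisitions"]]`.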
We want a copy of the original data that lacks the ...2.flex files, but without duplicating all the data!
On pilot-zarr1-dev
export PLATE=JL_120801_S7A
cd /data/idr0001/raw
mkdir $PLATE
cd $PLATE
for i in $(ls /uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/); do
mkdir "$i"
for f in $(ls "/uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/$i"); do
ln -s "/uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/$i/$f" "$i/$f"
done
rm "$i"/*2.flex
done
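The same symlink-only copy can be expressed as a dry-run-friendly sketch: first plan the (target, link) pairs, then create them. It assumes each Plate folder contains only acquisition (Meas_*) folders and, like the shell loop, skips any file whose name ends in 2.flex; the function names are hypothetical:

```python
import fnmatch
import os


def read_listing(plate_dir):
    """Map each acquisition folder in plate_dir to the file names it contains."""
    return {
        acq: os.listdir(os.path.join(plate_dir, acq))
        for acq in os.listdir(plate_dir)
        if os.path.isdir(os.path.join(plate_dir, acq))
    }


def plan_links(src_plate, dst_plate, listing, exclude="*2.flex"):
    """Return (target, link) pairs for every file not matching `exclude`."""
    links = []
    for acq in sorted(listing):
        for name in sorted(listing[acq]):
            if fnmatch.fnmatchcase(name, exclude):
                continue  # e.g. 001001002.flex, the 2nd field of view
            links.append((os.path.join(src_plate, acq, name),
                          os.path.join(dst_plate, acq, name)))
    return links


def make_links(links):
    """Create the symlinks, creating acquisition folders as needed."""
    for target, link in links:
        os.makedirs(os.path.dirname(link), exist_ok=True)
        os.symlink(target, link)
```

For example, with `src = "/uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/JL_120801_S7A"`, printing `plan_links(src, "JL_120801_S7A", read_listing(src))` is a dry run, and passing the same list to `make_links` creates the links.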
$ ~/bioformats2raw-0.6.0-24/bin/bioformats2raw "JL_120801_S7A/Meas_01(2012-08-01_10-57-32)/001001001.flex" ngff/JL_120801_S7A.ome.zarr
...
Conversion took approx 30 mins.
(bioformats2raw) [wmoore@pilot-zarr1-dev ngff]$ ~/mc cp -r /data/idr0001/raw/ngff/JL_120801_S7A.ome.zarr uk1s3/idr0001/zarr
...7A.ome.zarr/OME/METADATA.ome.xml: 47.15 GiB / 47.15 GiB ━━━━━━━━━━━━━━━━ 49.82 MiB/s 16m9s
omero mkngff sql 16454 --clientpath="https://uk1s3.embassy.ebi.ac.uk/idr0001/zarr/JL_120801_S7A.ome.zarr" "/idr0001/zarr/JL_120801_S7A.ome.zarr" > "idr0001/16454.sql"
$ psql -U omero -d idr -h $DBHOST -f idr0001/16454.sql
UPDATE 576
BEGIN
mkngff_fileset
----------------
5289242
(1 row)
COMMIT
$ omero mkngff symlink /data/OMERO/ManagedRepository 16454 "/idr0001/zarr/JL_120801_S7A.ome.zarr" --bfoptions
...
Creating dir at /data/OMERO/ManagedRepository/demo_2/2015-11/23/17-31-08.324_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2015-11/23/17-31-08.324_mkngff/JL_120801_S7A.ome.zarr -> /idr0001/zarr/JL_120801_S7A.ome.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2015-11/23/17-31-08.324
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2015-11/23/17-31-08.324_mkngff/JL_120801_S7A.ome.zarr.bfoptions
This Plate renders OK in webclient 👍
NB: it avoids the ZarrReader bug above since it doesn't have acquisition starttime or endtime metadata.
Create symlinks...
cd /data/idr0001/raw
for PLATE in $(ls /uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/); do
echo $PLATE
mkdir $PLATE
for i in $(ls /uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/); do
mkdir "$PLATE/$i"
for f in $(ls "/uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/$i"); do
ln -s "/uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/$i/$f" "$PLATE/$i/$f"
done
rm -f "$PLATE/$i"/*2.flex
done
done
Count flex files for each Plate/Acquisition in plates.csv (in IDR):
for PLATE in $(cat plates.csv); do
echo $PLATE >> counts3.log
for i in $(ls /uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/); do
ls "$PLATE/$i" | wc >> counts3.log
done
done
cat counts3.log | grep -v 96
# ignored Plates where all acquisitions have 96 files. That leaves...
JL_120801_S7B
95 95 1425
JL_120814_S17A
95 95 1425
JL_120816_S19B
95 95 1425
JL_120907_S2A
95 95 1425
JL_121214_J2_1
95 95 1425
JL_121217_J7_1
95 95 1425
JL_121219_J4_3
95 95 1425
JL_121220_J5_3
95 95 1425
JL_130126_J10_3
93 93 1395
JL_130126_J10_4
91 91 1365
JL_130126_J9_3
94 94 1410
JL_130126_J9_4
93 93 1395
JL_130127_J10_5
94 94 1410
JL_130127_J9_5
93 93 1395
JL_130127_J9_6
94 94 1410
JL_130128_J9_7
95 95 1425
JL_130128_J9_8
95 95 1425
95 95 1425
95 95 1425
95 95 1425
95 95 1425
JL_130304_R1_3
95 95 1425
X_110213_S2_Blue1000
95 95 1425
X_110331_S13
95 95 1425
X_110425_S6
95 95 1425
X_110429_S24
95 95 1425
Checking for counts of ...2.flex files which I've "deleted" (removed symlinks) above.
Most plates/acquisitions have 0 or 96 but some have other counts. Getting complex!
for PLATE in $(ls /uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/); do
echo $PLATE >> counts2.log
for i in $(ls /uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/); do
ls "/uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/$PLATE/$i" | grep "2.flex" | wc >> counts2.log
done
done
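The two counting loops (files left after removing *2.flex, and *2.flex files in the original data) can be combined into one pass per acquisition. A sketch, assuming only fields 1 and 2 exist so that names ending in 1.flex / 2.flex identify the field; `count_names`, `count_fields` and `irregular` are hypothetical helpers, and the 96 default is the number of Wells:

```python
import fnmatch
import os


def count_names(names):
    """Return (n_field1, n_field2) for the file names of one acquisition."""
    return (sum(fnmatch.fnmatchcase(n, "*1.flex") for n in names),
            sum(fnmatch.fnmatchcase(n, "*2.flex") for n in names))


def count_fields(plate_dir):
    """Return {acquisition: (n_field1, n_field2)} for one Plate folder."""
    counts = {}
    for acq in sorted(os.listdir(plate_dir)):
        acq_dir = os.path.join(plate_dir, acq)
        if os.path.isdir(acq_dir):
            counts[acq] = count_names(os.listdir(acq_dir))
    return counts


def irregular(counts, wells=96):
    """Keep acquisitions whose field-1 count isn't `wells`, or whose
    field-2 count is neither 0 nor `wells`."""
    return {acq: c for acq, c in counts.items()
            if c[0] != wells or c[1] not in (0, wells)}
```

Running `irregular(count_fields(plate_dir))` on each original Plate folder lists only the acquisitions that need a closer look.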
On pilot-zarr1-dev
$ cd /data/idr0001/raw
$ cat plates.csv
JL_120731_S6A
JL_120731_S6B
JL_120801_S7A
...
$ cat plates.csv | wc
192 192 2828
for i in $(cat plates.csv); do
first_flex=$(find "$i" -name '*.flex' -print -quit)
echo "$first_flex"
~/bioformats2raw-0.6.0-24/bin/bioformats2raw "$first_flex" "ngff/$i.ome.zarr"
done
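`find ... -print -quit` returns whichever .flex file the filesystem lists first, which is not guaranteed to be the Meas_01(...)/001001001.flex used in the manual runs above. In case the choice of file matters, here is a sketch that picks the first path in sorted order instead, assuming lexicographic order of the Meas_NN and file names matches acquisition/well order; the helper names are hypothetical:

```python
import os


def flex_paths(plate_dir):
    """Yield every .flex file path under plate_dir (in filesystem order)."""
    for root, _dirs, files in os.walk(plate_dir):
        for name in files:
            if name.endswith(".flex"):
                yield os.path.join(root, name)


def first_flex(paths):
    """Pick the lexicographically smallest path, or None if there are none."""
    return min(paths, default=None)
```

Usage: `first_flex(flex_paths("JL_120731_S6A"))`, then pass the result to bioformats2raw.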
Try viewing some of the 22 Plates above which have fewer than 96 .flex files...
Are these OK on IDR??
- Plate JL_120801_S7B only has 3 Acquisitions in IDR and these images are viewable. It is the 4th Acquisition that has only 95 flex files.
- JL_120814_S17A appears to be all OK.
- Plate JL_120816_S19B: the 2nd batch of 50 thumbnails doesn't load. No images viewable:
message = Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-11/24/03-19-34.711/Meas_01(2012-08-16_10-38-18)/001001001.flex
- JL_120907_S2A: Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-11/24/18-09-54.256/Meas_01(2012-09-07_03-04-56)/001001001.flex
- E.g. plate JL_130126_J10_3 https://idr.openmicroscopy.org/webclient/?show=well-522426 Try to view Preview tab:
serverExceptionClass = ome.conditions.ResourceError
message = Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-10/01/19-56-57.181/Meas_01(2013-01-26_16-42-53)/005004001.flex
}
Since none of the Plates with missing .flex files above have been converted successfully by the script running above, let's test one on its own (NB: this is the symlinked data with the *2.flex files removed):
screen -S idr0001_test:
$ ~/bioformats2raw-0.6.0-24/bin/bioformats2raw JL_120801_S7B/Meas_01\(2012-08-01_18-40-47\)/001001001.flex ngff/JL_120801_S7B.ome.zarr
...
2024-01-12 13:51:27,012 [main] WARN loci.formats.FormatHandler - parsing /data/idr0001/raw/JL_120801_S7B/Meas_06(2012-08-02_01-08-20)/008010001.flex
2024-01-12 13:51:27,669 [main] WARN loci.formats.FormatHandler - parsing /data/idr0001/raw/JL_120801_S7B/Meas_06(2012-08-02_01-08-20)/008011001.flex
2024-01-12 13:51:28,343 [main] WARN loci.formats.FormatHandler - parsing /data/idr0001/raw/JL_120801_S7B/Meas_06(2012-08-02_01-08-20)/008012001.flex
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.bioformats2raw.Converter@16150369): java.lang.ArithmeticException: / by zero
at picocli.CommandLine.executeUserObject(CommandLine.java:1962)
at picocli.CommandLine.access$1300(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
at picocli.CommandLine.call(CommandLine.java:2761)
at com.glencoesoftware.bioformats2raw.Converter.main(Converter.java:2192)
Caused by: java.lang.ArithmeticException: / by zero
at loci.formats.in.FlexReader.groupFiles(FlexReader.java:1402)
at loci.formats.in.FlexReader.initFlexFile(FlexReader.java:581)
at loci.formats.in.FlexReader.initFile(FlexReader.java:390)
at loci.formats.FormatReader.setId(FormatReader.java:1389)
at loci.formats.ImageReader.setId(ImageReader.java:849)
at com.glencoesoftware.bioformats2raw.Converter.getBaseReaderClass(Converter.java:2032)
at com.glencoesoftware.bioformats2raw.Converter.convert(Converter.java:540)
at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:516)
at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:107)
at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
... 9 more
Also try it on the original data where the number of 2.flex files for each acquisition matches the *1.flex file counts.
(bioformats2raw) [wmoore@pilot-zarr1-dev raw]$ ~/bioformats2raw-0.6.0-24/bin/bioformats2raw /uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/JL_120801_S7B/Meas_01\(2012-08-01_18-40-47\)/001001001.flex ngff/JL_120801_S7B.ome.zarr
This actually worked and has the correct number of Fields!
(base) [wmoore@pilot-zarr1-dev ngff]$ ls -alh !$
ls -alh JL_120801_S7B.ome.zarr/A/1
total 0
drwxrwxr-x. 8 wmoore wmoore 88 Jan 12 14:28 .
drwxrwxr-x. 14 wmoore wmoore 169 Jan 12 14:13 ..
drwxrwxr-x. 6 wmoore wmoore 100 Jan 12 14:12 0
drwxrwxr-x. 6 wmoore wmoore 100 Jan 12 14:15 1
drwxrwxr-x. 6 wmoore wmoore 100 Jan 12 14:18 2
drwxrwxr-x. 6 wmoore wmoore 100 Jan 12 14:21 3
drwxrwxr-x. 6 wmoore wmoore 100 Jan 12 14:24 4
drwxrwxr-x. 6 wmoore wmoore 100 Jan 12 14:28 5
Plate has the same number of images (.zattrs) as the previous S7A plate!
$ find JL_120801_S7A.ome.zarr/ -name ".zattrs" | wc
674 674 24906
$ find JL_120801_S7B.ome.zarr/ -name ".zattrs" | wc
674 674 24906
Another Plate that failed with symlinked data (without *2.flex files) - run against original data...
(bioformats2raw) [wmoore@pilot-zarr1-dev raw]$ ~/bioformats2raw-0.6.0-24/bin/bioformats2raw /uod/idr/filesets/idr0001-graml-sysgro/20151116-verified/JL_120814_S17A/Meas_01\(2012-08-14_10-49-32\)/001001001.flex ngff/JL_120814_S17A.ome.zarr
...
5 months later
Investigation of a possible Bio-Formats fix at https://github.com/ome/bioformats/pull/3537 finds that this won't be an easy/viable solution. So we need to revive the NGFF conversion work...
Summary from first read of history above:
- using bioformats2raw-0.6.0-24 to convert requires creating a plate structure that lacks the ...2.flex files to avoid duplicating fields. But this has variable success depending on plate structure.
- using omero-cli-zarr takes 11 hours per plate (192 plates will take 3 months!!)
If we need to use omero-cli-zarr, old idr is still available at ssh 45.88.81.175 etc...
Would be good to convert all plates as above with bioformats2raw, and check which ones work with mkngff. Then possibly use omero-cli-zarr to export the ones that fail (hopefully a smaller number), although that is kinda slow.
Plates converted are ~47 GB so we only have space to convert 1 plate on each of pilot-zarr1-dev and pilot-zarr2-dev.
(base) [wmoore@pilot-zarr1-dev ~]$ df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 100G 25G 76G 25% /
(base) [wmoore@pilot-zarr2-dev ~]$ df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 100G 20G 81G 20% /
Converted data should never end up in the root partition. There is a dedicated /data partition on each of these nodes.
(base) [sbesson@pilot-zarr1-dev ~]$ df -h | grep /data
/dev/vdb 4.9T 4.8T 112G 98% /data
(base) [sbesson@pilot-zarr2-dev ~]$ df -h | grep /data
/dev/vdb 750G 89G 661G 12% /data
If 5TB is sufficient, let's review and discuss the cleanup of the existing /data volume on these resources. Otherwise, let's define the ideal requirements in terms of compute and storage and create a new pilot.
Thanks, @sbesson. @will-moore, were you looking to convert e.g. all of idr0001 at once rather than, e.g., moving each plate off to S3?
Need env for omero cli zarr... Want to install conda etc... First need to install wget!
$ ssh -A 45.88.81.175 -L 1080:localhost:80
Last login: Mon Aug 19 10:45:01 2024 from 82.132.231.200
NB: webclient is broken e.g. http://localhost:1080/webclient/
File "/opt/omero/web/venv3/lib64/python3.9/site-packages/omeroweb/webclient/views.py", line 490, in _load_template
active_group = request.session.get("active_group") or conn.getEventContext().groupId
AttributeError: 'NoneType' object has no attribute 'getEventContext'
Try to use omero-cli-zarr...
[wmoore@prod121-proxy ~]$ sudo yum install wget
[wmoore@prod121-proxy ~]$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
$ bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
$ ~/miniconda3/bin/conda init bash
$ source .bashrc
$ conda create -n omeropy python=3.9 conda-forge::zeroc-ice==3.6.5 omero-py
We want to use https://github.com/ome/omero-cli-zarr/pull/147 (not merged yet) so we need to checkout that branch etc. Need to install git!
(omeropy) [wmoore@prod121-proxy idr0001]$ sudo yum install git
Merged https://github.com/ome/omero-cli-zarr/pull/147 with origin to fix scm version issue...
14:01...
omero zarr export Plate:2551
...
Completed ~ 10:30 pm - 8.5 hours for a Plate...
Rename plate, since I didn't use --name_by name during export...
and upload to s3...
$ mv 2551.ome.zarr JL_120731_S6A.ome.zarr
(base) [wmoore@prod121-proxy idr0001]$ /home/wmoore/mc cp -r JL_120731_S6A.ome.zarr uk1s3/idr/zarr/v0.4/idr0101A/
.../JL_120731_S6A.ome.zarr/H/9/5/4/1/9/0/0: 32.53 GiB / 32.53 GiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 37.98 MiB/s 14m36s
EDIT... 29th August... Ooops! typo in the upload "idr0101A" -> "idr0001A".
(base) [wmoore@prod121-proxy idr0001]$ /home/wmoore/mc cp -r JL_120731_S6A.ome.zarr uk1s3/idr/zarr/v0.4/idr0001A/
...0001/JL_120731_S6A.ome.zarr.zip: 24.53 GiB / 24.53 GiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 134.05 MiB/s 3m7s
Ahhh! - typo again!!! JL_120731_S6A.ome.zarr.zip because I have deleted local unzipped data!
Unzip... Upload AGAIN!
$ unzip JL_120731_S6A.ome.zarr.zip
$ /home/wmoore/mc cp -r JL_120731_S6A.ome.zarr uk1s3/idr/zarr/v0.4/idr0001A/
...31_S6A.ome.zarr/H/9/5/4/1/9/0/0: 57.06 GiB / 57.06 GiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 89.84 MiB/s 10m50s
$ df -h ./
Filesystem Size Used Avail Use% Mounted on
/dev/vda5 79G 64G 16G 81% /
Cleanup:
time /home/wmoore/mc rm --recursive --force uk1s3/idr/zarr/v0.4/idr0101A/JL_120731_S6A.ome.zarr
/home/wmoore/mc rm --force uk1s3/idr/zarr/v0.4/idr0001A/JL_120731_S6A.ome.zarr.zip
Install goofys and mount bia-integrator-data and idr buckets...
$ wget https://github.com/kahing/goofys/releases/latest/download/goofys
$ chmod +x goofys
# didn't need this yet!
$ sudo mkdir /bia-integrator-data && sudo ./goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other bia-integrator-data /bia-integrator-data
$ sudo mkdir /uk1s3_idr && sudo ./goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other idr /uk1s3_idr
(base) [wmoore@prod121-proxy ~]$ ls /uk1s3_idr/zarr/v0.4/idr0001A
2551.zarr JL_120731_S6A.ome.zarr
All good!
mkngff...
conda activate omeropy
pip install 'omero-mkngff @ git+https://github.com/joshmoore/omero-mkngff@main'
First plate: Fileset 16451
$ omero login
$ time omero mkngff sql 16451 --clientpath="https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0001A/JL_120731_S6A.ome.zarr" "/uk1s3_idr/zarr/v0.4/idr0001A/JL_120731_S6A.ome.zarr" > "idr0001/16451.sql"
On pilot-idrngff... Following https://github.com/IDR/mkngff_upgrade_scripts
(venv3) (base) [wmoore@pilot-idrngff-omeroreadwrite ~]$ sudo mkdir /uk1s3_idr && sudo /opt/goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other idr /uk1s3_idr
$ psql -U omero -d idr -h $DBHOST -f 16451.sql
UPDATE 576
BEGIN
mkngff_fileset
----------------
6320797
(1 row)
COMMIT
$ omero login (as demo user)
$ omero mkngff symlink /data/OMERO/ManagedRepository 16451 "/uk1s3_idr/zarr/v0.4/idr0001A/JL_120731_S6A.ome.zarr" --bfoptions --clientpath="https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0001A/JL_120731_S6A.ome.zarr"
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579
Creating dir at /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr -> /uk1s3_idr/zarr/v0.4/idr0001A/JL_120731_S6A.ome.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr.bfoptions
View image... 10-15 mins...
Traceback (most recent call last):
File "/opt/omero/web/venv3/lib64/python3.9/site-packages/omero_api_RenderingEngine_ice.py", line 1192, in load
return _M_omero.api.RenderingEngine._op_load.invoke(self, ((), _ctx))
omero.ResourceError: exception ::omero::ResourceError
{
serverStackTrace = ome.conditions.ResourceError: Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr/.zattrs
at ome.io.nio.PixelsService.createBfPixelBuffer(PixelsService.java:907)
at ome.io.nio.PixelsService._getPixelBuffer(PixelsService.java:653)
serverExceptionClass = ome.conditions.ResourceError
message = Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr/.zattrs
}
<WSGIRequest: GET '/webclient/metadata_preview/well/590809/?_=1724937458855'>
Blitz.log
2024-08-29 13:28:52,846 DEBUG [ loci.formats.Memoizer] (.Server-10) start[1724937495656] time[637189] tag[loci.formats.Memoizer.setId]
2024-08-29 13:28:52,847 ERROR [ ome.io.bioformats.BfPixelBuffer] (.Server-10) Failed to instantiate BfPixelsWrapper with /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr/.zattrs
2024-08-29 13:28:52,848 ERROR [ ome.io.nio.PixelsService] (.Server-10) Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr/.zattrs
java.lang.RuntimeException: java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')
at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:79)
at ome.io.bioformats.BfPixelBuffer.setSeries(BfPixelBuffer.java:124)
at ome.io.nio.PixelsService.createBfPixelBuffer(PixelsService.java:898)
at ome.io.nio.PixelsService._getPixelBuffer(PixelsService.java:653)
at ome.io.nio.PixelsService.getPixelBuffer(PixelsService.java:571)
at ome.services.RenderingBean$12.doWork(RenderingBean.java:2205)
at jdk.internal.reflect.GeneratedMethodAccessor317.invoke(Unknown Source)
...
Caused by: java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')
at loci.formats.in.ZarrReader.parsePlate(ZarrReader.java:764)
at loci.formats.in.ZarrReader.initFile(ZarrReader.java:361)
at loci.formats.FormatReader.setId(FormatReader.java:1480)
at loci.formats.ImageReader.setId(ImageReader.java:864)
at ome.io.nio.PixelsService$3.setId(PixelsService.java:869)
at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
at loci.formats.ChannelFiller.setId(ChannelFiller.java:258)
at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
at loci.formats.ChannelSeparator.setId(ChannelSeparator.java:317)
at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
at loci.formats.Memoizer.setId(Memoizer.java:726)
at ome.io.bioformats.BfPixelsWrapper.<init>(BfPixelsWrapper.java:52)
at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:73)
This line of ZarrReader:
Integer acqStartTime = (Integer) acquistion.get("starttime");
Edited the plate .zattrs file above to remove the starttime attrs and replaced it on s3:
$ /home/wmoore/mc cp JL_120731_S6A.ome.zarr/.zattrs uk1s3/idr/zarr/v0.4/idr0001A/JL_120731_S6A.ome.zarr/
...e/idr0001/JL_120731_S6A.ome.zarr/.zattrs: 13.99 KiB / 13.99 KiB ┃▓▓▓▓▓▓▓▓▓▓▓▓┃ 32.17 KiB/s 0s
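The manual edit of the plate .zattrs can be scripted for the other plates. A sketch, assuming the NGFF 0.4 layout where acquisitions live under `plate.acquisitions`; only `starttime` was removed above, so `endtime` is left alone unless it is added to `keys`. The function names are hypothetical:

```python
import json


def strip_acquisition_keys(plate_zattrs, keys=("starttime",)):
    """Return a copy of plate .zattrs with `keys` removed from each acquisition."""
    data = json.loads(json.dumps(plate_zattrs))  # deep copy of plain JSON data
    for acquisition in data.get("plate", {}).get("acquisitions", []):
        for key in keys:
            acquisition.pop(key, None)
    return data


def rewrite_zattrs(path, keys=("starttime",)):
    """Rewrite a local .zattrs file in place (upload it afterwards, e.g. with mc cp)."""
    with open(path) as f:
        data = json.load(f)
    with open(path, "w") as f:
        json.dump(strip_acquisition_keys(data, keys), f, indent=4)
```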
Try to delete memo file on pilot-idrngff and re-create...
Don't see any memo file at:
bash-5.1$ ls -alh !$
ls -alh /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr
total 0
2024-08-30 10:28:11,565 DEBUG [ loci.formats.Memoizer] (.Server-20) saved to temp file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr/..zattrs.bfmemo14210388543534098629
2024-08-30 10:28:11,566 DEBUG [ loci.formats.Memoizer] (.Server-20) start[1725013691240] time[326] tag[loci.formats.Memoizer.saveMemo]
2024-08-30 10:28:11,566 DEBUG [ loci.formats.Memoizer] (.Server-20) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-11/23/16-25-15.579_mkngff/JL_120731_S6A.ome.zarr/..zattrs.bfmemo (1318254 bytes)
2024-08-30 10:28:11,566 DEBUG [ loci.formats.Memoizer] (.Server-20) start[1725012959020] time[732546] tag[loci.formats.Memoizer.setId]
732546 ms is 12 minutes.
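Judging from the log lines above, the memo for a file is stored under /data/OMERO/BioFormatsCache at the file's own absolute path, with a leading dot added to the file name and a .bfmemo suffix (hence ..zattrs.bfmemo for .zattrs). A sketch that derives that path, e.g. to find and delete a stale memo; this naming is inferred from the log rather than from Bio-Formats documentation, and the temp-file variant with a numeric suffix is ignored:

```python
import posixpath


def memo_path(file_path, cache_root="/data/OMERO/BioFormatsCache"):
    """Return where the Memoizer cache would keep the memo for file_path."""
    directory, name = posixpath.split(posixpath.normpath(file_path))
    return posixpath.join(cache_root + directory, "." + name + ".bfmemo")
```

Deleting the returned file (e.g. with `os.remove`) before viewing the image again should force the memo to be re-created.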
We can now view images for all acquisitions... BUT Wells are in the wrong place:
With new thumbnails generated by saving rendering settings, we see: The first 6 images (A1 - A6) are actually the 6 Fields from A1. The next 6 images (A7 - A12) are the 6 Fields from A2.
The next 6 images (B1 - B6) correspond to the 6 Fields from A3 (as seen in vizarr):
https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0001A/JL_120731_S6A.ome.zarr/A/3
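The pattern above fits a simple hypothesis: every field (image) is being given its own well slot, filling the plate row by row, instead of the fields being grouped under their real well. The sketch below only models that hypothesis to predict where a well's fields would appear; it assumes a 12-column, 96-well layout with wells A1, A2, ... numbered 0, 1, ... in row-major order, and it does not reflect how ZarrReader actually assigns wells:

```python
import string


def sequential_well(index, n_cols=12):
    """Well name for the index-th slot when filling a plate row by row (0-based)."""
    row, col = divmod(index, n_cols)
    return f"{string.ascii_uppercase[row]}{col + 1}"


def displayed_wells(well_index, fields_per_well=6, n_cols=12):
    """Slots that the fields of the well_index-th well would occupy if every
    field got its own well slot in row-major order (hypothesis only)."""
    start = well_index * fields_per_well
    return [sequential_well(start + f, n_cols) for f in range(fields_per_well)]
```

For A3 (index 2) this predicts B1-B6, and for A2 (index 1) A7-A12, matching what the thumbnails show.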
NB: a few plates were uploaded above:
$ aws --profile embassy s3 ls idr0001/zarr/
PRE JL_120731_S6A.ome.zarr/
PRE JL_120731_S6B.ome.zarr/
PRE JL_120801_S7A.ome.zarr/
See https://github.com/IDR/idr-metadata/issues/683#issuecomment-1887129269 "Plate renders OK in webclient" - but did it have the Well/acquisition layout issues in previous comment above?? https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0001/zarr/JL_120801_S7A.ome.zarr/
TODO: try mkngff with that plate
@sbesson pointed out that pilot-idrngff doesn't have the latest ZarrReader. Try to update and check again... (delete memo etc)