OpenSplat

ROCm 6.3.3 on Ubuntu 24.04: loss won't go down over 2000 iterations, while Ubuntu 22.04 is fine

Status: Open. charyang-ai opened this issue 8 months ago • 5 comments

export HIP_VISIBLE_DEVICES=1
export HSA_OVERRIDE_GFX_VERSION=11.0.0  
cd /code/build
./opensplat /data/banana -n 2000

This is the log:

root@b266ff43105e:/code/build# ./opensplat /data/banana -o /data/banana_2k_9070XT.ply -n 2000 --val
Using CUDA
Reading 14241 points
Loading /data/banana/images/frame_00001.JPG
Loading /data/banana/images/frame_00002.JPG
Loading /data/banana/images/frame_00003.JPG
Loading /data/banana/images/frame_00004.JPG
Loading /data/banana/images/frame_00005.JPG
Loading /data/banana/images/frame_00006.JPG
Loading /data/banana/images/frame_00008.JPG
Loading /data/banana/images/frame_00009.JPG
Loading /data/banana/images/frame_00010.JPG
Loading /data/banana/images/frame_00011.JPG
Loading /data/banana/images/frame_00013.JPG
Loading /data/banana/images/frame_00014.JPG
Loading /data/banana/images/frame_00015.JPG
Loading /data/banana/images/frame_00016.JPG
Step 10: 0.325584 (0%)
Step 20: 0.32301 (1%)
Step 30: 0.335573 (1%)
Step 40: 0.35281 (2%)
Step 50: 0.330381 (2%)
Step 60: 0.330381 (3%)
Step 70: 0.329837 (3%)
Step 80: 0.329898 (4%)
Step 90: 0.335573 (4%)
Step 100: 0.333908 (5%)
Step 110: 0.35281 (5%)
Step 120: 0.35281 (6%)
Step 130: 0.335092 (6%)
Step 140: 0.340406 (7%)
Step 150: 0.35281 (7%)
Step 160: 0.333908 (8%)
Step 170: 0.325584 (8%)
Step 180: 0.349663 (9%)
Step 190: 0.332534 (9%)
Step 200: 0.330175 (10%)
Step 210: 0.329898 (10%)
Step 220: 0.325584 (11%)
Step 230: 0.35281 (11%)
Step 240: 0.329898 (12%)
Step 250: 0.335092 (12%)
Step 260: 0.349663 (13%)
Step 270: 0.349663 (13%)
Step 280: 0.335092 (14%)
Step 290: 0.333908 (14%)
Step 300: 0.335573 (15%)
Step 310: 0.349663 (15%)
Step 320: 0.35281 (16%)
Step 330: 0.349663 (16%)
Step 340: 0.330175 (17%)
Step 350: 0.330381 (17%)
Step 360: 0.335573 (18%)
Step 370: 0.335573 (18%)
Step 380: 0.35281 (19%)
Step 390: 0.32301 (19%)
Step 400: 0.325584 (20%)
Step 410: 0.335573 (20%)
Step 420: 0.325584 (21%)
Step 430: 0.35281 (21%)
Step 440: 0.325584 (22%)
Step 450: 0.335092 (22%)
Step 460: 0.32301 (23%)
Step 470: 0.333908 (23%)
Step 480: 0.332534 (24%)
Step 490: 0.35281 (24%)
Step 500: 0.329898 (25%)
Step 510: 0.330175 (25%)
Step 520: 0.330381 (26%)
Step 530: 0.335573 (26%)
Step 540: 0.329898 (27%)
Step 550: 0.35281 (27%)
Step 560: 0.35281 (28%)
Step 570: 0.333908 (28%)
Step 580: 0.335092 (29%)
Step 590: 0.335573 (29%)
Step 600: 0.329837 (30%)
Step 610: 0.325584 (30%)
Step 620: 0.330381 (31%)
Step 630: 0.335092 (31%)
Step 640: 0.333908 (32%)
Step 650: 0.35281 (32%)
Step 660: 0.332534 (33%)
Step 670: 0.340406 (33%)
Step 680: 0.335573 (34%)
Step 690: 0.330381 (34%)
Step 700: 0.329837 (35%)
Step 710: 0.335092 (35%)
Step 720: 0.335573 (36%)
Step 730: 0.333908 (36%)
Step 740: 0.329837 (37%)
Step 750: 0.335573 (37%)
Step 760: 0.335092 (38%)
Step 770: 0.35281 (38%)
Step 780: 0.333908 (39%)
Step 790: 0.32301 (39%)
Step 800: 0.332534 (40%)
Step 810: 0.349663 (40%)
Step 820: 0.329837 (41%)
Step 830: 0.340406 (41%)
Step 840: 0.335573 (42%)
Step 850: 0.332534 (42%)
Step 860: 0.330381 (43%)
Step 870: 0.349663 (43%)
Step 880: 0.333908 (44%)
Step 890: 0.335573 (44%)
Step 900: 0.332534 (45%)
Step 910: 0.349663 (45%)
Step 920: 0.349663 (46%)
Step 930: 0.349663 (46%)
Step 940: 0.35281 (47%)
Step 950: 0.335092 (47%)
Step 960: 0.333908 (48%)
Step 970: 0.349663 (48%)
Step 980: 0.32301 (49%)
Step 990: 0.335092 (49%)
Step 1000: 0.329898 (50%)
Step 1010: 0.332534 (50%)
Step 1020: 0.330381 (51%)
Step 1030: 0.330381 (51%)
Step 1040: 0.329837 (52%)
Step 1050: 0.335573 (52%)
Step 1060: 0.35281 (52%)
Step 1070: 0.325584 (53%)
Step 1080: 0.332534 (54%)
Step 1090: 0.329837 (54%)
Step 1100: 0.332534 (55%)
Step 1110: 0.329837 (55%)
Step 1120: 0.329837 (56%)
Step 1130: 0.32301 (56%)
Step 1140: 0.335573 (57%)
Step 1150: 0.35281 (57%)
Step 1160: 0.333908 (58%)
Step 1170: 0.335092 (58%)
Step 1180: 0.325584 (58%)
Step 1190: 0.32301 (59%)
Step 1200: 0.329837 (60%)
Step 1210: 0.32301 (60%)
Step 1220: 0.35281 (61%)
Step 1230: 0.332534 (61%)
Step 1240: 0.35281 (62%)
Step 1250: 0.330381 (62%)
Step 1260: 0.335092 (63%)
Step 1270: 0.335092 (63%)
Step 1280: 0.32301 (64%)
Step 1290: 0.335573 (64%)
Step 1300: 0.330175 (65%)
Step 1310: 0.330175 (65%)
Step 1320: 0.332534 (66%)
Step 1330: 0.329837 (66%)
Step 1340: 0.335092 (67%)
Step 1350: 0.335573 (67%)
Step 1360: 0.330381 (68%)
Step 1370: 0.340406 (68%)
Step 1380: 0.325584 (69%)
Step 1390: 0.330381 (69%)
Step 1400: 0.335092 (70%)
Step 1410: 0.340406 (70%)
Step 1420: 0.330381 (71%)
Step 1430: 0.329837 (71%)
Step 1440: 0.329837 (72%)
Step 1450: 0.329837 (72%)
Step 1460: 0.335092 (73%)
Step 1470: 0.329898 (73%)
Step 1480: 0.330175 (74%)
Step 1490: 0.329898 (74%)
Step 1500: 0.32301 (75%)
Step 1510: 0.329898 (75%)
Step 1520: 0.335092 (76%)
Step 1530: 0.325584 (76%)
Step 1540: 0.330175 (77%)
Step 1550: 0.325584 (77%)
Step 1560: 0.340406 (78%)
Step 1570: 0.35281 (78%)
Step 1580: 0.325584 (79%)
Step 1590: 0.332534 (79%)
Step 1600: 0.332534 (80%)
Step 1610: 0.32301 (80%)
Step 1620: 0.332534 (81%)
Step 1630: 0.325584 (81%)
Step 1640: 0.32301 (82%)
Step 1650: 0.35281 (82%)
Step 1660: 0.340406 (83%)
Step 1670: 0.340406 (83%)
Step 1680: 0.325584 (84%)
Step 1690: 0.335573 (84%)
Step 1700: 0.340406 (85%)
Step 1710: 0.35281 (85%)
Step 1720: 0.340406 (86%)
Step 1730: 0.329898 (86%)
Step 1740: 0.335092 (87%)
Step 1750: 0.333908 (87%)
Step 1760: 0.349663 (88%)
Step 1770: 0.330175 (88%)
Step 1780: 0.332534 (89%)
Step 1790: 0.335092 (89%)
Step 1800: 0.340406 (90%)
Step 1810: 0.349663 (90%)
Step 1820: 0.333908 (91%)
Step 1830: 0.335573 (91%)
Step 1840: 0.349663 (92%)
Step 1850: 0.329898 (92%)
Step 1860: 0.335573 (93%)
Step 1870: 0.330381 (93%)
Step 1880: 0.330175 (94%)
Step 1890: 0.332534 (94%)
Step 1900: 0.330381 (95%)
Step 1910: 0.340406 (95%)
Step 1920: 0.333908 (96%)
Step 1930: 0.330175 (96%)
Step 1940: 0.329898 (97%)
Step 1950: 0.333908 (97%)
Step 1960: 0.329898 (98%)
Step 1970: 0.335092 (98%)
Step 1980: 0.349663 (99%)
Step 1990: 0.335092 (99%)
Step 2000: 0.349663 (100%)
Wrote /data/cameras.json
Wrote /data/banana_2k_9070XT.ply
/data/banana/images/frame_00015.JPG validation loss: 0.356569
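One way to quantify "loss won't go down" from a log like this is to parse the `Step N: loss` lines, compare the average of the first and last few entries, and count how many distinct values occur. In this log the values appear to repeat from a small fixed set. If that holds up, it would be consistent with the model not changing between steps, but that is an inference from the log, not a confirmed diagnosis. A minimal Python sketch, assuming the console output was saved to a text file; `parse_losses` and `summarize` are hypothetical helpers, not part of OpenSplat:

```python
import re
from statistics import mean

# Matches OpenSplat progress lines such as "Step 10: 0.325584 (0%)".
# Other lines (e.g. "... validation loss: 0.356569") are ignored.
STEP_RE = re.compile(r"^Step (\d+): ([0-9.eE+-]+)", re.MULTILINE)


def parse_losses(log_text: str) -> list[tuple[int, float]]:
    """Extract (step, loss) pairs from a training log."""
    return [(int(step), float(loss)) for step, loss in STEP_RE.findall(log_text)]


def summarize(losses: list[tuple[int, float]], window: int = 10) -> dict:
    """Count distinct loss values and compare early vs late average loss."""
    values = [loss for _, loss in losses]
    if not values:
        raise ValueError("no 'Step N: loss' lines found")
    return {
        "steps": len(values),
        "distinct_values": len(set(values)),
        "early_mean": mean(values[:window]),
        "late_mean": mean(values[-window:]),
    }
```

Usage: `summarize(parse_losses(open("train.log", encoding="utf-8").read()))`, where `train.log` is a placeholder name. A healthy run should show `late_mean` clearly below `early_mean`. A `distinct_values` count close to the number of training images would support the "nothing is updating" reading.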

charyang-ai, Apr 27 '25 03:04

@jcaesar @eokeeffe @pierotofy @pfxuan, could you give some advice?

charyang-ai, Apr 27 '25 03:04

@pfxuan @eokeeffe @pierotofy

I ran systematic experiments, and the issue is caused by Ubuntu 24.04. With the same hardware and the same versions of ROCm and PyTorch, it works on Ubuntu 22.04 but fails on Ubuntu 24.04: the loss won't go down.
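When comparing the same job across two systems, it helps to apply one objective "did it converge" check to both logs rather than eyeballing them. A minimal Python sketch that flags a run as converging if its late average loss is at least `min_drop` (relative) below its early average. The 5% default threshold, the labels, and the function names are arbitrary assumptions for illustration:

```python
import re
from statistics import mean

# Matches progress lines such as "Step 10: 0.325584 (0%)".
STEP_RE = re.compile(r"^Step (\d+): ([0-9.eE+-]+)", re.MULTILINE)


def loss_trend(log_text: str, window: int = 10) -> tuple[float, float]:
    """Return (mean of first `window` losses, mean of last `window` losses)."""
    values = [float(loss) for _, loss in STEP_RE.findall(log_text)]
    if not values:
        raise ValueError("no 'Step N: loss' lines found")
    return mean(values[:window]), mean(values[-window:])


def compare_runs(logs: dict[str, str], window: int = 10,
                 min_drop: float = 0.05) -> dict[str, bool]:
    """Map each labelled log to True if its late loss dropped by at least
    `min_drop` relative to its early loss (an assumed convergence criterion)."""
    result = {}
    for label, text in logs.items():
        early, late = loss_trend(text, window)
        result[label] = late <= early * (1.0 - min_drop)
    return result
```

Usage: `compare_runs({"ubuntu-22.04": log_a, "ubuntu-24.04": log_b})`, where `log_a` and `log_b` hold the captured console output of the two runs.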

charyang-ai, Apr 28 '25 03:04

@pfxuan @eokeeffe @pierotofy

Even when I run an Ubuntu 22.04 container on an Ubuntu 24.04 host, it doesn't work.
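A container shares the host's kernel, so an Ubuntu 22.04 container on an Ubuntu 24.04 host still runs on the 24.04 host kernel and its GPU kernel driver. Recording both the userspace distro and the running kernel in each report may help separate userspace differences from host-kernel differences. A minimal sketch using only the Python standard library (`platform.freedesktop_os_release` needs Python 3.10+). The function name is hypothetical, and the two environment variables are simply the ones used in the original commands:

```python
import os
import platform


def environment_report() -> dict[str, str]:
    """Collect the userspace distro (from os-release) and the running kernel.

    Inside a container, os-release describes the container image, while
    platform.release() reports the kernel shared with the host.
    """
    try:
        info = platform.freedesktop_os_release()
        distro = info.get("PRETTY_NAME", info.get("NAME", "unknown"))
    except (OSError, AttributeError):  # no os-release file, or Python < 3.10
        distro = "unknown"
    return {
        "distro": distro,
        "kernel": platform.release(),
        # Variables set in the reproduction steps of this issue.
        "HIP_VISIBLE_DEVICES": os.environ.get("HIP_VISIBLE_DEVICES", "<unset>"),
        "HSA_OVERRIDE_GFX_VERSION": os.environ.get("HSA_OVERRIDE_GFX_VERSION", "<unset>"),
    }
```

Running this on the host and inside the container would be expected to show different `distro` values but the same `kernel`.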

charyang-ai, Apr 28 '25 03:04

Please stop tagging people and soliciting advice. Open source software does not work like this. You've reported a possible bug. When and if people will respond is up to them. Your spamming will not help expedite its resolution.

pierotofy, Apr 28 '25 06:04

Apologies for any disturbance. I just wanted to raise attention to this issue, which is a showstopper on AMD hardware. Take it easy, and thanks for the reminder.

charyang-ai, Apr 28 '25 07:04