Experiment was added but there is no trace of it
- https://master.project-hobbit.eu/experiments/1522280712117 (#251)
- https://master.project-hobbit.eu/experiments/1570873670213
- https://master.project-hobbit.eu/experiments/1570873680070
- https://master.project-hobbit.eu/experiments/1571241433909
- https://master.project-hobbit.eu/experiments/1571241501597
- https://master.project-hobbit.eu/experiments/1571241506168
- https://master.project-hobbit.eu/experiments/1571241508935
- https://master.project-hobbit.eu/experiments/1571305539287
- https://master.project-hobbit.eu/experiments/1571305551867
- https://master.project-hobbit.eu/experiments/1571305563423
- https://master.project-hobbit.eu/experiments/1571003293081
- https://master.project-hobbit.eu/experiments/1571003325036
- https://master.project-hobbit.eu/experiments/1571830806501
- https://master.project-hobbit.eu/experiments/1587370994428 (log)
- https://master.project-hobbit.eu/experiments/1593343732370
These experiments were configured, added, and visible in the queue (some of them, but not all, were then canceled). They are no longer in the queue, they are not running, and they are not displayed as canceled, errored, or done.
See also #452
This can happen when there are connectivity problems and the controller cannot store experiment results.
A possible solution would be to actually do something (what?) when the INSERT query fails:
https://github.com/hobbit-project/platform/blob/master/platform-controller/src/main/java/org/hobbit/controller/ExperimentManager.java#L429-L439
The same problem occurs when the platform cannot store the model because Virtuoso rejects the data (e.g., because of a NaN in a double value).
Dumping the data to a file might be a good idea. However, the platform should also log the problem so that we are aware of it.
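As a rough illustration of the NaN case, the following sketch checks a set of numeric results before they are written to the store. It is not taken from the platform code: the class `NanCheck`, the method `findNaN`, and the result names in the usage note are hypothetical, and whether such values should be dropped, replaced, or only reported is left open here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of the HOBBIT code base. */
final class NanCheck {

    /** Returns the names of all entries whose double value is NaN. */
    static List<String> findNaN(Map<String, Double> values) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, Double> entry : values.entrySet()) {
            Double value = entry.getValue();
            // Null values are skipped; only NaN is reported.
            if (value != null && Double.isNaN(value)) {
                bad.add(entry.getKey());
            }
        }
        return bad;
    }
}
```

For example, a map containing `"precision" -> 0.5` and `"recall" -> Double.NaN` would yield a list with the single entry `"recall"`, which could then be logged before the INSERT query is attempted.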
Dumping the experiment result models to files (and having them accessible with HTTP/FTP) sounds like a good idea in general.
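A minimal sketch of the "log it and dump it to a file" idea could look like the following. Everything in it is an assumption made for illustration: `ResultFallback`, `ResultStore`, and `storeOrDump` do not exist in the platform, the model is represented as an already serialized string rather than a real RDF model, and the `.ttl` file name is arbitrary.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch; none of these names exist in the HOBBIT code base. */
final class ResultFallback {
    private static final Logger LOGGER = Logger.getLogger(ResultFallback.class.getName());

    /** Stand-in for whatever sends the INSERT query to the triple store. */
    public interface ResultStore {
        void insert(String experimentId, String serializedModel) throws IOException;
    }

    /**
     * Tries the store first. If that fails, logs the problem and writes the
     * serialized model to dumpDir. Returns the dump file, or null if the
     * insert succeeded and no dump was needed.
     */
    static Path storeOrDump(ResultStore store, String experimentId,
            String serializedModel, Path dumpDir) throws IOException {
        try {
            store.insert(experimentId, serializedModel);
            return null;
        } catch (IOException | RuntimeException e) {
            LOGGER.log(Level.SEVERE,
                    "Could not store results of experiment " + experimentId + "; dumping to file.", e);
            Files.createDirectories(dumpDir);
            Path dump = dumpDir.resolve("experiment-" + experimentId + ".ttl");
            Files.write(dump, serializedModel.getBytes(StandardCharsets.UTF_8));
            return dump;
        }
    }
}
```

The sketch deliberately leaves open where `dumpDir` lives. A directory on the data node would only help in the Virtuoso-rejects-the-data case, not when the connection to that node is the problem.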
I agree in general. However, I see an issue there: at the moment, we assume that we have a master node that handles the management / communication and a second node for data. Storing the files on the data node and offering them via HTTP (e.g., with nginx or something similar) should be easy. However, the following line points to a problem:
> This can happen when there are connectivity problems and the controller cannot store experiment results

If the connectivity between the master and the data node is the issue, neither storage component (the triple store or the file writer) would be reachable.
Hence, the files would have to be stored on the master node, which kind of breaks our overall idea of separating the data and the management :thinking: