David Thulke
@hbruch and @karussell, did you follow up on this any further? In German there are many street names like "Kölner Straße" that users spell as "Kölnerstraße" and thus...
> I guess the most canonical way would be to write a RETURNN Dataset for this. Maybe derived from CachedDataset2. I already implemented this as a custom dataset some time...
> Related is the Sisyphus job to prepare HuggingFace datasets (https://github.com/rwth-i6/i6_core/pull/253). Doesn't this handle the caching? Ideally we should prepare our dataset wrapper here such that it works properly together...
I also observed this and wanted to fix it at some point. My initial guess was that it's related to the update frequency in: https://github.com/rwth-i6/sisyphus/blob/f594919ce17373c39369be80f40ade116d8a1209/sisyphus/graph.py#L184C9-L184C28 I.e. the problem is that...
No, this is why I suggest Option 1 and propose to remove the existing broken code that tries to do this.
Great! Will prepare a PR.
I guess really identifying the limits will be quite hard, as the current implementation is, for example, not aware of the partition a job is submitted to. What about...
> I think handling all of this would add way too much complexity.

+1. Btw, to make it more complicated, one job might be submitted to multiple partitions. What do...