Missing YOLaT++ instructions
The README only explains how to start training the older YOLaT model; it lacks instructions for the newer YOLaT++ model.
I see that there is a build_graph_hierarchical.py file inside utils/svg_utils. So analogously calling
cd utils
python svg_utils/build_graph_hierarchical.py
would be one step. However, on which dataset (Diagrams, Floorplans, VG-DCU) would one run that, and what hyperparameters would yield the best results in your experience? Which python file should be called to train with YOLaT++ instead of YOLaT?
Since I have graduated, the YOLaT++ data preprocessing and modeling code is still on my former school's server. Please give me one to two weeks: I will ask someone at the school to send it to me, then organize it and upload it to GitHub.
Hi @shuguang-52, thanks a lot for creating the YOLaT++ branch and for updating bezier_parser.py and svg_parser.py!
I looked into the branch, but the current code still seems insufficient to run a full training pipeline for YOLaT++ on VG-DCU. Are there plans to add the remaining parts (e.g., training scripts) in a future update?
Thanks again for your effort in keeping this project going!
The remaining parts will be uploaded before September.
Thank you for your efforts in updating the repository. I really appreciate it. Do the most recent changes from August 26th conclude the updates?
If so, I am still unsure of how to prepare the datasets for training YOLaT++. Are the instructions the exact same as for YOLaT?
I could only infer that train.py is called for training again, that I need to set arch to YolatV2, and that two new data_dir options appear in the new code, namely vage and plotly. I also saw another arch value in use, called graphormer.
I am almost certain that code is still missing or that there is some other kind of mix-up.
In train.py:
    if opt.arch == "YolatV2":
        model = YolatV2(opt).to(opt.device)
        criterion = LossV2(opt)
And LossV2 in architecture3cc_rpn_gp_iter4.py references an unresolvable class, ChartLossV2, in its super() call:
    class LossV2(torch.nn.Module):
        def __init__(self, opt):
            super(ChartLossV2, self).__init__()
I cannot find any mention in the paper of a loss other than CrossEntropyLoss, so the super() call should probably just name LossV2 instead.
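To illustrate the failure mode, here is a minimal sketch that runs without PyTorch. The Module class is a stand-in for torch.nn.Module, and BrokenLossV2, FixedLossV2, and try_build are hypothetical names used only for this illustration; the real LossV2 body is not shown in the repository excerpt above, so nothing here reflects its actual contents.

```python
class Module:
    """Stand-in for torch.nn.Module so the sketch runs without PyTorch."""

    def __init__(self):
        self.initialized = True


class BrokenLossV2(Module):
    def __init__(self, opt):
        # Mirrors the reported code: ChartLossV2 is not defined anywhere,
        # so evaluating this line raises NameError at construction time.
        super(ChartLossV2, self).__init__()  # noqa: F821
        self.opt = opt


class FixedLossV2(Module):
    def __init__(self, opt):
        # Zero-argument super() avoids repeating the class name entirely.
        super().__init__()
        self.opt = opt


def try_build(cls, opt=None):
    """Return the constructed instance, or the NameError raised while building it."""
    try:
        return cls(opt)
    except NameError as err:
        return err
```

The error only surfaces when the class is instantiated, not when the module is imported, which would explain why it is hit only on the opt.arch == "YolatV2" path.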
Also in train.py, there is an unresolvable function f, and I assume it should be a constructor call to CADGLDataset instead.
train_dataset = f(opt.data_dir, opt, partition='train',
Then there is the new SVGGraphBuilderBezier3 in svg_parser.py, which is not used anywhere as far as I can see. The same goes for cubic2BezierPath in bezier_parser.py. This is not a full list of unused functions and classes, but these two stand out to me in particular, as they seem to be very reasonable additions.
I strongly suspect that a file analogous to build_graph_bbox.py, build_graph_bbox_diagram.py, or build_graph_bbox_hierarchical.py (or a newer version of one of them) is missing. Should I open a new issue, since this drifts further away from the original topic?
Originally, I wanted to post a collection of smaller breadcrumbs that led me to conclude that something is missing, such as inconsistencies in the support of rect shapes, or the fact that I cannot find where text elements are handled as described in the paper:
"As for [...], we first obtain the bounding box of the element based on its attributes and then convert the box into a Bezier curve."
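For reference, the box-to-Bezier step in that quote could look roughly like the sketch below. This is an assumption on my side, not code from the repository or the paper: it treats the box as x, y, width, height, and represents each side as a straight cubic Bezier segment with control points at one third and two thirds of the side. The names line_to_cubic and bbox_to_beziers are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
CubicBezier = Tuple[Point, Point, Point, Point]


def line_to_cubic(p0: Point, p1: Point) -> CubicBezier:
    """Represent a straight segment as a cubic Bezier.

    Placing the control points at 1/3 and 2/3 of the segment keeps the
    curve identical to the straight line from p0 to p1.
    """
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    c1 = (p0[0] + dx / 3.0, p0[1] + dy / 3.0)
    c2 = (p0[0] + 2.0 * dx / 3.0, p0[1] + 2.0 * dy / 3.0)
    return (p0, c1, c2, p1)


def bbox_to_beziers(x: float, y: float, width: float, height: float) -> List[CubicBezier]:
    """Convert an axis-aligned box into four cubic Bezier segments (closed loop)."""
    corners = [(x, y), (x + width, y), (x + width, y + height), (x, y + height)]
    # Connect each corner to the next one, wrapping around to close the box.
    return [line_to_cubic(corners[i], corners[(i + 1) % 4]) for i in range(4)]
```

A call such as bbox_to_beziers(0.0, 0.0, 3.0, 6.0) yields four segments whose last end point coincides with the first start point. I could not find anything equivalent applied to text elements in the branch.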
But the most definite hint is that I cannot find any place where Supernodes and Superedges are created as in section "B. YOLaT++ for the Real-World Vector Graphics" of the paper "Hierarchically Recognizing Vector Graphics and A New Chart-Based Vector Graphics Dataset". The three files mentioned above only implement the graph construction as it is described for YOLaT in section "A. Detection Model-YOLaT".
This is further supported by a missing link in graph_dict4.py during proposal generation in _get_proposal when opt.arch == "YolatV2". There, the code tries to access graph_dict["attr"]["type"] and graph_dict["attr"]["value"], where graph_dict is a dictionary of the graph structure loaded from a file during __getitem__. However, none of the graph construction scripts set these attributes, so they cannot be accessed.
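A quick way to confirm this on a preprocessed sample is a check along the following lines. It is only a sketch: it assumes the loaded graph_dict is a plain Python dict, and missing_attr_keys is a hypothetical helper, not something that exists in the repository. The key names "attr", "type", and "value" are the ones accessed in _get_proposal.

```python
def missing_attr_keys(graph_dict, required=("type", "value")):
    """List the keys under graph_dict["attr"] that _get_proposal would need but cannot find."""
    attr = graph_dict.get("attr")
    if not isinstance(attr, dict):
        # No "attr" entry at all: every required key is missing.
        return list(required)
    return [key for key in required if key not in attr]
```

Running it over the dictionaries produced by the existing graph construction scripts should return a non-empty list if the observation above is correct.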