incubator-hugegraph [Question] How to group by multiple fields and deduplicate with Gremlin

Problem Type

others (please edit later)

Before submit

[X] I have confirmed that there is no identical or duplicate question in the existing Issues and FAQ

Environment

Server Version: v0.12.x
Backend: RocksDB x nodes, HDD or SSD
OS: xx CPUs, xx G RAM, Centos 7.x
Data Size: xx vertices, xx edges

Your Question

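I can group on a single field with the following query, i.e. group people by profession:

g.V().hasLabel('person').group().by('profession')

My current use case requires grouping by both profession and city (both are properties of person). How can this be done in Gremlin?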

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

Sep 29 '22 11:09 jokerCoCo
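
A rough plain-Python model (not HugeGraph or Gremlin code) of what the single-field query returns: every vertex yields exactly one key, and vertices sharing that key are collected into a list. The `people` records, their names, and the helper `group_by_profession` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical person vertices modelled as plain dicts; names are made up.
people = [
    {"name": "alice", "profession": "Java", "city": "Beijing"},
    {"name": "bob", "profession": "Java", "city": "Beijing"},
    {"name": "carol", "profession": "Java", "city": "Shanghai"},
    {"name": "dave", "profession": "DB", "city": "Shanghai"},
]


def group_by_profession(vertices):
    """Rough model of group().by('profession').

    Each vertex yields exactly one key; vertices sharing that key are collected
    into a list (names stand in for the vertex objects Gremlin would return).
    """
    groups = defaultdict(list)
    for v in vertices:
        groups[v["profession"]].append(v["name"])
    return dict(groups)


by_profession = group_by_profession(people)
# The 'Java' group mixes Beijing and Shanghai: city plays no part in the key.
```

The goal of the question is a key that carries both values, so that the `Java` group above would split into a Beijing group and a Shanghai group.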

Deduplication is done with dedup(), for example:

// pick one representative person for each age
g.V().hasLabel('person').dedup().by('age')

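Grouping by multiple properties: the book 《图数据库实战》 says that group() groups results according to the specified by() modulators, using one or two of them. The first by() modulator specifies the grouping key. The second by() modulator, if present, specifies the value; if it is absent, the incoming data are collected into a list of values associated with the grouping key. Intuitively, this does not seem to support multiple keys. I tried group().by("A", "B") and it raised an error, and none of the other Gremlin variants I tried managed to group by multiple properties either. I'll keep looking and share anything I find.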

Sep 30 '22 03:09 chmx-ustc
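
A rough plain-Python model of the two steps described above, assuming vertices arrive in a fixed order. The helpers `dedup_by` and `group` are illustrative names, not Gremlin or HugeGraph APIs, and the people are hypothetical. Note that `group` takes a single `key`: each vertex produces exactly one grouping key, which is why a composite key has to be built as one value.

```python
from collections import defaultdict

# Hypothetical person vertices; only 'name' and 'age' matter here.
people = [
    {"name": "p1", "age": 29},
    {"name": "p2", "age": 27},
    {"name": "p3", "age": 32},
    {"name": "p4", "age": 29},
]


def dedup_by(vertices, key):
    """Model of dedup().by(key): keep the first vertex seen for each key value."""
    seen = set()
    kept = []
    for v in vertices:
        if v[key] not in seen:
            seen.add(v[key])
            kept.append(v)
    return kept


def group(vertices, key, value=None):
    """Model of group().by(key), optionally followed by .by(value).

    Without a value modulator each key maps to the list of incoming vertices;
    with one, each key maps to the list of that property's values.
    """
    groups = defaultdict(list)
    for v in vertices:
        groups[v[key]].append(v if value is None else v[value])
    return dict(groups)


one_per_age = [v["name"] for v in dedup_by(people, "age")]
names_by_age = group(people, "age", "name")  # group().by('age').by('name')
vertices_by_age = group(people, "age")       # group().by('age')
```

Which vertex survives `dedup()` depends on the order in which the traversal produces them; the model simply keeps the first one it sees.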

去重是用dedup()，比如 // 从各个年龄的人中选出一个代表 g.V().hasLabel('person').dedup().by('age')

多个属性分组：《图数据库实战》中写到：group()是根据指定的by()调节器对结果进行分组，使用一个或者两个by()调节器对数据进行分组。第一个by（）调节器指定分组的键，第二个by() 调节器如果存在，将指定值；如果不存在，则将传入数据收集为与分组键相关联的值列表。直观上的感觉似乎不支持？试了Group().by("A","B")也是报错，我尝试多种gremlin ，也是没找到根据多个属性进行分组的方式（后续我在找找，有好消息同步）

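Thanks for the feedback. I tried chaining two group steps, like this:

g.V().hasLabel('person').group('profession').by('profession').group('city').by('city').cap('profession', 'city');

but this only groups by profession and by city separately, not by both properties at once. I'll also share any further progress here.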

Sep 30 '22 04:09 jokerCoCo
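
Why the chained version gives two separate groupings, as a rough plain-Python model: each named group() step fills its own side-effect map, and cap() returns those maps side by side. The people and the helper `named_groups` are hypothetical; names stand in for the vertices Gremlin would collect.

```python
from collections import defaultdict

# Hypothetical person vertices; names are made up.
people = [
    {"name": "alice", "profession": "Java", "city": "Beijing"},
    {"name": "bob", "profession": "Java", "city": "Beijing"},
    {"name": "carol", "profession": "Java", "city": "Shanghai"},
    {"name": "dave", "profession": "DB", "city": "Shanghai"},
]


def named_groups(vertices, keys):
    """Model of group('k1').by('k1')...group('k2').by('k2')...cap('k1', 'k2').

    Gremlin sends each vertex through both group() steps in turn; since the two
    side-effect maps are independent, filling them one after the other gives the
    same result. No map is keyed by a combination of properties.
    """
    caps = {}
    for key in keys:
        groups = defaultdict(list)
        for v in vertices:
            groups[v[key]].append(v["name"])
        caps[key] = dict(groups)
    return caps


result = named_groups(people, ["profession", "city"])
```

Neither map has an entry for a (profession, city) pair such as (Java, Beijing), which matches what was observed.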

To group by multiple properties, you can try this query:

g.V().hasLabel('person').group().by(values('profession', 'city'))

Oct 07 '22 08:10 javeme

按照多个属性分组可以尝试如下语句： g.V().hasLabel('person').group().by(values('profession', 'city'))

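Thanks for the feedback. I tried this query, but it still doesn't work. When I run it on the Hubble visualization page, the results are grouped only by the first property passed to values(); the other properties have no effect.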

Oct 08 '22 07:10 jokerCoCo

Could you try this query instead?

g.V().hasLabel('person').group().by(values('profession', 'city').unfold())

Oct 13 '22 14:10 javeme

g.V().hasLabel('person').group().by(values('profession', 'city').unfold())

Thanks for your reply. I tried it, but it still groups only by the first property: this query groups by profession alone, and city has no effect.

Oct 14 '22 01:10 jokerCoCo
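
The behaviour reported above is consistent with how by() modulators work in TinkerPop, as far as we know: when given a traversal, by() uses only its first result as the key. values('profession', 'city') emits profession and city as two separate results, so only one of them becomes the key, and unfold() changes nothing because each result is already a single string. What is needed is to collapse both values into one result, which is what toList() does in the lambda that works later in this thread. A lambda-free form that should behave the same, though it has not been tested on HugeGraph here, is g.V().hasLabel('person').group().by(values('profession', 'city').fold()). The plain-Python model below illustrates the difference; all helper names are illustrative, not real APIs.

```python
# Hypothetical vertex carrying the two properties from the question.
vertex = {"profession": "Java", "city": "Beijing"}


def values(v, *keys):
    """Model of values(k1, k2): a stream emitting one separate result per property.

    Argument order is assumed here; a real backend may emit properties in its own order.
    """
    return (v[k] for k in keys if k in v)


def unfold(stream):
    """Model of unfold(): flatten list results, pass scalar results (e.g. strings) through."""
    for item in stream:
        if isinstance(item, (list, tuple)):
            yield from item
        else:
            yield item


def fold(stream):
    """Model of fold(): reduce the whole stream to one list result (a tuple, so it can be a dict key)."""
    yield tuple(stream)


def by_key(stream):
    """Model of a by(traversal) modulator: only the first result becomes the key."""
    return next(stream, None)


key_values = by_key(values(vertex, "profession", "city"))          # by(values(...))
key_unfold = by_key(unfold(values(vertex, "profession", "city")))  # by(values(...).unfold())
key_fold = by_key(fold(values(vertex, "profession", "city")))      # by(values(...).fold())
```

Only the folded form produces a key that contains both properties.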

I built a small example dataset, and the following query works:

g.V().hasLabel('node').group().by{it.values('city', 'profession').toList()}.by(id)

Example data:

schema=graph.schema();
schema.propertyKey('profession').asText().ifNotExist().create();
schema.propertyKey('city').asText().ifNotExist().create();
schema.vertexLabel('node').properties('profession','city').useCustomizeNumberId().create();
schema.edgeLabel('child').sourceLabel('node').targetLabel('node').create();

g.addV('node').property(id, 1).property('profession', "Java").property('city', 'Beijing').as('1')
 .addV('node').property(id, 2).property('profession', "Java").property('city', 'Beijing').as('2')
 .addV('node').property(id, 3).property('profession', "Java").property('city', 'Shanghai').as('3')
 .addV('node').property(id, 4).property('profession', "DB").property('city', 'Shanghai').as('4')
 .addE('child').from('1').to('2')
 .addE('child').from('2').to('3')
 .addE('child').from('4').to('3')

Oct 16 '22 05:10 javeme
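
Roughly, the working query maps each vertex to the list of its city and profession values and collects vertex ids under each such list. A plain-Python model of the grouping one would expect for the four example vertices (our own illustration; `group_ids` is a hypothetical helper, and in Gremlin the keys are lists rather than Python tuples):

```python
from collections import defaultdict

# The four 'node' vertices from the example data: id -> properties.
nodes = {
    1: {"profession": "Java", "city": "Beijing"},
    2: {"profession": "Java", "city": "Beijing"},
    3: {"profession": "Java", "city": "Shanghai"},
    4: {"profession": "DB", "city": "Shanghai"},
}


def group_ids(vertices, keys):
    """Model of group().by{it.values(*keys).toList()}.by(id).

    The key is the list of all requested property values (a tuple here so it is
    hashable); the value is the list of vertex ids sharing that key.
    Argument order is assumed for the key; the backend may order values differently.
    """
    groups = defaultdict(list)
    for vid, props in vertices.items():
        groups[tuple(props[k] for k in keys)].append(vid)
    return dict(groups)


result = group_ids(nodes, ("city", "profession"))
```

Vertices 1 and 2 share both properties and land in the same group, while 3 and 4 share only the city and stay apart.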

我这边简单的构造了一个示例数据，尝试了一下如下语句是可行的：

g.V().hasLabel('node').group().by{it.values('city', 'profession').toList()}.by(id)

示例数据：

schema=graph.schema();
schema.propertyKey('profession').asText().ifNotExist().create();
schema.propertyKey('city').asText().ifNotExist().create();
schema.vertexLabel('node')properties('profession','city').useCustomizeNumberId().create();
schema.edgeLabel('child').sourceLabel('node').targetLabel('node').create();

g.addV('node').property(id, 1).property('profession', "Java").property('city', 'Beijing').as('1')
 .addV('node').property(id, 2).property('profession', "Java").property('city', 'Beijing').as('2')
 .addV('node').property(id, 3).property('profession', "Java").property('city', 'Sanghai').as('3')
 .addV('node').property(id, 4).property('profession', "DB").property('city', 'Sanghai').as('4')
 .addE('child').from('1').to('2')
 .addE('child').from('2').to('3')
 .addE('child').from('4').to('3')

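Thank you very much, this does achieve grouping by multiple properties. However, the groups still contain duplicates. How can I deduplicate each group on a specified property after grouping?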

Oct 17 '22 06:10 jokerCoCo
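
One way to think about per-group deduplication: keeping the first vertex per dedup property inside each group gives the same result as deduplicating the whole stream on (group key + dedup property) before grouping. The plain-Python sketch below checks that on hypothetical data; the `name` property, the extra vertex 5, and both helper functions are assumptions for illustration only. An untested Gremlin analogue of the second variant, reusing the lambda style that worked above, might look like g.V().hasLabel('node').dedup().by{[it.value('city'), it.value('profession'), it.value('name')]}.group().by{it.values('city', 'profession').toList()}.by(id), where 'name' again stands for whichever property you want to deduplicate on.

```python
from collections import defaultdict

# Hypothetical vertices: the example properties plus a made-up 'name' to dedup on.
vertices = [
    {"id": 1, "profession": "Java", "city": "Beijing", "name": "tom"},
    {"id": 2, "profession": "Java", "city": "Beijing", "name": "tom"},
    {"id": 3, "profession": "Java", "city": "Shanghai", "name": "amy"},
    {"id": 4, "profession": "DB", "city": "Shanghai", "name": "amy"},
    {"id": 5, "profession": "DB", "city": "Shanghai", "name": "bob"},
]


def group_then_dedup(vs, group_keys, dedup_key):
    """Group by a composite key, keeping only the first vertex per dedup_key value in each group."""
    groups = defaultdict(list)
    seen = defaultdict(set)  # per-group set of dedup_key values already kept
    for v in vs:
        g = tuple(v[k] for k in group_keys)
        if v[dedup_key] not in seen[g]:
            seen[g].add(v[dedup_key])
            groups[g].append(v["id"])
    return dict(groups)


def dedup_then_group(vs, group_keys, dedup_key):
    """Dedup the whole stream on (group key + dedup_key) first, then group by the composite key."""
    seen = set()
    groups = defaultdict(list)
    for v in vs:
        g = tuple(v[k] for k in group_keys)
        marker = g + (v[dedup_key],)
        if marker not in seen:
            seen.add(marker)
            groups[g].append(v["id"])
    return dict(groups)


a = group_then_dedup(vertices, ("profession", "city"), "name")
b = dedup_then_group(vertices, ("profession", "city"), "name")
```

Vertex 2 is dropped because it repeats `tom` inside the (Java, Beijing) group, while `amy` survives in two different groups because deduplication is scoped to each group.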

Due to the lack of activity, the current issue is marked as stale and will be closed after 20 days, any update will remove the stale label

Nov 01 '22 21:11 github-actions[bot]

Still no progress on my side. Could someone take a look at how to deduplicate after grouping in Gremlin?

Nov 09 '22 08:11 jokerCoCo

Due to the lack of activity, the current issue is marked as stale and will be closed after 20 days, any update will remove the stale label

Nov 24 '22 21:11 github-actions[bot]