neo4j-streams
Neo4j Streams CDC mechanism also publishes data for unconfigured nodes, which causes a space issue
Bug Description
I am working with Neo4j Streams CDC by following this article: https://neo4j.com/labs/kafka/4.0/producer/, but it is also sending data for everything we did not configure.
Expected Behavior (Mandatory)
Neo4j Streams should send only the configured node data to the Kafka topic.
Actual Behavior (Mandatory)
It currently behaves as follows:
1) Data for the configured node goes to the correct topic.
2) Data for all other nodes is stored in a topic named after the database.
Steps to Reproduce the Problem
First, create a Neo4j instance with APOC only and log in to it.
In the neo4j database, create some sample nodes:
CREATE (n:Test1neo4jdb {name: 'a'})
CREATE (n:Test1neo4jdb {name: 'b'})
CREATE (n:Test1neo4jdb {name: 'c'})
CREATE (n:Test1neo4jdb {name: 'd'})
CREATE (n:Test1neo4jdb {name: 'e'})
CREATE (n:Test1neo4jdb {name: 'f'})
CREATE (n:Test2neo4jdb {name: 'a'})
CREATE (n:Test2neo4jdb {name: 'b'})
CREATE (n:Test2neo4jdb {name: 'c'})
CREATE (n:Test2neo4jdb {name: 'd'})
CREATE (n:Test2neo4jdb {name: 'e'})
CREATE (n:Test2neo4jdb {name: 'f'})
Create another Neo4j instance and in it create a database named testdb:
CREATE DATABASE testdb
Insert some sample data into that instance as well:
CREATE (n:Test55testdb {name: 'a'})
CREATE (n:Test55testdb {name: 'b'})
CREATE (n:Test55testdb {name: 'c'})
CREATE (n:Test55testdb {name: 'd'})
CREATE (n:Test55testdb {name: 'e'})
CREATE (n:Test55testdb {name: 'f'})
CREATE (n:Test56testdb {name: 'qw'})
CREATE (n:Test56testdb {name: 'as'})
CREATE (n:Test56testdb {name: 'dd'})
CREATE (n:Test56testdb {name: 'ee'})
CREATE (n:Test56testdb {name: 'vv'})
CREATE (n:Test56testdb {name: 'er'})
Now go to the neo4j.conf file and add the configuration below:
streams.source.enabled=true
streams.sink.enabled=false
kafka.zookeeper.connect=localhost:2181
kafka.bootstrap.servers=localhost:9092
kafka.acks=-1
kafka.retries=10000
kafka.num.partitions=1
kafka.batch.size=16384
kafka.buffer.memory=33554432
kafka.reindex.batch.size=1000
kafka.replication=1
kafka.topic.discovery.polling.interval=5000
kafka.request.timeout.ms=172800000
kafka.delivery.timeout.ms=182800000
kafka.reconnect.backoff.ms=3000
kafka.reconnect.backoff.max.ms=172800000
streams.source.topic.nodes.test56.from.testdb=Test56testdb{*}
apoc.import.file.enabled=true
apoc.trigger.enabled=true
streams.procedures.enabled=true
dbms.security.procedures.whitelist=streams.*,apoc.*
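For reference, topic routing is driven by the streams.source.topic.nodes.<TOPIC>.from.<DB> keys. A sketch of the pattern syntax from the producer documentation (the second topic and label below are hypothetical, and the property-filter form assumes the documented {prop1, prop2} syntax):

# Publish all properties of Test56testdb nodes from testdb to the test56 topic
streams.source.topic.nodes.test56.from.testdb=Test56testdb{*}
# Hypothetical: publish only the name property of OtherLabel nodes to the other topic
streams.source.topic.nodes.other.from.testdb=OtherLabel{name}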
In short, I configured it to take Test56testdb nodes from the testdb database and store them in a topic called test56.
But while the documentation suggests that only the configured pattern should be routed, in my local setup it takes all node data from testdb and stores it in a topic named testdb, and it also creates a topic called neo4j containing all the data from the neo4j instance.
However, whenever I create data with the configured node label, that part works perfectly.
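To illustrate the behavior I observe with the configuration above (the name value here is just an example):

CREATE (n:Test56testdb {name: 'zz'})  // ends up in the test56 topic, as configured
CREATE (n:Test55testdb {name: 'zz'})  // also gets published, to a topic named testdb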
This extra data being published is the major issue; we are facing a severe disk-space problem.
The information above should help you reproduce the issue.
Versions
- OS: Windows 10
- Neo4j: 4.2.1
- Neo4j-Streams: 4.0.6
@vishnu2497 if you want to prevent the neo4j database (and any other database except testdb) from sending CDC events, you have to configure it properly:
streams.source.enabled=false
streams.source.enabled.from.testdb=true
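Combining that with the original topic pattern, a minimal sketch of the producer side of neo4j.conf (assuming, per the producer documentation, that the per-database flag overrides the global one):

# Disable CDC globally, then enable it only for testdb
streams.source.enabled=false
streams.source.enabled.from.testdb=true
# Route only Test56testdb nodes from testdb to the test56 topic
streams.source.topic.nodes.test56.from.testdb=Test56testdb{*}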
Closing due to inactivity; feel free to re-open if the issue is not resolved.