
Neo4j Streams CDC mechanism also pulls unconfigured data, which causes a space issue

Open vishnu2497 opened this issue 4 years ago • 1 comment

Bug Description

I am working with Neo4j Streams CDC by following this article https://neo4j.com/labs/kafka/4.0/producer/ , but it is also sending data for nodes that we did not configure.

Expected Behavior (Mandatory)

Neo4j Streams should send only the configured node data to the Kafka topic.

Actual Behavior (Mandatory)

It currently behaves as follows:

1) Configured node data goes to the correct topic.

2) Other nodes' data is stored in a topic named after the database.

Steps to Reproduce the Problem

First create a Neo4j instance with APOC only and log in to it.

In the neo4j database, create some sample nodes:


CREATE (n:Test1neo4jdb {name: 'a'})
CREATE (n:Test1neo4jdb {name: 'b'})
CREATE (n:Test1neo4jdb {name: 'c'})
CREATE (n:Test1neo4jdb {name: 'd'})
CREATE (n:Test1neo4jdb {name: 'e'})
CREATE (n:Test1neo4jdb {name: 'f'})

CREATE (n:Test2neo4jdb {name: 'a'})
CREATE (n:Test2neo4jdb {name: 'b'})
CREATE (n:Test2neo4jdb {name: 'c'})
CREATE (n:Test2neo4jdb {name: 'd'})
CREATE (n:Test2neo4jdb {name: 'e'})
CREATE (n:Test2neo4jdb {name: 'f'})

Then create another database on the instance:

CREATE DATABASE testdb

Insert some sample data into that database as well:


CREATE (n:Test55testdb {name: 'a'})
CREATE (n:Test55testdb {name: 'b'})
CREATE (n:Test55testdb {name: 'c'})
CREATE (n:Test55testdb {name: 'd'})
CREATE (n:Test55testdb {name: 'e'})
CREATE (n:Test55testdb {name: 'f'})

CREATE (n:Test56testdb {name: 'qw'})
CREATE (n:Test56testdb {name: 'as'})
CREATE (n:Test56testdb {name: 'dd'})
CREATE (n:Test56testdb {name: 'ee'})
CREATE (n:Test56testdb {name: 'vv'})
CREATE (n:Test56testdb {name: 'er'})

Now go to the neo4j.conf file and add the configuration below:

streams.source.enabled=true
streams.sink.enabled=false

kafka.zookeeper.connect=localhost:2181
kafka.bootstrap.servers=localhost:9092
kafka.acks=-1
kafka.retries=10000
kafka.num.partitions=1
kafka.batch.size=16384
kafka.buffer.memory=33554432
kafka.reindex.batch.size=1000
kafka.replication=1
kafka.topic.discovery.polling.interval=5000
kafka.request.timeout.ms=172800000
kafka.delivery.timeout.ms=182800000
kafka.reconnect.backoff.ms=3000
kafka.reconnect.backoff.max.ms=172800000



streams.source.topic.nodes.test56.from.testdb=Test56testdb{*}

apoc.import.file.enabled=true
apoc.trigger.enabled=true
streams.procedures.enabled=true
dbms.security.procedures.whitelist=streams.*,apoc.*
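
As a reference for the key line above: in the neo4j-streams producer configuration, node routing keys follow the pattern streams.source.topic.nodes.<TOPIC_NAME>.from.<DATABASE_NAME>=<Label>{<properties>}, where {*} streams all properties. A sketch restricting the event payload to a subset of properties instead (the property name here is just taken from the sample data) would be:

streams.source.topic.nodes.test56.from.testdb=Test56testdb{name}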

In short, I configured it to take the Test56testdb nodes from the testdb database and store them in the topic called test56.

But although the documentation suggests only the configured data should be streamed, in my local setup it takes all node data from testdb and stores it in a topic named testdb, and it also creates a topic called neo4j containing all the data from the neo4j database.

However, whenever I create data with the configured node label, that part works perfectly.

This extra data being pulled is the major issue; we are facing a serious disk space problem.

The information above should help you reproduce the issue.

Currently used versions

  • OS: Windows 10
  • Neo4j: 4.2.1
  • Neo4j-Streams: 4.0.6

vishnu2497 avatar Jan 15 '21 16:01 vishnu2497

@vishnu2497 if you want to prevent neo4j and any database other than testdb from sending CDC events, you have to configure it accordingly:

streams.source.enabled=false
streams.source.enabled.from.testdb=true
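
Combining this per-database switch with the node filter from the original report, a minimal sketch of the full configuration would be:

streams.source.enabled=false
streams.source.enabled.from.testdb=true
streams.source.topic.nodes.test56.from.testdb=Test56testdb{*}

With the global streams.source.enabled=false, the neo4j database (and any other database) should no longer publish to its default topic, and only Test56testdb nodes from testdb reach the test56 topic.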

conker84 avatar Jan 29 '21 13:01 conker84

Closing due to inactivity, feel free to re-open if the issue is not resolved.

ali-ince avatar Sep 14 '23 10:09 ali-ince