How to speed up "File Tail"?
I have a log file named access.log, 3.8 GB in size.
I created a simple File Tail -> Trash pipeline with the default configuration.
It is not rate limited, but the Record Throughput is less than 20 records/s.
How can I optimize it?
What version of SDC? Can you export the pipeline and post it here? You should get thousands of records/sec!
It took 7 seconds to ingest 9890 records on my laptop.

SDC version is 2.4.0.0. Attached: f056e6c0-40bd-4cdc-bb4a-8df2a53576c2.txt (the forum doesn't support .json attachments, so I renamed it to .txt; you can download it and rename it back). Yes, yesterday I used a script to write lines to a file at 10,000 lines/sec, and File Tail kept up with that rate. So I wonder whether the cause is the file being too big: does every batch rewrite the offset, so that the next batch has to seek from the top of the file back to that offset? I need your help, @metadaddy
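For reference, the load script was along these lines (a minimal Python sketch reconstructed for illustration; the exact script wasn't saved, so the file name and line format are placeholders):

```python
import time

# Illustrative load generator: append lines to a log file at roughly
# 10,000 lines/sec, writing in 1,000-line bursts and sleeping off the
# remainder of each burst's time budget. Runs until interrupted (Ctrl-C).
TARGET_RATE = 10_000  # lines per second
BURST = 1_000         # lines written per flush

with open("access.log", "a") as log:
    seq = 0
    while True:
        start = time.monotonic()
        for _ in range(BURST):
            log.write(f"{seq} sample log line\n")
            seq += 1
        log.flush()
        elapsed = time.monotonic() - start
        budget = BURST / TARGET_RATE  # 0.1 s per 1,000-line burst
        if elapsed < budget:
            time.sleep(budget - elapsed)
```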
Also, I'd like to know when File Tail is triggered to collect a log file. If I have a file but no new lines are appended, will File Tail never be triggered?
I looked at your pipeline - I don't see anything that would slow it down.
The file tail reader only seeks at the beginning of each batch, so it shouldn't impact performance that much. You could test this by changing the batch size. Note: you will need to edit sdc.properties to increase the batch size beyond 1000 - see https://streamsets.com/documentation/datacollector/latest/help/#Troubleshooting/Troubleshooting_title.html#concept_ay2_w1l_2s
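If memory serves, the relevant setting is production.maxBatchSize in sdc.properties (check the troubleshooting doc linked above to confirm). A sketch of the change, using 20000 as an example value:

```properties
# sdc.properties: raise the cap on pipeline batch size
# (the default is 1000). Restart Data Collector afterwards.
production.maxBatchSize=20000
```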
File Tail will read all of the existing data, then wait for new data, so it should work for you. A better choice, if the file will not be changing, might be the Directory origin.