bbrf-client

"Expecting Value: line 1 column 1 (char 0)" Error encountered when adding large number of subdomains through cat

Open contaminatedesert opened this issue 2 years ago • 5 comments

Running the command "cat file.txt | bbrf domain add -" on a large file (more than 40K lines) causes an "Expecting value: line 1 column 1 (char 0)" error, as seen in the screenshot. The file contains only subdomains, one per line, but there are tens of thousands of them.

[Screenshot of the error]

contaminatedesert, Mar 12 '22

Hi @contaminatedesert, thanks for flagging this. This issue has come up from time to time, so I figured I'd take a proper look at fixing it. I was able to reproduce it locally: the bbrf server times out with a 504, but the documents are still added. Could you verify whether this is also the case for you?

If not, could you please enable debug mode by adding "debug": true to ~/.bbrf/config.json and post the output here?
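The debug switch is a single key in the JSON config. A minimal sketch for setting it programmatically, assuming ~/.bbrf/config.json holds one flat JSON object as the instruction above implies; with_debug is a hypothetical helper, and editing the file by hand works just as well:

```python
import json


def with_debug(config_text: str) -> str:
    """Return the config JSON with "debug": true set, keeping all other keys.

    Hypothetical helper: assumes the config file holds a single JSON object.
    An empty file is treated as an empty config.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    config["debug"] = True
    return json.dumps(config, indent=4)
```

Applied to the real file, usage would look like path.write_text(with_debug(path.read_text())) with path = Path.home() / ".bbrf" / "config.json".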

honoki, Mar 29 '22

Another possible cause is that the request is too large; see this comment: https://github.com/honoki/bbrf-client/issues/78#issuecomment-1003755487

I am debating what is the most graceful way to handle either of these errors.
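One plausible mechanism behind the error message, assuming the client decodes the server reply as JSON without first checking the HTTP status: a 504 response typically carries an HTML or empty body rather than JSON, and Python's json module fails on the very first character with exactly the reported message. The sketch below reproduces that and shows one way to report a 504 and a 413 distinctly. BulkAddError and decode_bulk_reply are hypothetical names, not part of bbrf's API:

```python
import json
from typing import Any


class BulkAddError(Exception):
    """Hypothetical error raised when a bulk insert request fails."""


def reproduce_original_error() -> str:
    """Parse a typical HTML gateway-timeout page as JSON and return the error text."""
    try:
        json.loads("<html><body><h1>504 Gateway Time-out</h1></body></html>")
    except json.JSONDecodeError as exc:
        return str(exc)
    return "parsed without error"


def decode_bulk_reply(status: int, body: str) -> Any:
    """Check the HTTP status before decoding a _bulk_docs reply as JSON."""
    if status == 504:
        # The server stopped waiting, but the documents may still have been written.
        raise BulkAddError("504 Gateway Timeout: documents may still be added; verify before retrying")
    if status == 413:
        # The request body exceeded a size limit and was rejected.
        raise BulkAddError("413 Payload Too Large: split the input into smaller batches")
    if status >= 400:
        raise BulkAddError(f"HTTP {status}: {body[:80]!r}")
    return json.loads(body)
```

With a check like this, the user would see a timeout message instead of a JSON parsing error, which matters because, as reproduced above, the documents can still be added after a 504.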

honoki, Mar 29 '22

Hello @honoki,

Thank you for your response. I had not seen the original thread on this issue, so thanks for pointing it out. One interesting, though probably unrelated, detail is that I am also importing .mil domains.

As a workaround, I have been splitting the file into pieces with head and tail, and that works. One thing I noticed is that it does not always fail at the same size: sometimes chunks of more than 10,000 domains go through, other times only chunks under 10,000 work.
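The head/tail workaround can be automated. A minimal sketch, assuming bbrf is on PATH and reads domains from stdin via "bbrf domain add -" as in the original command; chunked and add_in_chunks are hypothetical helpers, and the default chunk size of 5,000 is an arbitrary placeholder:

```python
import subprocess
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(lines)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def add_in_chunks(path: str, size: int = 5000) -> None:
    """Feed the file to "bbrf domain add -" one chunk at a time (assumes bbrf is on PATH)."""
    with open(path, encoding="utf-8") as fh:
        domains = (line.strip() for line in fh if line.strip())
        for batch in chunked(domains, size):
            subprocess.run(
                ["bbrf", "domain", "add", "-"],
                input="\n".join(batch) + "\n",
                text=True,
                check=True,  # raise CalledProcessError if a chunk exits non-zero
            )
```

Since the failure point varied between runs, a size comfortably below the 10,000 mark seems a safer starting point than one close to it.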

I'm not quite sure what you mean by the request size being too large. Which request? The one BBRF makes to validate the domain, or some other request?

To answer your question: even though I received the error, the domains did seem to be added to the database, or at least most of them. A few appeared to be missing, but that may have been due to domain validation.

I am going to add more domains now with debugging turned on and post everything here (I may obfuscate some data for privacy).

I will post what I find.

contaminatedesert, Mar 30 '22

@honoki

Alright, here's what I did. I added the debug setting and restarted bbrf-server.

I then counted the navy.mil domains already in my database: 4,112.

Next, I counted the entries in the file I was about to import: 50,298. Then I imported the file the usual way, "cat file.txt | bbrf domain add - > debug.txt", redirecting the output to a file, which is attached.

I also encountered the 504 error. Interestingly, the debug output I saw on screen was not the same as what ended up in debug.txt, so I am adding it below.

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): 127.0.0.1:443
DEBUG:urllib3.connectionpool:https://127.0.0.1:443 "GET /bbrf/dod HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:https://127.0.0.1:443 "POST /bbrf/_bulk_docs HTTP/1.1" 504 167
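In this excerpt the domains appear to be sent in a single POST to /bbrf/_bulk_docs, CouchDB's bulk document endpoint, whose body is a JSON object with a "docs" array. That is the request whose size matters for the 413 mentioned earlier. A rough way to estimate it, assuming one small document per domain; the field names below are a hypothetical shape, not bbrf's actual schema, so treat the result as an order-of-magnitude figure:

```python
import json
from typing import Iterable


def bulk_docs_size(domains: Iterable[str], program: str) -> int:
    """Size in bytes of a _bulk_docs body holding one document per domain.

    The document fields are a hypothetical shape, not bbrf's exact schema.
    """
    docs = [{"_id": d, "type": "domain", "program": program} for d in domains]
    return len(json.dumps({"docs": docs}).encode("utf-8"))
```

Running it over the 50,298 lines from the import gives a rough sense of how large that one request body is, which can then be compared with whatever body-size limit the server or a proxy in front of it enforces.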

The attached debug.txt contains the steps taken by BBRF as well as the error we've been encountering.
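The mismatch between the screen output and debug.txt is consistent with Python's default logging behaviour: logging.basicConfig attaches a handler that writes to stderr, while "> debug.txt" redirects only stdout. That bbrf sets up logging this way is an assumption, though the "DEBUG:urllib3.connectionpool:" prefix in the log matches basicConfig's default format. The sketch below runs a tiny stand-in child process (not bbrf itself) and captures the two streams separately:

```python
import subprocess
import sys

# Stand-in child process: logs one DEBUG line under urllib3's logger name
# using basicConfig defaults, then prints one ordinary line to stdout.
CHILD = (
    "import logging\n"
    "logging.basicConfig(level=logging.DEBUG)\n"
    "logging.getLogger('urllib3.connectionpool').debug('POST /bbrf/_bulk_docs 504')\n"
    "print('normal output')\n"
)


def run_child():
    """Run the stand-in and return (stdout, stderr) as separate strings."""
    result = subprocess.run([sys.executable, "-c", CHILD], capture_output=True, text=True)
    return result.stdout, result.stderr


out, err = run_child()
```

If this is the cause, redirecting both streams, for example "cat file.txt | bbrf domain add - > debug.txt 2>&1", should put everything in one file.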

I then recounted the navy.mil domains and got 52,712. As you can see, most of my domains were added, but I expected 4,112 + 50,298 = 54,410, so I am down roughly 1,700 (1,698) for some reason.
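Not all of a shortfall like this has to be lost writes: duplicate lines in the input, or domains that were already in the database, would presumably be stored only once. A minimal sketch for narrowing it down, assuming the stored domains have been exported to a text file, one per line; reconcile is a hypothetical helper, and lowercasing is an assumption about how names should be compared:

```python
from typing import Iterable, List, Tuple


def normalise(lines: Iterable[str]) -> List[str]:
    """Strip whitespace, lowercase (assumption), and drop blank lines."""
    return [line.strip().lower() for line in lines if line.strip()]


def reconcile(input_lines: Iterable[str], stored_lines: Iterable[str]) -> Tuple[int, int, List[str]]:
    """Return (non-blank input lines, distinct input domains, input domains missing from storage)."""
    wanted = normalise(input_lines)
    distinct = set(wanted)
    stored = set(normalise(stored_lines))
    return len(wanted), len(distinct), sorted(distinct - stored)
```

Called as reconcile(open("file.txt"), open("stored.txt")), where stored.txt is a hypothetical export, a large gap between the first two numbers points to duplicates, while the third value lists the domains that actually need re-adding.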

I did not encounter the 413 error, though I suspect that is because I am using the Docker image.

Another detail that may help: when I hit this issue, I usually (but not always) see a significant drop in system performance, which goes away when I stop the Docker containers and restart bbrf-server.

Please let me know if there's anything else I can do to assist.

contaminatedesert, Mar 30 '22

In my experience, when adding a large amount of data you need to check that the data is what it's supposed to be. Use grep to find garbage in your domain list (symbols, spaces, etc.). Then it's better to add the domains in chunks, dividing the input into several parts. My default chunk size is 1,000, but if you have more than 2 vCPUs and more than 4 GB of RAM, you can go higher.
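The grep check can also be done in Python before chunking. A minimal sketch that keeps only names matching a simplified hostname pattern (letters, digits and hyphens; labels of 1 to 63 characters that do not start or end with a hyphen; at least two labels; 253 characters overall). This is an approximation, not bbrf's own validation, and it also rejects wildcard entries such as *.example.com; split_valid is a hypothetical helper:

```python
import re
from typing import Iterable, List, Tuple

# Simplified LDH hostname pattern (an approximation, not bbrf's validation).
LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
HOSTNAME_RE = re.compile(rf"(?:{LABEL}\.)+{LABEL}")


def split_valid(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split input lines into (valid lowercase hostnames, rejected raw lines); blank lines are skipped."""
    valid: List[str] = []
    rejected: List[str] = []
    for raw in lines:
        candidate = raw.strip().lower()
        if not candidate:
            continue
        if len(candidate) <= 253 and HOSTNAME_RE.fullmatch(candidate):
            valid.append(candidate)
        else:
            rejected.append(raw)
    return valid, rejected
```

The rejected list plays the role of the grep output: worth reviewing by hand before the valid names are added in chunks.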

pdelteil, Nov 29 '23