AIL-framework
Indexer.py
Hi, whilst running the scripts I've noticed this in the Indexer:
Indexing - 1580809702 : archive/pastebin.com_pro/2020/02/11/FjQhcqFk.gz
Indexing - 1580809702 : archive/gist.github.com/2020/02/11/sylr_b065e1fbd3de0c2ff095d83b969e6db4.gz
Indexing - 1580809702 : archive/pastebin.com_pro/2020/02/11/UQcU4SKv.gz
Indexing - 1580809702 : archive/pastebin.com_pro/2020/02/11/PhHjsK9A.gz
Indexing - 1580809702 : archive/ideone.com/2020/02/11/qYcdfx.gz
bash: line 1: 20994 Killed /home/ubuntu/Apps/AIL-framework//AILENV/bin/python ./Indexer.py
The Indexer queue obviously stopped. Is there a way to restart a single screen/service and redo the queue? The module information script doesn't work, as reported previously.
Hi @Phil-ThePower-Pearce !
It seems like something on your server killed this script.
You can manually relaunch it:
- `screen -r Script_AIL`
- `Ctrl+a c` (opens a new window in that screen session)
- `. ./AILENV/bin/activate`
- `cd bin`
- `./Indexer.py`
How can I tell what killed it? As far as I'm concerned, it's an AWS EC2 t2.medium Ubuntu 18.04 instance, up to date and only running AIL + feeder and pystemon.
The Indexer queue count in the GUI just keeps increasing, and decreases only very, very slowly.
http://tinyurl.com/ttum4m6
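On the question of what killed the script: a bare `Killed` message from bash means the process received SIGKILL, and on Linux the kernel's out-of-memory (OOM) killer is one common sender. When it acts, it usually leaves a `Killed process <pid> (<name>)` line in the kernel log. The sketch below scans log lines for that pattern. The function name `find_oom_kills` is hypothetical, and the exact kernel wording varies between versions, so the regular expression is an assumption rather than a guaranteed format.

```python
import re

# Assumed wording of the kernel's OOM-killer message; it differs slightly
# between kernel versions, so adjust the pattern if nothing matches.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return (pid, process_name) pairs for every OOM kill found in log_lines."""
    kills = []
    for line in log_lines:
        match = KILLED_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

The lines can come from the output of `dmesg` or from a kernel log file such as `/var/log/kern.log` (the path depends on the distribution). If the PID from the `Killed` message shows up in the result, memory exhaustion is the likely cause.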
In the Indexer script:
Traceback (most recent call last):
File "./Indexer.py", line 134, in <module>
content=paste)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/writing.py", line 483, in update_document
with self.searcher() as s:
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/writing.py", line 297, in searcher
return Searcher(self.reader(), **kwargs)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/writing.py", line 639, in reader
self.generation, reuse=reuse)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/index.py", line 535, in _reader
readers = [segreader(segment) for segment in segments]
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/index.py", line 535, in <listcomp>
readers = [segreader(segment) for segment in segments]
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/index.py", line 524, in segreader
generation=generation)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/reading.py", line 620, in __init__
self._terms = self._codec.terms_reader(self._storage, segment)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/codec/whoosh3.py", line 122, in terms_reader
postfile = segment.open_file(storage, self.POSTS_EXT)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/codec/base.py", line 556, in open_file
return storage.open_file(fname, **kwargs)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/filedb/filestore.py", line 333, in open_file
return self.a.open_file(name, *args, **kwargs)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/filedb/compound.py", line 121, in open_file
f = BufferFile(buf, name=name)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/filedb/structfile.py", line 357, in __init__
self.file = BytesIO(buf)
MemoryError
A new one today
Traceback (most recent call last):
File "./Indexer.py", line 134, in <module>
content=paste)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/writing.py", line 483, in update_document
with self.searcher() as s:
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/writing.py", line 297, in searcher
return Searcher(self.reader(), **kwargs)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/writing.py", line 639, in reader
self.generation, reuse=reuse)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/index.py", line 535, in _reader
readers = [segreader(segment) for segment in segments]
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/index.py", line 535, in <listcomp>
readers = [segreader(segment) for segment in segments]
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/index.py", line 524, in segreader
generation=generation)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/reading.py", line 620, in __init__
self._terms = self._codec.terms_reader(self._storage, segment)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/codec/whoosh3.py", line 122, in terms_reader
postfile = segment.open_file(storage, self.POSTS_EXT)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/codec/base.py", line 556, in open_file
return storage.open_file(fname, **kwargs)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/filedb/filestore.py", line 333, in open_file
return self.a.open_file(name, *args, **kwargs)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/filedb/compound.py", line 121, in open_file
f = BufferFile(buf, name=name)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/filedb/structfile.py", line 357, in __init__
self.file = BytesIO(buf)
MemoryError
This seems to be a memory issue. Are you processing large files?
I'm literally just pulling the data from the CIRCL feed. The Indexer queue keeps increasing, and when it hits 3000+ it gets stuck. When I look at the script, I see an error like the one above.
Hey, what are the specs of your system? How much memory is available?
AWS EC2 t2.medium instance, Ubuntu 18.04, 2 vCPUs, 4 GB memory.
Only running AIL, and only importing feeds from CIRCL
It seems like the Indexer ran out of memory. The minimum configuration is 2 CPUs and 8 GB of memory.
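A quick way to check a host against that 8 GB figure is to read `MemTotal` from `/proc/meminfo`. This is only a sketch, not part of AIL: the function names and the 0.5 GiB tolerance are assumptions. The tolerance is there because the kernel reserves some memory, so an 8 GB machine typically reports slightly less than 8 GiB.

```python
def mem_total_gib(meminfo_text):
    """Extract MemTotal from /proc/meminfo content and convert it to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the value is reported in kB (KiB)
            return kib / (1024 * 1024)
    raise ValueError("MemTotal not found")


def meets_minimum(meminfo_text, minimum_gib=8.0, tolerance_gib=0.5):
    """True if total memory is within tolerance_gib of minimum_gib or above."""
    return mem_total_gib(meminfo_text) >= minimum_gib - tolerance_gib
```

On the server itself this could be called as `meets_minimum(open("/proc/meminfo").read())`.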
I'm rebuilding the EC2 instance with the above settings. Will retry.
On your advice I created an EC2 instance with 2 CPUs and 8 GB of memory.
TermTracker
`Traceback (most recent call last):
File "./TermTrackerMod.py", line 79, in <module>
dict_words_freq = Term.get_text_word_frequency(item_content)
File "/home/ubuntu/Apps/AIL-framework/bin/packages/Term.py", line 96, in get_text_word_frequency
words_dict[word] += 1
File "./TermTrackerMod.py", line 34, in timeout_handler
raise TimeoutException
__main__.TimeoutException
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./TermTrackerMod.py", line 81, in <module>
print ("{0} processing timeout".format(paste.p_rel_path))
NameError: name 'paste' is not defined
`
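The second half of the TermTracker traceback is a separate bug from the timeout itself: the timeout handler path at line 81 of `TermTrackerMod.py` formats `paste.p_rel_path`, but no `paste` name exists in that scope, so reporting the timeout raises a `NameError` and the module exits. Below is a minimal sketch of the same SIGALRM timeout pattern in which the message only uses a name that is guaranteed to be defined. It is not AIL's actual code: `process_with_timeout`, `item_id` and `work` are hypothetical names, and the pattern only works on Unix in the main thread.

```python
import signal


class TimeoutException(Exception):
    """Raised by the SIGALRM handler when processing takes too long."""


def timeout_handler(signum, frame):
    raise TimeoutException


def process_with_timeout(item_id, work, max_seconds):
    """Run work() with a time limit; return its result, or None on timeout."""
    signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(max_seconds)  # schedule SIGALRM after max_seconds
    try:
        return work()
    except TimeoutException:
        # item_id is a function argument, so it is always defined here.
        print("{0} processing timeout".format(item_id))
        return None
    finally:
        signal.alarm(0)  # cancel any pending alarm
```

With this shape, a slow item is logged and skipped instead of taking the whole module down.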
Keys:
`Traceback (most recent call last):
File "./Keys.py", line 168, in <module>
paste = Paste.Paste(message)
File "/home/ubuntu/Apps/AIL-framework/bin/packages/Paste.py", line 79, in __init__
self.p_mime = magic.from_buffer(self.get_p_content(), mime=True)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/magic.py", line 148, in from_buffer
return m.from_buffer(buffer)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/magic.py", line 82, in from_buffer
return self._handle509Bug(e)
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/magic.py", line 101, in _handle509Bug
raise e
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/magic.py", line 80, in from_buffer
return maybe_decode(magic_buffer(self.cookie, buf))
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/magic.py", line 255, in magic_buffer
return _magic_buffer(cookie, buf, len(buf))
File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/magic.py", line 188, in errorcheck_null
raise MagicException(err)
magic.MagicException: b'cannot allocate 172513878 bytes (Cannot allocate memory)'
`