Matt Maybeno

17 comments by Matt Maybeno

> This specifically worked for my SageMaker Studio Notebook

This may be the problem I have been seeing on our setup as well. Everything is built in Docker images, so originally I wasn't sure why I was getting failures....

For anyone who has issues with this and builds annoy in Docker, I used this line to override the `ANNOY_COMPILER_ARGS` environment variable, per @erikbern's suggestion. Thanks! `ENV ANNOY_COMPILER_ARGS -D_CRT_SECURE_NO_WARNINGS,-DANNOYLIB_MULTITHREADED_BUILD,-mtune=native`
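For context, here is a minimal sketch of where that line might sit in a Dockerfile. The base image and the `pip install` step are assumptions for illustration and not taken from the original setup. The flags are copied verbatim from the comment above.

```dockerfile
# Assumed base image: the full python images include gcc/g++, which annoy needs to compile.
FROM python:3.9

# Comma-separated compiler flags, copied verbatim from the comment above.
# Docker accepts both `ENV KEY value` and `ENV KEY=value`; the latter is the currently recommended form.
ENV ANNOY_COMPILER_ARGS=-D_CRT_SECURE_NO_WARNINGS,-DANNOYLIB_MULTITHREADED_BUILD,-mtune=native

# Placed after the ENV line so the variable is already set when annoy is built from source.
RUN pip install --no-cache-dir annoy
```

The ordering is the point of the sketch: the variable only has an effect if it is defined before the step that compiles annoy.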

> Hello, may I know the exact line that you used to install annoy in Docker? Not sure how to pass this argument into my Dockerfile. It...

If you inspect the HTML of the NYT and other sites where the parsing is not working as expected, they are doing something specific about the divs that separate out...

Upon further inspection, it has something to do with the `calculate_best_node` function in the extractor. For a given article, there is a heuristic method to find the top node of...

I've dug into this more as it really looks to be impacting how I'm obtaining articles. I've narrowed it down to specifically how it first gets `nodes_to_check = self.nodes_to_check(doc)` in...

@kmgreen2 How did your changes work out? The idea makes sense, I'll have to check out your fork. Thanks!

Interesting. I wonder if something in the CSV reader/writer changed in Python 3.9 that fixed the issue.

Nice work @marlenachatzigrigoriou. Looks like you ruled out some of the possible options and confirmed it's probably either the `CsvItemExporter` or `botocore`. Hopefully it's not `botocore` since that codebase maybe...