sofifa-web-scraper

I can't make this work with FC 25 (sofifa)

Open • FJFuture opened this issue 1 year ago • 12 comments

This is a great tool, but I can't make it work. Any chance of you guys uploading the latest database?

FJFuture • Oct 07 '24 15:10

What error are you getting? Also, please use the Node and npm versions mentioned in the README.

prashantghimire • Oct 08 '24 03:10

Sorry, I'm getting a few different errors trying on Ubuntu and Windows. Obviously I'm not a pro on this topic. Can anyone upload an updated database, please? You can close this topic, since unfortunately I won't have enough time to look for a solution these days.

FJFuture • Oct 08 '24 14:10

I was able to fix all the errors and now it's running perfectly. Thanks for your work; the only difference I'm seeing from FC24 is the names. Again, thanks for your work! You can close this ticket.

FJFuture • Oct 14 '24 21:10

Hello, my friend, can you upload your data or your code? I'm new to programming and I need this data.

kougeo25 • Nov 04 '24 18:11

Hi, yeah, sure. I have the latest data available and I can share it. Do you want to give me your email?

FJFuture • Nov 04 '24 19:11

Yeah, of course. [email protected]

Of course, [email protected]

kougeo25 • Nov 04 '24 19:11

Hi @FJFuture and @kougeo25, unfortunately I'm not able to run the code. Despite downloading everything, I'm running into several issues, but I don't have the knowledge to fix them. Maybe you could help me out with a copy of the dataset too? The first version (07.08.) would be best, but of course I'll take any that's available.

My e-mail is: [email protected]

Thanks a lot in advance to either of you for helping me! :)

Korbsen • Jan 20 '25 20:01

I can make a copy, but I can also try to help you make it work so you can use it whenever you want. My Discord is brivalens; you can add me there if you want and we can keep talking about this.

A huge big up to Prashant Ghimire, who created this awesome tool.

FJFuture • Jan 23 '25 14:01

Hi friend @FJFuture, I want to do some analysis and prediction using this dataset, but I don't know much about web scraping. Could you please send me the FC25 dataset?

This is my email: [email protected]

Many thanks

qianbai688 • Jan 24 '25 06:01

OK, I can do that. Which database do you need? The latest one?

FJFuture • Jan 24 '25 12:01

The latest one is OK.

By the way, the data will be used for my undergraduate graduation project, so if you'd like to tell me your name, I'll mention your help in the acknowledgements of my paper.

Sent! Sorry for the delay

FJFuture • Jan 29 '25 16:01

First, here's a Dockerfile for getting it to run with the older Node/npm versions:

FROM node:18.12.1

RUN npm install -g [email protected]

WORKDIR /app

COPY package*.json ./
RUN npm install

COPY . .

CMD ["sh", "-c", "if [ \"$RUN_FULL\" = \"true\" ]; then npm run download-urls && npm run full; else npm run test; fi"]

and then to build

docker build -t sofifa-web-scraper .

and to run the test

docker run --rm sofifa-web-scraper

and to run the full thing:

docker run --rm -e RUN_FULL=true sofifa-web-scraper

Second, if you only want the newest data, you don't need the trailing integer on each URL. Without a trailing integer, the server returns the most recent version by default.

You can edit the URL files in place before running, using sed (macOS version; with GNU sed on Linux, use sed -i -E without the empty '' argument):

sed -i '' -E 's|(https://sofifa\.com/player/[^/]+/[^/]+)/[0-9]+/|\1/|g' files/player-urls-test.csv
sed -i '' -E 's|(https://sofifa\.com/player/[^/]+/[^/]+)/[0-9]+/|\1/|g' files/player-urls-full.csv

You need to rebuild the image if you change any files, since they are copied in at build time:

docker build -t sofifa-web-scraper .

This all works until the following error is thrown:

/app/services/scraper.js:28
    throw new Error(`Error reading page=${url}, statusCode=${response.statusCode}`);
                                                             ^

ReferenceError: response is not defined
    at getPageContent (/app/services/scraper.js:28:62)
    at async getPlayerDetailsCsvRow (/app/services/parser.js:22:18)
    at async download (/app/main.js:23:19)
    at async start (/app/main.js:36:9)

Node.js v18.12.1

philshem • Mar 02 '25 12:03

Hi there, indeed there is a slight mistake in the error handling: response is declared inside the while loop above, so it is out of scope when the error is thrown after the loop. This can be fixed by declaring a lastResponse variable outside the loop and updating it whenever a response is received. That way, the error handling has access to the last response's status code.

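const getPageContent = async (url) => {
    let attempts = 5;
    let lastResponse = null; // declared outside the loop so the throw below can read it
    while (attempts > 0) {
        const response = await humanoid.get(url);
        lastResponse = response; // remember the latest response for the error message
        // rest of the code
    }
    throw new Error(`Error reading page=${url}, statusCode=${lastResponse?.statusCode}`);
};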

This should now throw the error with the actual status code. Try it and let us know what you get :)
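For reference, here is a self-contained sketch of the same pattern that can be run without the scraper. It is not the repository's code: the injected get parameter stands in for humanoid.get, and it assumes the response object has statusCode and body fields and that only a 200 counts as success.

```javascript
// Minimal retry sketch, not the repository's implementation.
// Assumption: get(url) is any async function resolving to { statusCode, body },
// standing in for humanoid.get in services/scraper.js.
async function getPageContent(url, get, maxAttempts = 5) {
  // Declared outside the loop so the error below can still read it.
  let lastResponse = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await get(url);
    lastResponse = response;
    // Assumption: only HTTP 200 counts as success.
    if (response.statusCode === 200) {
      return response.body;
    }
    // No delay between attempts here; a real scraper would probably back off.
  }
  // Optional chaining covers maxAttempts <= 0, where no request was made.
  throw new Error(`Error reading page=${url}, statusCode=${lastResponse?.statusCode}`);
}
```

Because get is passed in, the retry logic can be checked with a fake getter that returns a 503 twice and then a 200, without hitting sofifa.com at all.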

SolideSpoke • Jun 13 '25 10:06

@philshem @SolideSpoke @FJFuture @qianbai688 Please check the latest Playwright implementation. It has been more effective against the errors folks were facing.

prashantghimire • Oct 11 '25 00:10