Andrea Francis Soria Jimenez

30 comments by Andrea Francis Soria Jimenez

After doing some cache maintenance manually (removing obsolete records whose config or split no longer exists), here is the updated list; the reduction is mostly in AttributeError and ClientResponseError: ``` [ {...
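
The manual cleanup above (dropping cache entries whose config or split no longer exists) can be sketched as a filter over cache records. This is a minimal sketch, not the datasets-server implementation. The record shape (`kind`, `dataset`, `config`, `split`) and the `existing_configs` / `existing_splits` sets are hypothetical inputs chosen for illustration.

```python
from __future__ import annotations

from typing import Iterable, Optional, TypedDict


class CacheRecord(TypedDict):
    # Hypothetical record shape; real cache entries carry more fields.
    kind: str
    dataset: str
    config: Optional[str]
    split: Optional[str]


def find_obsolete(
    records: Iterable[CacheRecord],
    existing_configs: set[tuple[str, str]],
    existing_splits: set[tuple[str, str, str]],
) -> list[CacheRecord]:
    """Return config- or split-level records that point at something that no longer exists."""
    obsolete: list[CacheRecord] = []
    for record in records:
        config, split = record["config"], record["split"]
        if config is None:
            continue  # dataset-level entry: nothing to check in this sketch
        if (record["dataset"], config) not in existing_configs:
            obsolete.append(record)  # the whole config is gone
        elif split is not None and (record["dataset"], config, split) not in existing_splits:
            obsolete.append(record)  # the config exists, but this split is gone
    return obsolete
```

Checking the config before the split means a record under a deleted config is reported once, whatever its split.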

Updated UnexpectedError counts by kind: ``` db.cachedResponsesBlue.aggregate([{$match: {error_code: "UnexpectedError", "details.copied_from_artifact":{$exists:false}}},{$group: {_id: {kindkind: "$kind"}, count: {$sum: 1}}},{$sort: {count: -1}}]) [ { _id: { kindkind: 'config-parquet-and-info' }, count: 9117 }, {...
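
The shell query keeps documents whose `error_code` is `"UnexpectedError"` and that have no `details.copied_from_artifact` field. It groups them by `kind` and sorts the groups by descending count. The sketch below reproduces that logic in plain Python over in-memory dicts, so it can be checked without a database. `PIPELINE` shows the same stages in the list-of-dicts form that pymongo's `Collection.aggregate` accepts, but it is not executed here. The function name is hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Any

# Same stages as the shell query, written as Python dicts (reference only).
PIPELINE = [
    {"$match": {"error_code": "UnexpectedError",
                "details.copied_from_artifact": {"$exists": False}}},
    {"$group": {"_id": {"kind": "$kind"}, "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]


def count_unexpected_errors(docs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Plain-Python equivalent of PIPELINE for small in-memory samples."""
    counts: Counter[str] = Counter()
    for doc in docs:
        # {"error_code": "UnexpectedError"}
        if doc.get("error_code") != "UnexpectedError":
            continue
        # {"details.copied_from_artifact": {"$exists": False}}
        details = doc.get("details")
        if isinstance(details, dict) and "copied_from_artifact" in details:
            continue
        # {"$group": {"_id": {"kind": "$kind"}, "count": {"$sum": 1}}}
        counts[doc["kind"]] += 1
    # most_common() orders by count, descending, like {"$sort": {"count": -1}}
    return [{"_id": {"kind": kind}, "count": n} for kind, n in counts.most_common()]
```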

Updated list of UnexpectedErrors by kind: ``` [ { _id: { kindkind: 'config-parquet-and-info' }, count: 8500 }, { _id: { kindkind: 'split-descriptive-statistics' }, count: 2628 }, { _id: { kindkind:...
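
Comparing two snapshots kind by kind makes the change explicit. For example, `config-parquet-and-info` went from 9117 to 8500 between the two lists above, a drop of 617 (about 6.8%). The helpers below are a hypothetical sketch: `to_counts` flattens aggregation output into a dict, and its `key` parameter accounts for the `_id` field name (`kindkind` in these queries, `kind` in later ones).

```python
from __future__ import annotations

from typing import Any


def to_counts(rows: list[dict[str, Any]], key: str = "kind") -> dict[str, int]:
    """Turn aggregation output rows into a {kind: count} dict.

    `key` is the field name the $group stage used inside `_id`.
    """
    return {row["_id"][key]: row["count"] for row in rows}


def count_deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Per-kind change between two snapshots; negative means fewer errors.

    A kind missing from one snapshot is treated as having a count of 0.
    """
    kinds = before.keys() | after.keys()
    return {kind: after.get(kind, 0) - before.get(kind, 0) for kind in sorted(kinds)}


# Only the config-parquet-and-info figures are copied from the two snapshots.
before = to_counts([{"_id": {"kindkind": "config-parquet-and-info"}, "count": 9117}], key="kindkind")
after = to_counts([{"_id": {"kindkind": "config-parquet-and-info"}, "count": 8500}], key="kindkind")
print(count_deltas(before, after))  # {'config-parquet-and-info': -617}
```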

Today: ``` Atlas atlas-x5jgb3-shard-0 [primary] datasets_server_cache> db.cachedResponsesBlue.aggregate([{$match: {error_code: "UnexpectedError", "details.copied_from_artifact":{$exists:false}}},{$group: {_id: {kind: "$kind"}, count: {$sum: 1}}},{$sort: {count: -1}}]) [ { _id: { kind: 'config-parquet-and-info' }, count: 6215 }, {...

Today: ``` Atlas atlas-x5jgb3-shard-0 [primary] datasets_server_cache> db.cachedResponsesBlue.aggregate([{$match: {error_code: "UnexpectedError", "details.copied_from_artifact":{$exists:false}}},{$group: {_id: {kind: "$kind"}, count: {$sum: 1}}},{$sort: {count: -1}}]) [ { _id: { kind: 'config-parquet-and-info' }, count: 7373 }, { _id:...

Today: ``` db.cachedResponsesBlue.aggregate([{$match: {error_code: "UnexpectedError", "details.copied_from_artifact":{$exists:false}}},{$group: {_id: {kind: "$kind"}, count: {$sum: 1}}},{$sort: {count: -1}}]) [ { _id: { kind: 'config-parquet-and-info' }, count: 6668 }, { _id: { kind: 'split-descriptive-statistics' },...

After refreshing some records: ``` Atlas atlas-x5jgb3-shard-0 [primary] datasets_server_cache> db.cachedResponsesBlue.aggregate([{$match: {error_code: "UnexpectedError", "details.copied_from_artifact":{$exists:false}}},{$group: {_id: {kind: "$kind"}, count: {$sum: 1}}},{$sort: {count: -1}}]) [ { _id: { kind: 'split-duckdb-index' }, count: 1380...
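
Choosing which records to refresh can be sketched as picking a bounded, de-duplicated batch of `UnexpectedError` entries for one kind, using the same `copied_from_artifact` filter as the shell query. This is a hypothetical sketch. The `RefreshTarget` shape, the `limit` parameter, and how targets would actually be re-enqueued are assumptions, not the datasets-server refresh mechanism.

```python
from __future__ import annotations

from typing import Any, Iterable, NamedTuple, Optional


class RefreshTarget(NamedTuple):
    # Hypothetical job key; a real job queue may use a different shape.
    kind: str
    dataset: str
    config: Optional[str]
    split: Optional[str]


def select_refresh_targets(
    docs: Iterable[dict[str, Any]], kind: str, limit: int
) -> list[RefreshTarget]:
    """Pick up to `limit` distinct UnexpectedError entries of one kind to recompute."""
    seen: set[RefreshTarget] = set()
    targets: list[RefreshTarget] = []
    for doc in docs:
        if doc.get("kind") != kind or doc.get("error_code") != "UnexpectedError":
            continue
        details = doc.get("details")
        if isinstance(details, dict) and "copied_from_artifact" in details:
            continue  # skipped, mirroring the shell query's $exists: false filter
        target = RefreshTarget(kind, doc["dataset"], doc.get("config"), doc.get("split"))
        if target not in seen:
            seen.add(target)
            targets.append(target)
            if len(targets) >= limit:
                break  # keep each refresh batch small
    return targets
```

Capping the batch lets the error counts be re-checked between small rounds of refreshes.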

Today (almost half of yesterday's count): ``` Atlas atlas-x5jgb3-shard-0 [primary] datasets_server_cache> db.cachedResponsesBlue.aggregate([{$match: {error_code: "UnexpectedError", "details.copied_from_artifact":{$exists:false}}},{$group: {_id: {kind: "$kind"}, count: {$sum: 1}}},{$sort: {count: -1}}]) [ { _id: { kind: 'split-duckdb-index' }, count:...

Now that the index file has been removed, I see another issue: ![image](https://github.com/huggingface/datasets-server/assets/5564745/a15f3dd2-3b9f-4fd5-b91a-998369738e64) `ValueError: Directory name did not appear to be a partition: v1.1.0`
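
The error message reads like a Hive-style partition parser rejecting a directory. Such parsers expect every directory between the dataset root and the data files to be named `key=value`, so a segment like `v1.1.0` fails. The sketch below illustrates that check under this assumption. `parse_hive_partitions` is a hypothetical helper, not the code path that raised the error.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def parse_hive_partitions(relative_dir: str) -> dict[str, str]:
    """Parse 'key=value' directory segments, rejecting anything else.

    Hypothetical helper mimicking Hive-style partition discovery.
    """
    partitions: dict[str, str] = {}
    for segment in PurePosixPath(relative_dir).parts:
        if "=" not in segment:
            # A plain name such as "v1.1.0" has no key=value form.
            raise ValueError(
                f"Directory name did not appear to be a partition: {segment}"
            )
        key, value = segment.split("=", 1)
        partitions[key] = value
    return partitions
```

With this logic, `parse_hive_partitions("year=2023/lang=en")` returns `{"year": "2023", "lang": "en"}`, while `parse_hive_partitions("v1.1.0/data")` raises the same `ValueError` as above.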

https://github.com/huggingface/dataset-viewer/pull/2928 will add a language-specific stemmer for a dataset only if it is marked as monolingual (that is, a single language across all splits). However, there are some caveats, such as:...
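
The rule described for the PR can be sketched as follows: use a language-specific stemmer only when the dataset is monolingual and declares exactly one language, and otherwise fall back to a default. This is a hypothetical sketch, not the PR's code. It assumes the stemmer is passed to DuckDB's full-text search extension, whose `stemmer` option defaults to `'porter'`. The language-code mapping below is an illustrative subset, and the `monolingual` flag stands in for dataset card metadata such as `multilinguality: monolingual`.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical subset mapping ISO 639-1 codes to stemmer names
# (names follow DuckDB FTS's `stemmer` option).
STEMMERS = {
    "en": "english",
    "fr": "french",
    "de": "german",
    "es": "spanish",
    "it": "italian",
    "pt": "portuguese",
}
DEFAULT_STEMMER = "porter"


def choose_stemmer(languages: Iterable[str], monolingual: bool) -> str:
    """Pick a language-specific stemmer only for monolingual, single-language datasets."""
    langs = set(languages)
    if monolingual and len(langs) == 1:
        (lang,) = langs
        # Unknown languages fall back to the default rather than failing.
        return STEMMERS.get(lang, DEFAULT_STEMMER)
    return DEFAULT_STEMMER
```

Falling back to the default whenever the metadata is ambiguous means a wrongly tagged dataset gets generic stemming rather than one tuned to the wrong language.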