ipex-llm Umbrella issue of improving Orca user experience (update from time to time)

Open hkvision opened this issue 2 years ago • 0 comments

[General]

[ ] init_orca_context defaults to cores=2. Users may not be aware of this setting and may simply call it with the default parameters. In that case, a user who compares local performance between the original TF/PyTorch script and the Orca version will see a large performance drop, since Orca uses only two cores while the original script uses all cores in most cases. Shall we change the default cores to "*"?
[ ] Default settings for the OMP-related environment variables. Related issues: https://github.com/intel-analytics/BigDL/issues/4370 https://github.com/intel-analytics/BigDL/issues/4372
[ ] https://github.com/intel-analytics/BigDL/issues/4538
[x] https://github.com/intel-analytics/BigDL/issues/4984
[ ] https://github.com/intel-analytics/BigDL/issues/4540
[ ] https://github.com/intel-analytics/BigDL/issues/4913 This may be of relatively low priority; perhaps we don't need to support this case?
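
To make the first item in this list (the cores default) concrete, here is a minimal sketch of how a cores setting could be resolved to an actual core count. The helper name resolve_local_cores is hypothetical and is not part of Orca; it only illustrates the gap between the current default of 2 and the proposed "*".

```python
import os


def resolve_local_cores(cores=2):
    """Hypothetical helper (not an Orca API): map a `cores` setting to a core count.

    The default of 2 mirrors the current default discussed in this issue;
    "*" stands for the proposed "use every available core" behaviour.
    """
    if cores == "*":
        # os.cpu_count() can return None on some platforms, so fall back to 1.
        return os.cpu_count() or 1
    return int(cores)


current_default = resolve_local_cores()      # what users get without arguments
proposed_default = resolve_local_cores("*")  # what the proposal would give them
print(f"current default: {current_default}, proposed default: {proposed_default}")
```

Until the default changes, users can avoid the slowdown by passing cores explicitly, for example init_orca_context(cluster_mode="local", cores="*"), assuming "*" is accepted in local mode as the proposal above implies.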

[Customized data and train]

[ ] For non-standard or complicated training loops, users may not be able to migrate to Orca easily. Related issue: https://github.com/intel-analytics/BigDL/issues/3557
[ ] Further preprocessing steps after creating the PyTorch DataLoader are not supported. Related issue: https://github.com/intel-analytics/BigDL/issues/4410
[ ] Customized metrics are not supported for evaluation, for example: https://github.com/intel-analytics/BigDL/issues/4414
[ ] Sometimes the loss is computed in the forward pass; shall we make loss_creator optional? Related issue: https://github.com/intel-analytics/BigDL/issues/4412
[ ] In the PyTorch Training Operator, train_batch and forward_batch can be merged to avoid duplicated code.
[ ] For the issues above in this section, fixes can only be applied to the ray and pyspark backends; the bigdl backend is even less likely to support these cases.
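
For the loss_creator item in this section, one possible shape of an optional loss is sketched below in plain Python. The helper compute_loss and its config argument are hypothetical and do not reflect the actual Training Operator code; the sketch assumes that when no loss_creator is given, the model's forward pass already returns the loss.

```python
def compute_loss(model_output, labels, loss_creator=None, config=None):
    """Hypothetical helper (not an Orca API): return the loss for one batch."""
    if loss_creator is None:
        # Assumption: the model computed the loss inside its forward pass,
        # so its output is already the loss value.
        return model_output
    # Otherwise build the loss function from the creator and apply it.
    loss_fn = loss_creator(config or {})
    return loss_fn(model_output, labels)


def squared_error_creator(config):
    """Toy creator used only for this illustration."""
    return lambda prediction, target: (prediction - target) ** 2


with_creator = compute_loss(3.0, 1.0, squared_error_creator)
without_creator = compute_loss(0.25, None)
print(with_creator, without_creator)
```

The same branch could live in a merged train_batch/forward_batch path, so that both training and evaluation treat a missing loss_creator identically.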

[Customer1 code specific issues]

[ ] Support multiple outputs and multiple loss functions (MMoE for multi-task learning).
[ ] Support PyTorch Lightning models.
[ ] https://github.com/intel-analytics/BigDL/issues/4448
[x] https://github.com/intel-analytics/BigDL/issues/4468
[ ] https://github.com/intel-analytics/BigDL/issues/4476

cc @jason-dai @shane-huang

Apr 08 '22 02:04 hkvision