| Pruning Method | Description | Benefits |
|---|---|---|
| Magnitude Pruning | Removes the weights with the smallest absolute values. | Simple to implement, speeds up inference |
| L1/L2 Regularization | Adds a penalty term to the loss function based on weight magnitudes; L1 in particular drives small weights toward zero. | Improved generalization, built-in pruning effect |
| Lottery Ticket Hypothesis | Identifies sparse "winning ticket" subnetworks early in training and prunes the rest. | Finds efficient subnetworks, potentially better accuracy |
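Magnitude pruning is the easiest of these to try. The sketch below is a minimal illustration using PyTorch's built-in `torch.nn.utils.prune` utilities; the toy model and the 30% pruning ratio are arbitrary placeholder choices, not recommendations.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Placeholder model; any nn.Module with Linear or Conv layers works.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Magnitude pruning: zero out the 30% of weights with the smallest
# absolute values in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)

# Make the pruning permanent by removing the reparameterization,
# leaving plain weight tensors with zeros where weights were pruned.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")

# Verify the sparsity of the first layer.
w = model[0].weight
print(f"Sparsity: {100.0 * (w == 0).sum().item() / w.numel():.1f}%")
```

Note that zeroed weights reduce model size only after sparse storage or structured pruning; unstructured zeros alone mainly benefit hardware or runtimes that exploit sparsity.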
| Quantization Type | Description | Trade-offs |
|---|---|---|
| FP32 | 32-bit floating-point precision | High accuracy, large model size, slower inference |
| FP16 | 16-bit floating-point precision | Good accuracy, reduced model size, faster inference |
| INT8 | 8-bit integer precision | Reduced accuracy (may require retraining or calibration), smallest model size, fastest inference |
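To make the trade-offs concrete, here is a minimal sketch of post-training quantization in PyTorch. `torch.quantization.quantize_dynamic` converts the weights of the listed module types to INT8, while FP16 is a plain half-precision cast; the toy model is a placeholder.

```python
import copy
import torch
import torch.nn as nn

# Placeholder FP32 model; any module with nn.Linear layers qualifies.
model_fp32 = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Dynamic INT8 quantization: weights are stored as 8-bit integers and
# activations are quantized on the fly at inference time (CPU-oriented).
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8
)

# FP16: cast a copy of the weights to half precision. FP16 inference is
# typically run on GPU (model_fp16.cuda()); only the conversion is shown here.
model_fp16 = copy.deepcopy(model_fp32).half()

x = torch.randn(1, 784)
print(model_int8(x).shape)  # INT8 inference runs on CPU
```

Dynamic quantization needs no calibration data; static INT8 quantization, which also quantizes activations ahead of time, usually requires a calibration pass or quantization-aware retraining to recover accuracy.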
| Optimization Technique | Description | Benefits |
|---|---|---|
| Asynchronous Loading | Load data in background threads or processes, separate from the training loop. | Reduces overall training time |
| Prefetching | Load the next batch of data while the current batch is being processed. | Hides data-loading latency |
| Caching | Store frequently used data in memory. | Reduces repeated I/O operations |
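In PyTorch, the first two techniques are exposed directly through `DataLoader` arguments, and simple caching can live inside the dataset itself. A minimal sketch follows; the dataset, batch size, and worker count are illustrative placeholders.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class CachedDataset(Dataset):
    """Toy dataset that caches samples in memory after the first load."""

    def __init__(self, n: int = 10_000):
        self.n = n
        # Caveat: with num_workers > 0, each worker process holds its own cache.
        self._cache = {}

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        if idx not in self._cache:
            # Stand-in for an expensive read from disk or network.
            self._cache[idx] = (torch.randn(784), torch.randint(0, 10, ()))
        return self._cache[idx]

if __name__ == "__main__":  # guard required for worker processes on spawn platforms
    loader = DataLoader(
        CachedDataset(),
        batch_size=64,
        num_workers=4,      # asynchronous loading in background worker processes
        prefetch_factor=2,  # each worker keeps 2 batches ready ahead of the loop
        pin_memory=True,    # page-locked memory speeds up host-to-GPU copies
    )
    for x, y in loader:
        pass  # training step on (x, y) would go here
```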
| Parallelism Type | Description | Use Cases |
|---|---|---|
| Data Parallelism | Distribute data across multiple GPUs, each holding a full copy of the model. | Large datasets; relatively simple to implement |
| Model Parallelism | Split the model itself across multiple GPUs. | Models too large to fit on a single GPU |
| Pipeline Parallelism | Split model execution into stages and assign each stage to a different GPU. | Improves throughput by overlapping computation across stages |
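Data parallelism is usually the first of these to reach for. Below is a minimal sketch using PyTorch's `DistributedDataParallel`; it assumes a single machine with one or more GPUs, a launch via `torchrun`, and uses a placeholder model with random training data.

```python
# Launch with: torchrun --nproc_per_node=NUM_GPUS train.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")        # one process per GPU
    rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(rank)

    model = nn.Linear(784, 10).cuda(rank)  # placeholder model
    model = DDP(model, device_ids=[rank])  # replicate model, sync gradients

    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(10):                    # toy loop on random data
        x = torch.randn(64, 784, device=rank)
        y = torch.randint(0, 10, (64,), device=rank)
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()                    # gradients all-reduced across GPUs
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

In a real training job, each process would also use a `DistributedSampler` so that the dataset is sharded across GPUs rather than duplicated.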