ggml.h
#pragma once

//
// GGML Tensor Library
//
// This documentation is still a work in progress.
// If you would like specific topics to be covered, feel free to drop a comment:
//
//   https://github.com/ggerganov/whisper.cpp/issues/40
//
// ## Overview
//
// This library implements:
//
//  - a set of tensor operations
//  - automatic differentiation
//  - basic optimization algorithms
//
// The aim of this library is to provide a minimalistic approach for various machine learning tasks. This includes,
// but is not limited to, the following:
//
//  - linear regression
//  - support vector machines
//  - neural networks
//
// The library allows the user to define a certain function using the available tensor operations. This function
// definition is represented internally via a computation graph. Each tensor operation in the function definition
// corresponds to a node in the graph. Having the computation graph defined, the user can choose to compute the
// function's value and/or its gradient with respect to the input variables. Optionally, the function can be optimized
// using one of the available optimization algorithms.
//
// For example, here we define the function: f(x) = a*x^2 + b
//
//   {
//       struct ggml_init_params params = {
//           .mem_size   = 16*1024*1024,
//           .mem_buffer = NULL,
//       };
//
//       // memory allocation happens here
//       struct ggml_context * ctx = ggml_init(params);
//
//       struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
//
//       ggml_set_param(ctx, x); // x is an input variable
//
//       struct ggml_tensor * a  = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
//       struct ggml_tensor * b  = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
//       struct ggml_tensor * x2 = ggml_mul(ctx, x, x);
//       struct ggml_tensor * f  = ggml_add(ctx, ggml_mul(ctx, a, x2), b);
//
//       ...
//   }
//
// Notice that the function definition above does not involve any actual computation. The computation is performed only
// when the user explicitly requests it. For example, to compute the function's value at x = 2.0:
//
//   {
//       ...
//
//       struct ggml_cgraph gf = ggml_build_forward(f);
//
//       // set the input variable and parameter values
//       ggml_set_f32(x, 2.0f);
//       ggml_set_f32(a, 3.0f);
//       ggml_set_f32(b, 4.0f);
//
//       ggml_graph_compute(ctx, &gf);
//
//       printf("f = %f\n", ggml_get_f32_1d(f, 0));
//
//       ...
//   }
//
// The actual computation is performed in the ggml_graph_compute() function.
//
// The ggml_new_tensor_...() functions create new tensors. They are allocated in the memory buffer provided to the
// ggml_init() function. You have to be careful not to exceed the memory buffer size. Therefore, you have to know
// in advance how much memory you need for your computation. Alternatively, you can allocate a large enough memory
// buffer and, after defining the computation graph, call the ggml_used_mem() function to find out how much memory
// was actually needed.
//
// The ggml_set_param() function marks a tensor as an input variable. This is used by the automatic
// differentiation and optimization algorithms.
//
// The described approach allows the user to define the function graph once and then compute its forward or backward
// passes multiple times. All computations use the same memory buffer allocated in the ggml_init() function. This
// way the user can avoid the memory allocation overhead at runtime.
//
// The library supports multi-dimensional tensors - up to 4 dimensions. The FP16 and FP32 data types are first-class
// citizens, but in theory the library can be extended to support FP8 and integer data types.
//
// Each tensor operation produces a new tensor. Initially the library was envisioned to support only unary and
// binary operations, and most of the available operations fall into one of these two categories. Over time it
// became clear that the library needs to support more complex operations. How best to support them is not yet
// settled, but a few examples are demonstrated in the following operations:
//
//  - ggml_permute()
//  - ggml_conv_1d_1s()
//  - ggml_conv_1d_2s()
//
// For each tensor operator, the library implements a forward and a backward computation function. The forward
// function computes the output tensor value given the input tensor values. The backward function computes the
// adjoints of the input tensors given the adjoint of the output tensor. For a detailed explanation of what this
// means, take a calculus class, or watch the following video:
//
//   What is Automatic Differentiation?
//   https://www.youtube.com/watch?v=wG_nF1awSSY
//
//
// ## Tensor data (struct ggml_tensor)
//
// The tensors are stored in memory via the ggml_tensor struct. The structure provides information about the size of
// the tensor, the data type, and the memory buffer where the tensor data is stored. Additionally, it contains
// pointers to the "source" tensors - i.e. the tensors that were used to compute the current tensor. For example:
//
//   {
//       struct ggml_tensor * c = ggml_add(ctx, a, b);
//
//       assert(c->src[0] == a);
//       assert(c->src[1] == b);
//   }
//
// The multi-dimensional tensors are stored in row-major order. The ggml_tensor struct contains fields for the
// number of elements in each dimension ("ne") as well as the number of bytes ("nb", a.k.a. stride). This makes it
// possible to store tensors that are not contiguous in memory, which is useful for operations such as transposition
// and permutation. All tensor operations have to take the stride into account and must not assume that the tensor
// is contiguous in memory.
//
// The data of the tensor is accessed via the "data" pointer. For example:
//
//   {
//       struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 2, 3);
//
//       // a[1, 2] = 1.0f;
//       *(float *) ((char *) a->data + 2*a->nb[1] + 1*a->nb[0]) = 1.0f;
//
//       // a[0, 2] = 2.0f;
//       *(float *) ((char *) a->data + 2*a->nb[1] + 0*a->nb[0]) = 2.0f;
//
//       ...
//   }
//
// Alternatively, there are helper functions such as ggml_get_f32_1d() and ggml_set_f32_1d() that can be used.
//
// ## The matrix multiplication operator (ggml_mul_mat)
//
// TODO
//
//
// ## Multi-threading
//
// TODO
//
//
// ## Overview of ggml.c
//
// TODO
//
//
// ## SIMD optimizations
//
// TODO
//
//
// ## Debugging ggml
//
// TODO
//
//

#ifdef GGML_SHARED
#if defined(_WIN32) && !defined(__MINGW32__)
#ifdef GGML_BUILD
#define GGML_API __declspec(dllexport)
#else
#define GGML_API __declspec(dllimport)
#endif
#else
#define GGML_API __attribute__((visibility("default")))
#endif
#else
#define GGML_API
#endif

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <stdio.h>  // fprintf, used by GGML_ASSERT
#include <stdlib.h> // abort, used by GGML_ASSERT

#define GGML_FILE_MAGIC   0x67676d6c // "ggml"
#define GGML_FILE_VERSION 1

#define GGML_MAX_DIMS          4
#define GGML_MAX_NODES         4096
#define GGML_MAX_PARAMS        16
#define GGML_MAX_CONTEXTS      64
#define GGML_MAX_OPT           4
#define GGML_DEFAULT_N_THREADS 4
#define GGML_ASSERT(x)                                                           \
    do {                                                                         \
        if (!(x)) {                                                              \
            fprintf(stderr, "GGML_ASSERT: %s:%d: %s\n", __FILE__, __LINE__, #x); \
            abort();                                                             \
        }                                                                        \
    } while (0)

#ifdef __cplusplus
extern "C" {
#endif

#ifdef __ARM_NEON
    // we use the built-in 16-bit float type
    typedef __fp16 ggml_fp16_t;
#else
    typedef uint16_t ggml_fp16_t;
#endif

    // convert FP16 <-> FP32
    GGML_API float       ggml_fp16_to_fp32(ggml_fp16_t x);
    GGML_API ggml_fp16_t ggml_fp32_to_fp16(float x);

    GGML_API void ggml_fp16_to_fp32_row(const ggml_fp16_t * x, float * y, size_t n);
    GGML_API void ggml_fp32_to_fp16_row(const float * x, ggml_fp16_t * y, size_t n);

    struct ggml_object;
    struct ggml_context;

    enum ggml_type {
        GGML_TYPE_F32  = 0,
        GGML_TYPE_F16  = 1,
        GGML_TYPE_Q4_0 = 2,
        GGML_TYPE_Q4_1 = 3,
        GGML_TYPE_Q4_2 = 4,
        // GGML_TYPE_Q4_3 (5) support has been removed
        GGML_TYPE_Q5_0 = 6,
        GGML_TYPE_Q5_1 = 7,
        GGML_TYPE_Q8_0 = 8,
        GGML_TYPE_Q8_1 = 9,
        GGML_TYPE_I8,
        GGML_TYPE_I16,
        GGML_TYPE_I32,
        GGML_TYPE_COUNT,
    };

    // model file types
    enum ggml_ftype {
        GGML_FTYPE_UNKNOWN              = -1,
        GGML_FTYPE_ALL_F32              = 0,
        GGML_FTYPE_MOSTLY_F16           = 1, // except 1d tensors
        GGML_FTYPE_MOSTLY_Q4_0          = 2, // except 1d tensors
        GGML_FTYPE_MOSTLY_Q4_1          = 3, // except 1d tensors
        GGML_FTYPE_MOSTLY_Q4_1_SOME_F16 = 4, // tok_embeddings.weight and output.weight are F16
        GGML_FTYPE_MOSTLY_Q4_2          = 5, // except 1d tensors
        GGML_FTYPE_MOSTLY_Q8_0          = 7, // except 1d tensors
        GGML_FTYPE_MOSTLY_Q5_0          = 8, // except 1d tensors
        GGML_FTYPE_MOSTLY_Q5_1          = 9, // except 1d tensors
    };

    // available tensor operations:
    enum ggml_op {
        GGML_OP_NONE = 0,

        GGML_OP_DUP,
        GGML_OP_ADD,
        GGML_OP_SUB,
        GGML_OP_MUL,
        GGML_OP_DIV,
        GGML_OP_SQR,
        GGML_OP_SQRT,
        GGML_OP_SUM,
        GGML_OP_MEAN,
        GGML_OP_REPEAT,
        GGML_OP_ABS,
        GGML_OP_SGN,
        GGML_OP_NEG,
        GGML_OP_STEP,
        GGML_OP_RELU,
        GGML_OP_GELU,
        GGML_OP_SILU,
        GGML_OP_NORM, // normalize
        GGML_OP_RMS_NORM,

        GGML_OP_MUL_MAT,

        GGML_OP_SCALE,
        GGML_OP_CPY,
        GGML_OP_CONT,
        GGML_OP_RESHAPE,
        GGML_OP_VIEW,
        GGML_OP_PERMUTE,
        GGML_OP_TRANSPOSE,
        GGML_OP_GET_ROWS,
        GGML_OP_DIAG_MASK_INF,
        GGML_OP_SOFT_MAX,
        GGML_OP_ROPE,
        GGML_OP_ALIBI,
        GGML_OP_CONV_1D_1S,
        GGML_OP_CONV_1D_2S,

        GGML_OP_FLASH_ATTN,
        GGML_OP_FLASH_FF,

        GGML_OP_MAP_UNARY,
        GGML_OP_MAP_BINARY,

        GGML_OP_COUNT,
    };

    // ggml object
    struct ggml_object {
        size_t offs;
        size_t size;

        struct ggml_object * next;

        char padding[8];
    };

    static const size_t GGML_OBJECT_SIZE = sizeof(struct ggml_object);

    // n-dimensional tensor
    struct ggml_tensor {
        enum ggml_type type;

        int     n_dims;
        int64_t ne[GGML_MAX_DIMS]; // number of elements
        size_t  nb[GGML_MAX_DIMS]; // stride in bytes:
                                   // nb[0] = sizeof(type)
                                   // nb[1] = nb[0]   * ne[0] + padding
                                   // nb[i] = nb[i-1] * ne[i-1]

        // compute data
        enum ggml_op op;

        bool is_param;

        struct ggml_tensor * grad;
        struct ggml_tensor * src0;
        struct ggml_tensor * src1;
        struct ggml_tensor * opt[GGML_MAX_OPT];

        // thread scheduling
        int n_tasks;

        // performance
        int     perf_runs;
        int64_t perf_cycles;
        int64_t perf_time_us;

        void * data;

        char name[32];

        char padding[8]; // TODO: remove and add padding to name?
    };

    // computation graph
    struct ggml_cgraph {
        int n_nodes;
        int n_leafs;
        int n_threads;

        size_t work_size;
        struct ggml_tensor * work;

        struct ggml_tensor * nodes[GGML_MAX_NODES];
        struct ggml_tensor * grads[GGML_MAX_NODES];
        struct ggml_tensor * leafs[GGML_MAX_NODES];

        // performance
        int     perf_runs;
        int64_t perf_cycles;
        int64_t perf_time_us;
    };

    // scratch buffer
    struct ggml_scratch {
        size_t offs;
        size_t size;
        void * data;
    };

    struct ggml_init_params {
        // memory pool
        size_t mem_size;   // bytes
        void * mem_buffer; // if NULL, memory will be allocated internally
        bool   no_alloc;   // don't allocate memory for the tensor data
    };

    // misc

    GGML_API void    ggml_time_init(void); // call this once at the beginning of the program
    GGML_API int64_t ggml_time_ms(void);
    GGML_API int64_t ggml_time_us(void);
    GGML_API int64_t ggml_cycles(void);
    GGML_API int64_t ggml_cycles_per_ms(void);

    GGML_API void ggml_print_object (const struct ggml_object * obj);
    GGML_API void ggml_print_objects(const struct ggml_context * ctx);

    GGML_API int64_t ggml_nelements(const struct ggml_tensor * tensor);
    GGML_API size_t  ggml_nbytes   (const struct ggml_tensor * tensor);

    GGML_API int    ggml_blck_size (enum ggml_type type);
    GGML_API size_t ggml_type_size (enum ggml_type type); // size in bytes for all elements in a block
    GGML_API float  ggml_type_sizef(enum ggml_type type); // ggml_type_size()/ggml_blck_size() as float

    GGML_API const char * ggml_type_name(enum ggml_type type);

    GGML_API size_t ggml_element_size(const struct ggml_tensor * tensor);

    // TODO: temporary until model loading of ggml examples is refactored
    GGML_API enum ggml_type ggml_ftype_to_ggml_type(enum ggml_ftype ftype);

    // main

    GGML_API struct ggml_context * ggml_init(struct ggml_init_params params);
    GGML_API void                  ggml_free(struct ggml_context * ctx);

    GGML_API size_t ggml_used_mem(const struct ggml_context * ctx);

    GGML_API size_t ggml_set_scratch(struct ggml_context * ctx, struct ggml_scratch scratch);

    GGML_API struct ggml_tensor * ggml_new_tensor(struct ggml_context * ctx, enum ggml_type type, int n_dims,
                                                  const int64_t * ne);

    GGML_API struct ggml_tensor * ggml_new_tensor_1d(struct ggml_context * ctx, enum ggml_type type, int64_t ne0);

    GGML_API struct ggml_tensor * ggml_new_tensor_2d(struct ggml_context * ctx, enum ggml_type type, int64_t ne0,
                                                     int64_t ne1);

    GGML_API struct ggml_tensor * ggml_new_tensor_3d(struct ggml_context * ctx, enum ggml_type type, int64_t ne0,
                                                     int64_t ne1, int64_t ne2);

    GGML_API struct ggml_tensor * ggml_new_tensor_4d(struct ggml_context * ctx, enum ggml_type type, int64_t ne0,
                                                     int64_t ne1, int64_t ne2, int64_t ne3);

    GGML_API struct ggml_tensor * ggml_new_i32(struct ggml_context * ctx, int32_t value);
    GGML_API struct ggml_tensor * ggml_new_f32(struct ggml_context * ctx, float value);

    GGML_API struct ggml_tensor * ggml_dup_tensor (struct ggml_context * ctx, const struct ggml_tensor * src);
    GGML_API struct ggml_tensor * ggml_view_tensor(struct ggml_context * ctx, const struct ggml_tensor * src);

    GGML_API struct ggml_tensor * ggml_set_zero(struct ggml_tensor * tensor);
    GGML_API struct ggml_tensor * ggml_set_i32 (struct ggml_tensor * tensor, int32_t value);
    GGML_API struct ggml_tensor * ggml_set_f32 (struct ggml_tensor * tensor, float value);

    GGML_API int32_t ggml_get_i32_1d(const struct ggml_tensor * tensor, int i);
    GGML_API void    ggml_set_i32_1d(const struct ggml_tensor * tensor, int i, int32_t value);

    GGML_API float ggml_get_f32_1d(const struct ggml_tensor * tensor, int i);
    GGML_API void  ggml_set_f32_1d(const struct ggml_tensor * tensor, int i, float value);

    GGML_API void  * ggml_get_data    (const struct ggml_tensor * tensor);
    GGML_API float * ggml_get_data_f32(const struct ggml_tensor * tensor);

    GGML_API const char * ggml_get_name(const struct ggml_tensor * tensor);
    GGML_API void         ggml_set_name(struct ggml_tensor * tensor, const char * name);

    //
    // operations on tensors with backpropagation
    //

    GGML_API struct ggml_tensor * ggml_dup(struct ggml_context * ctx, struct ggml_tensor * a);

    GGML_API struct ggml_tensor * ggml_add(struct ggml_context * ctx, struct ggml_tensor * a, struct ggml_tensor * b);

    GGML_API struct ggml_tensor * ggml_add_inplace(struct ggml_context * ctx, struct ggml_tensor * a,
                                                   struct ggml_tensor * b);

    GGML_API struct ggml_tensor * ggml_sub(struct ggml_context * ctx, struct ggml_tensor * a, struct ggml_tensor * b);

    GGML_API struct ggml_tensor * ggml_mul(struct ggml_context * ctx, struct ggml_tensor * a, struct ggml_tensor * b);

    GGML_API struct ggml_tensor * ggml_div(struct ggml_context * ctx, struct ggml_tensor * a, struct ggml_tensor * b);

    GGML_API struct ggml_tensor * ggml_sqr(struct ggml_context * ctx, struct ggml_tensor * a);

    GGML_API struct ggml_tensor * ggml_sqrt(struct ggml_context * ctx, struct ggml_tensor * a);

    // return scalar
    // TODO: compute sum along rows
    GGML_API struct ggml_tensor * ggml_sum(struct ggml_context * ctx, struct ggml_tensor * a);

    // mean along rows
    GGML_API struct ggml_tensor * ggml_mean(struct ggml_context * ctx, struct ggml_tensor * a);

    // if a is the same shape as b, and a is not a parameter, return a
    // otherwise, return a new tensor: repeat(a) to fit in b
    GGML_API struct ggml_tensor * ggml_repeat(struct ggml_context * ctx, struct ggml_tensor * a, struct ggml_tensor * b);

    GGML_API struct ggml_tensor * ggml_abs(struct ggml_context * ctx, struct ggml_tensor * a);

    GGML_API struct ggml_tensor * ggml_sgn(struct ggml_context * ctx, struct ggml_tensor * a);

    GGML_API struct ggml_tensor * ggml_neg(struct ggml_context * ctx, struct ggml_tensor * a);

    GGML_API struct ggml_tensor * ggml_step(struct ggml_context * ctx, struct ggml_tensor * a);

    GGML_API struct ggml_tensor * ggml_relu(struct ggml_context * ctx, struct ggml_tensor * a);

    // TODO: double-check this computation is correct
    GGML_API struct ggml_tensor * ggml_gelu(struct ggml_context * ctx, struct ggml_tensor * a);

    GGML_API struct ggml_tensor * ggml_silu(struct ggml_context * ctx, struct ggml_tensor * a);

    // normalize along rows
    // TODO: eps is hardcoded to 1e-5 for now
    GGML_API struct ggml_tensor * ggml_norm(struct ggml_context * ctx, struct ggml_tensor * a);

    GGML_API struct ggml_tensor * ggml_rms_norm(struct ggml_context * ctx, struct ggml_tensor * a);

    // A: m rows, n columns
    // B: p rows, n columns (i.e. we transpose it internally)
    // result is m columns, p rows
    GGML_API struct ggml_tensor * ggml_mul_mat(struct ggml_context * ctx, struct ggml_tensor * a, struct ggml_tensor * b);

    //
    // operations on tensors without backpropagation
    //

    // in-place, returns view(a)
    GGML_API struct ggml_tensor * ggml_scale(struct ggml_context * ctx, struct ggml_tensor * a, struct ggml_tensor * b);

    // a -> b, return view(b)
    GGML_API struct ggml_tensor * ggml_cpy(struct ggml_context * ctx, struct ggml_tensor * a, struct ggml_tensor * b);

    // make contiguous
    GGML_API struct ggml_tensor * ggml_cont(struct ggml_context * ctx, struct ggml_tensor * a);

    // return view(a), b specifies the new shape
    // TODO: when we start computing gradients, make a copy instead of a view
    GGML_API struct ggml_tensor * ggml_reshape(struct ggml_context * ctx, struct ggml_tensor * a, struct ggml_tensor * b);

    // return view(a)
    // TODO: when we start computing gradients, make a copy instead of a view
    GGML_API struct ggml_tensor * ggml_reshape_2d(struct ggml_context * ctx, struct ggml_tensor * a, int64_t ne0,
                                                  int64_t ne1);

    // return view(a)
    // TODO: when we start computing gradients, make a copy instead of a view
    GGML_API struct ggml_tensor * ggml_reshape_3d(struct ggml_context * ctx, struct ggml_tensor * a, int64_t ne0,
                                                  int64_t ne1, int64_t ne2);

    // offset in bytes
    GGML_API struct ggml_tensor * ggml_view_1d(struct ggml_context * ctx, struct ggml_tensor * a, int64_t ne0,
                                               size_t offset);

    GGML_API struct ggml_tensor * ggml_view_2d(struct ggml_context * ctx, struct ggml_tensor * a, int64_t ne0, int64_t ne1,
                                               size_t nb1, // row stride in bytes
                                               size_t offset);

    GGML_API struct ggml_tensor * ggml_view_3d(struct ggml_context * ctx, struct ggml_tensor * a, int64_t ne0, int64_t ne1,
                                               int64_t ne2,
                                               size_t nb1, // row stride in bytes
                                               size_t nb2, // slice stride in bytes
                                               size_t offset);

    GGML_API struct ggml_tensor * ggml_permute(struct ggml_context * ctx, struct ggml_tensor * a, int axis0, int axis1,
                                               int axis2, int axis3);

    // alias for ggml_permute(ctx, a, 1, 0, 2, 3)
    GGML_API struct ggml_tensor * ggml_transpose(struct ggml_context * ctx, struct ggml_tensor * a);

    GGML_API struct ggml_tensor * ggml_get_rows(struct ggml_context * ctx, struct ggml_tensor * a, struct ggml_tensor * b);

    // set elements above the diagonal to -INF
    // in-place, returns view(a)
    GGML_API struct ggml_tensor * ggml_diag_mask_inf(struct ggml_context * ctx, struct ggml_tensor * a, int n_past);

    // in-place, returns view(a)
    GGML_API struct ggml_tensor * ggml_soft_max(struct ggml_context * ctx, struct ggml_tensor * a);

    // rotary position embedding
    // in-place, returns view(a)
    // if mode & 1 != 0, skip n_past elements
    // if mode & 2 != 0, GPT-NeoX style
    // TODO: avoid creating a new tensor every time
    GGML_API struct ggml_tensor * ggml_rope(struct ggml_context * ctx, struct ggml_tensor * a, int n_past, int n_dims,
                                            int mode);

    // alibi position embedding
    // in-place, returns view(a)
    GGML_API struct ggml_tensor * ggml_alibi(struct ggml_context * ctx, struct ggml_tensor * a, int n_past, int n_head);

    // padding = 1
    // TODO: we don't support extra parameters for now;
    //       that's why we are hard-coding the stride, padding, and dilation
    //       not great ..
    GGML_API struct ggml_tensor * ggml_conv_1d_1s(struct ggml_context * ctx, struct ggml_tensor * a,
                                                  struct ggml_tensor * b);

    GGML_API struct ggml_tensor * ggml_conv_1d_2s(struct ggml_context * ctx, struct ggml_tensor * a,
                                                  struct ggml_tensor * b);

    GGML_API struct ggml_tensor * ggml_flash_attn(struct ggml_context * ctx, struct ggml_tensor * q, struct ggml_tensor * k,
                                                  struct ggml_tensor * v, bool masked);

    GGML_API struct ggml_tensor * ggml_flash_ff(struct ggml_context * ctx, struct ggml_tensor * a, struct ggml_tensor * b0,
                                                struct ggml_tensor * b1, struct ggml_tensor * c0, struct ggml_tensor * c1);

    // mapping operations
    typedef void (*ggml_unary_op_f32_t) (const int, float *, const float *);
    typedef void (*ggml_binary_op_f32_t)(const int, float *, const float *, const float *);

    GGML_API struct ggml_tensor * ggml_map_unary_f32(struct ggml_context * ctx, struct ggml_tensor * a,
                                                     const ggml_unary_op_f32_t fun);

    GGML_API struct ggml_tensor * ggml_map_binary_f32(struct ggml_context * ctx, struct ggml_tensor * a,
                                                      struct ggml_tensor * b, const ggml_binary_op_f32_t fun);

    //
    // automatic differentiation
    //

    GGML_API void ggml_set_param(struct ggml_context * ctx, struct ggml_tensor * tensor);

    GGML_API void ggml_build_forward_expand(struct ggml_cgraph * cgraph, struct ggml_tensor * tensor);

    GGML_API struct ggml_cgraph ggml_build_forward (struct ggml_tensor * tensor);
    GGML_API struct ggml_cgraph ggml_build_backward(struct ggml_context * ctx, struct ggml_cgraph * gf, bool keep);

    GGML_API void ggml_graph_compute(struct ggml_context * ctx, struct ggml_cgraph * cgraph);
    GGML_API void ggml_graph_reset  (struct ggml_cgraph * cgraph);

    // print info and performance information for the graph
    GGML_API void ggml_graph_print(const struct ggml_cgraph * cgraph);

    // dump the graph into a file using the dot format
    GGML_API void ggml_graph_dump_dot(const struct ggml_cgraph * gb, const struct ggml_cgraph * gf, const char * filename);

    //
    // optimization
    //

    // optimization methods
    enum ggml_opt_type {
        GGML_OPT_ADAM,
        GGML_OPT_LBFGS,
    };

    // linesearch methods
    enum ggml_linesearch {
        GGML_LINESEARCH_DEFAULT = 1,

        GGML_LINESEARCH_BACKTRACKING_ARMIJO       = 0,
        GGML_LINESEARCH_BACKTRACKING_WOLFE        = 1,
        GGML_LINESEARCH_BACKTRACKING_STRONG_WOLFE = 2,
    };

    // optimization return values
    enum ggml_opt_result {
        GGML_OPT_OK = 0,
        GGML_OPT_DID_NOT_CONVERGE,
        GGML_OPT_NO_CONTEXT,
        GGML_OPT_INVALID_WOLFE,
        GGML_OPT_FAIL,

        GGML_LINESEARCH_FAIL = -128,
        GGML_LINESEARCH_MINIMUM_STEP,
        GGML_LINESEARCH_MAXIMUM_STEP,
        GGML_LINESEARCH_MAXIMUM_ITERATIONS,
        GGML_LINESEARCH_INVALID_PARAMETERS,
    };

    // optimization parameters
    //
    //   see ggml.c (ggml_opt_default_params) for default values
    //
    struct ggml_opt_params {
        enum ggml_opt_type type;

        int n_threads;

        // delta-based convergence test
        //
        //   if past == 0 - disabled
        //   if past > 0:
        //     stop if |f(x) - f(x_past)| < delta * max(1, |f(x)|)
        //
        int   past;
        float delta;

        // maximum number of iterations without improvement
        //
        //   if 0 - disabled
        //   if > 0:
        //     assume convergence if no cost improvement in this number of iterations
        //
        int max_no_improvement;

        bool print_forward_graph;
        bool print_backward_graph;

        // ADAM parameters
        struct {
            int n_iter;

            float alpha; // learning rate
            float beta1;
            float beta2;
            float eps;   // epsilon for numerical stability
            float eps_f; // epsilon for convergence test
            float eps_g; // epsilon for convergence test
        } adam;

        // LBFGS parameters
        struct {
            int m; // number of corrections to approximate the inv. Hessian
            int n_iter;
            int max_linesearch;

            float eps;      // convergence tolerance
            float ftol;     // line search tolerance
            float wolfe;
            float min_step;
            float max_step;

            enum ggml_linesearch linesearch;
        } lbfgs;
    };

    GGML_API struct ggml_opt_params ggml_opt_default_params(enum ggml_opt_type type);

    // optimize the function defined by the tensor f
    GGML_API enum ggml_opt_result ggml_opt(struct ggml_context * ctx, struct ggml_opt_params params,
                                           struct ggml_tensor * f);

    //
    // quantization
    //

    GGML_API size_t ggml_quantize_q4_0(const float * src, void * dst, int n, int k, int64_t * hist);
    GGML_API size_t ggml_quantize_q4_1(const float * src, void * dst, int n, int k, int64_t * hist);
    GGML_API size_t ggml_quantize_q4_2(const float * src, void * dst, int n, int k, int64_t * hist);
    GGML_API size_t ggml_quantize_q5_0(const float * src, void * dst, int n, int k, int64_t * hist);
    GGML_API size_t ggml_quantize_q5_1(const float * src, void * dst, int n, int k, int64_t * hist);
    GGML_API size_t ggml_quantize_q8_0(const float * src, void * dst, int n, int k, int64_t * hist);

    GGML_API size_t ggml_quantize_chunk(enum ggml_type type, const float * src, void * dst, int start, int n,
                                        int64_t * hist);

    //
    // system info
    //

    GGML_API int ggml_cpu_has_avx        (void);
    GGML_API int ggml_cpu_has_avx2       (void);
    GGML_API int ggml_cpu_has_avx512     (void);
    GGML_API int ggml_cpu_has_avx512_vbmi(void);
    GGML_API int ggml_cpu_has_avx512_vnni(void);
    GGML_API int ggml_cpu_has_fma        (void);
    GGML_API int ggml_cpu_has_neon       (void);
    GGML_API int ggml_cpu_has_arm_fma    (void);
    GGML_API int ggml_cpu_has_f16c       (void);
    GGML_API int ggml_cpu_has_fp16_va    (void);
    GGML_API int ggml_cpu_has_wasm_simd  (void);
    GGML_API int ggml_cpu_has_blas       (void);
    GGML_API int ggml_cpu_has_cublas     (void);
    GGML_API int ggml_cpu_has_clblast    (void);
    GGML_API int ggml_cpu_has_sse3       (void);
    GGML_API int ggml_cpu_has_vsx       (void);

    //
    // Internal types and functions exposed for tests and benchmarks
    //

#ifdef __cplusplus
    // restrict not standard in C++
#define GGML_RESTRICT
#else
#define GGML_RESTRICT restrict
#endif
    typedef void (*dequantize_row_q_t)(const void * GGML_RESTRICT x, float * GGML_RESTRICT y, int k);
    typedef void (*quantize_row_q_t)  (const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int k);
    typedef void (*vec_dot_q_t)       (const int n, float * GGML_RESTRICT s, const void * GGML_RESTRICT x,
                                       const void * GGML_RESTRICT y);

    typedef struct {
        dequantize_row_q_t dequantize_row_q;
        quantize_row_q_t   quantize_row_q;
        quantize_row_q_t   quantize_row_q_reference;
        quantize_row_q_t   quantize_row_q_dot;
        vec_dot_q_t        vec_dot_q;
        enum ggml_type     vec_dot_type;
    } quantize_fns_t;

    quantize_fns_t ggml_internal_get_quantize_fn(size_t i);

#ifdef __cplusplus
}
#endif
GGML_API struct ggml_tensor * ggml_new_tensor_4d(struct ggml_context *ctx, enum ggml_type type, int64_t ne0, int64_t ne1, int64_t ne2, int64_t ne3)
GGML_API struct ggml_tensor * ggml_set_i32(struct ggml_tensor *tensor, int32_t value)
GGML_API struct ggml_tensor * ggml_view_2d(struct ggml_context *ctx, struct ggml_tensor *a, int64_t ne0, int64_t ne1, size_t nb1, size_t offset)
GGML_API struct ggml_tensor * ggml_reshape(struct ggml_context *ctx, struct ggml_tensor *a, struct ggml_tensor *b)
GGML_API struct ggml_tensor * ggml_new_i32(struct ggml_context *ctx, int32_t value)
GGML_API int ggml_cpu_has_sse3(void)
GGML_API int32_t ggml_get_i32_1d(const struct ggml_tensor *tensor, int i)
GGML_API size_t ggml_quantize_chunk(enum ggml_type type, const float *src, void *dst, int start, int n, int64_t *hist)
GGML_API const char * ggml_get_name(const struct ggml_tensor *tensor)
ggml_ftype
Definition ggml.h:252
@ GGML_FTYPE_MOSTLY_Q4_1_SOME_F16
Definition ggml.h:258
@ GGML_FTYPE_MOSTLY_Q4_2
Definition ggml.h:259
@ GGML_FTYPE_MOSTLY_Q8_0
Definition ggml.h:260
@ GGML_FTYPE_UNKNOWN
Definition ggml.h:253
@ GGML_FTYPE_ALL_F32
Definition ggml.h:254
@ GGML_FTYPE_MOSTLY_Q4_0
Definition ggml.h:256
@ GGML_FTYPE_MOSTLY_Q4_1
Definition ggml.h:257
@ GGML_FTYPE_MOSTLY_Q5_0
Definition ggml.h:261
@ GGML_FTYPE_MOSTLY_F16
Definition ggml.h:255
@ GGML_FTYPE_MOSTLY_Q5_1
Definition ggml.h:262
GGML_API struct ggml_tensor * ggml_new_tensor_1d(struct ggml_context *ctx, enum ggml_type type, int64_t ne0)
GGML_API struct ggml_tensor * ggml_new_tensor_3d(struct ggml_context *ctx, enum ggml_type type, int64_t ne0, int64_t ne1, int64_t ne2)
void(* quantize_row_q_t)(const float *GGML_RESTRICT x, void *GGML_RESTRICT y, int k)
Definition ggml.h:796
GGML_API void ggml_graph_dump_dot(const struct ggml_cgraph *gb, const struct ggml_cgraph *gf, const char *filename)
GGML_API struct ggml_tensor * ggml_reshape_3d(struct ggml_context *ctx, struct ggml_tensor *a, int64_t ne0, int64_t ne1, int64_t ne2)
GGML_API bool ggml_is_quantized(enum ggml_type type)
quantize_fns_t ggml_internal_get_quantize_fn(size_t i)
GGML_API struct ggml_tensor * ggml_conv_1d_1s(struct ggml_context *ctx, struct ggml_tensor *a, struct ggml_tensor *b)
GGML_API struct ggml_tensor * ggml_permute(struct ggml_context *ctx, struct ggml_tensor *a, int axis0, int axis1, int axis2, int axis3)
GGML_API struct ggml_tensor * ggml_view_3d(struct ggml_context *ctx, struct ggml_tensor *a, int64_t ne0, int64_t ne1, int64_t ne2, size_t nb1, size_t nb2, size_t offset)
void(* ggml_unary_op_f32_t)(const int, float *, const float *)
Definition ggml.h:617
GGML_API struct ggml_cgraph ggml_build_backward(struct ggml_context *ctx, struct ggml_cgraph *gf, bool keep)
#define GGML_API
Definition ggml.h:183
GGML_API struct ggml_tensor * ggml_set_f32(struct ggml_tensor *tensor, float value)
GGML_API struct ggml_tensor * ggml_diag_mask_inf(struct ggml_context *ctx, struct ggml_tensor *a, int n_past)
GGML_API size_t ggml_quantize_q8_0(const float *src, void *dst, int n, int k, int64_t *hist)
GGML_API struct ggml_tensor * ggml_cont(struct ggml_context *ctx, struct ggml_tensor *a)
GGML_API void ggml_set_f32_1d(const struct ggml_tensor *tensor, int i, float value)
GGML_API float ggml_type_sizef(enum ggml_type type)
GGML_API void ggml_fp16_to_fp32_row(const ggml_fp16_t *x, float *y, size_t n)
GGML_API size_t ggml_quantize_q4_2(const float *src, void *dst, int n, int k, int64_t *hist)
GGML_API int ggml_cpu_has_gpublas(void)
GGML_API int ggml_cpu_has_cublas(void)
GGML_API struct ggml_tensor * ggml_sqr(struct ggml_context *ctx, struct ggml_tensor *a)
GGML_API int ggml_cpu_has_arm_fma(void)
ggml_linesearch
Definition ggml.h:659
@ GGML_LINESEARCH_BACKTRACKING_ARMIJO
Definition ggml.h:662
@ GGML_LINESEARCH_BACKTRACKING_STRONG_WOLFE
Definition ggml.h:664
@ GGML_LINESEARCH_DEFAULT
Definition ggml.h:660
@ GGML_LINESEARCH_BACKTRACKING_WOLFE
Definition ggml.h:663
#define GGML_MAX_OPT
Definition ggml.h:197
GGML_API void ggml_build_forward_expand(struct ggml_cgraph *cgraph, struct ggml_tensor *tensor)
Definition ggml.h:368
int n_leafs
Definition ggml.h:370
struct ggml_tensor * nodes[GGML_MAX_NODES]
Definition ggml.h:376
struct ggml_tensor * leafs[GGML_MAX_NODES]
Definition ggml.h:378
struct ggml_tensor * work
Definition ggml.h:374
size_t work_size
Definition ggml.h:373
int n_nodes
Definition ggml.h:369
struct ggml_tensor * grads[GGML_MAX_NODES]
Definition ggml.h:377
int64_t perf_cycles
Definition ggml.h:382
int64_t perf_time_us
Definition ggml.h:383
int perf_runs
Definition ggml.h:381
int n_threads
Definition ggml.h:371
Definition ggml.h:395
bool no_alloc
Definition ggml.h:399
size_t mem_size
Definition ggml.h:397
void * mem_buffer
Definition ggml.h:398
Definition ggml.h:318
char padding[8]
Definition ggml.h:324
struct ggml_object * next
Definition ggml.h:322
size_t size
Definition ggml.h:320
size_t offs
Definition ggml.h:319
Definition ggml.h:688
float delta
Definition ggml.h:700
bool print_forward_graph
Definition ggml.h:710
enum ggml_opt_type type
Definition ggml.h:689
int past
Definition ggml.h:699
bool print_backward_graph
Definition ggml.h:711
float ftol
Definition ggml.h:734
enum ggml_linesearch linesearch
Definition ggml.h:739
float wolfe
Definition ggml.h:735
struct ggml_opt_params::@1 adam
int max_no_improvement
Definition ggml.h:708
float alpha
Definition ggml.h:718
float min_step
Definition ggml.h:736
struct ggml_opt_params::@2 lbfgs
float beta2
Definition ggml.h:720
float eps
Definition ggml.h:721
int n_iter
Definition ggml.h:716
float eps_g
Definition ggml.h:723
float beta1
Definition ggml.h:719
int n_threads
Definition ggml.h:691
float eps_f
Definition ggml.h:722
int max_linesearch
Definition ggml.h:731
float max_step
Definition ggml.h:737
int m
Definition ggml.h:729
Definition ggml.h:388
size_t size
Definition ggml.h:390
void * data
Definition ggml.h:391
size_t offs
Definition ggml.h:389
Definition ggml.h:331
enum ggml_op op
Definition ggml.h:342
void * data
Definition ggml.h:359
size_t nb[GGML_MAX_DIMS]
Definition ggml.h:336
int64_t ne[GGML_MAX_DIMS]
Definition ggml.h:335
int n_dims
Definition ggml.h:334
int64_t perf_time_us
Definition ggml.h:357
int64_t perf_cycles
Definition ggml.h:356
enum ggml_type type
Definition ggml.h:332
struct ggml_tensor * opt[GGML_MAX_OPT]
Definition ggml.h:349
struct ggml_tensor * grad
Definition ggml.h:346
struct ggml_tensor * src0
Definition ggml.h:347
int n_tasks
Definition ggml.h:352
char name[32]
Definition ggml.h:361
char padding[8]
Definition ggml.h:363
int perf_runs
Definition ggml.h:355
struct ggml_tensor * src1
Definition ggml.h:348
bool is_param
Definition ggml.h:344
Definition ggml.h:801
dequantize_row_q_t dequantize_row_q
Definition ggml.h:802
vec_dot_q_t vec_dot_q
Definition ggml.h:806
quantize_row_q_t quantize_row_q_reference
Definition ggml.h:804
quantize_row_q_t quantize_row_q_dot
Definition ggml.h:805
quantize_row_q_t quantize_row_q
Definition ggml.h:803
enum ggml_type vec_dot_type
Definition ggml.h:807