Ryujinx

mirror of https://github.com/Ryujinx/Ryujinx.git synced 2024-10-01 12:30:00 +02:00

Author	SHA1	Message	Date
riperiperi	7c5ead1c19	Fast path for Inline2Memory buffer write that skips write tracking (#2624 ) * Fast path for Inline2Memory buffer write This PR adds a method to PhysicalMemory that attempts to write all cached resources directly, so that memory tracking can be avoided. The goal of this is both to avoid flushing buffer data, and to avoid raising the sequence number when data is written, which causes buffer and texture handles to be re-checked. This currently only targets buffers, with a side check on textures that falls back to a tracked write if any exist within the target range. It's not expected to write textures from here - this is just a mechanism to protect us if someone does decide to do that. It's possible to add a fast path for this in future (and for ShaderCache, once that starts using tracking) The forced read before inline2memory begins has been skipped, as the data is fully written when the transfer is completed anyways. This allows us to flush on read in emergency situations, but still write the new data over the flushed data. Improves performance on Xenoblade 2 and DE, which was flushing buffer data on the GPU thread when trying to write compute data. May improve performance in other games that write SSBOs from compute, and update data in the same/nearby pages often. Super Smash Bros Ultimate should probably be tested to make sure the vertex explosions haven't returned, as I think that's what this AdvanceSequence was for. * ForceDirty before write, to make sure data does not flush over the new write	2021-09-19 15:09:53 +02:00
gdkchan	f08a280ade	Use shader subgroup extensions if shader ballot is not supported (#2627 ) * Use shader subgroup extensions if shader ballot is not supported * Shader cache version bump + cleanup * The type is still required on the table	2021-09-19 14:38:39 +02:00
riperiperi	7379bc2f39	Array based RangeList that caches Address/EndAddress (#2642 ) * Array based RangeList that caches Address/EndAddress In isolation, this was more than 2x faster than the RangeList that checks using the interface. In practice I'm seeing much better results than I expected. The array is used because checking it is slightly faster than using a list, which loses time to struct copies, but I still want that data locality. A method has been added to the list to update the cached end address, as some users of the RangeList currently modify it dynamically. Greatly improves performance in Super Mario Odyssey, Xenoblade and any other GPU limited games. * Address Feedback	2021-09-19 14:22:26 +02:00
riperiperi	b0af010247	Set texture/image bindings in place rather than allocating and passing an array (#2647 ) * Remove allocations for texture bindings and state * Rent rather than stackalloc + copy A bit faster.	2021-09-19 14:03:05 +02:00
gdkchan	ac4ec1a015	Account for negative strides on DMA copy (#2623 ) * Account for negative strides on DMA copy * Should account for non-zero Y	2021-09-11 22:54:18 +02:00
riperiperi	b0e410a828	Lift textures in the AutoDeleteCache for all modifications. (#2615 ) * Lift textures in the AutoDeleteCache for all modifications. Before, this would only apply to render targets and texture blit. Now it applies to image stores, the fast dma copy path and any other type of modification. Image store always at least has one reference in the texture pool, so the function of the AutoDeleteCache keeping textures _alive_ is not useful, but a very important function for a while has been its use to flush textures in order of modification when they are dereferenced, so that their data is not lost. Before, textures populated using image stores were being dereferenced and reloaded as garbage. Now, when these textures are dereferenced, their data will be put back into memory, and everything stays intact. Fixes lighting breaking when switching levels in THPS1+2, and potentially some more UE4 games. I've tested a bunch more games for regressions and performance impact, but they all seem fine. * Lift copy srcTexture so that it doesn't remain referenceless * Perform lift before reference count change on unbind. It's important to lift on unbind as that is the moment the texture was truly last modified, but definitely not after releasing every single reference.	2021-09-11 21:52:54 +02:00
riperiperi	f0b00c1ae9	Fix TXQ for 3D textures. (#2613 ) * Fix TXQ for 3D textures. Assumes the texture is 3D if the component mask contains Z. This fixes a bug in UE4 games where parts of the map had garbage pointers to lighting voxels, as the lookup 3D texture was not being initialized. Most notable game is THPS1+2. May need another PR to keep image store data alive and properly flush it in order using the AutoDeleteCache. * Get sampler type for TextureSize from bound textures.	2021-09-02 00:17:43 -03:00
riperiperi	142cededd4	Implement Shader Instructions SUATOM and SURED (#2090 ) * Initial Implementation * Further improvements (no support for float/64-bit types) * Merge atomic and reduce instructions, add missing format switch * Fix rebase issues. * Not used. * Whoops. Fixed. * Partial implementation of inc/dec, cleanup and TODOs * Remove testing path * Address Feedback	2021-08-31 02:51:57 -03:00
gdkchan	416dc8fde4	Fix out-of-bounds shader thread shuffle (#2605 ) * Fix out-of-bounds shader thread shuffle * Shader cache version bump	2021-08-30 14:02:40 -03:00
gdkchan	82cefc8dd3	Handle indirect draw counts with non-zero draw starts properly (#2593 )	2021-08-29 16:52:38 -03:00
riperiperi	15e7fe3ac9	Avoid deleting textures when their data does not overlap. (#2601 ) * Avoid deleting textures when their data does not overlap. It's possible that while two textures' start and end addresses indicate an overlap, the actual data contained within them is sparse due to a layer stride. One such possibility is array slices of a cubemap at different mip levels - they overlap as a whole, but the actual texture data fills the gaps between each other's layers rather than actually overlapping. This fixes issues with UE4 games having incorrect lighting (solid white screen or really dark shadows). There are still remaining issues with games that use the 3D texture prebaked lighting, such as THPS1+2. This PR also fixes a bug with TexturePool's resized texture handling where the base level in the descriptor was not considered. * AllRegions granularity for 3d textures is now by level rather than by slice. * Address feedback	2021-08-29 16:22:13 -03:00
riperiperi	76e8f9ac87	Only reupload the texture scale array if it changes. (#2595 ) * Only reupload the texture scale array if it changes. Before, this would be called all the time if any shader needed a scale value. The cost of doing this has increased with threaded-gal, as the scale array is copied to a span pool, and it was called on pretty much every draw sometimes. This improves GPU performance in games, scaled or not. Most affected game seems to be Xenoblade Chronicles: Definitive Edition. * Just use = instead of |=	2021-08-27 17:08:30 -03:00
gdkchan	ee1038e542	Initial support for shader attribute indexing (#2546 ) * Initial support for shader attribute indexing * Support output indexing too, other improvements * Fix order * Address feedback	2021-08-27 01:44:47 +02:00
riperiperi	ec3e848d79	Add a Multithreading layer for the GAL, multi-thread shader compilation at runtime (#2501 ) * Initial Implementation About as fast as nvidia GL multithreading, can be improved with faster command queuing. * Struct based command list Speeds up a bit. Still a lot of time lost to resource copy. * Do shader init while the render thread is active. * Introduce circular span pool V1 Ideally should be able to use structs instead of references for storing these spans on commands. Will try that next. * Refactor SpanRef some more Use a struct to represent SpanRef, rather than a reference. * Flush buffers on background thread * Use a span for UpdateRenderScale. Much faster than copying the array. * Calculate command size using reflection * WIP parallel shaders * Some minor optimisation * Only 2 max refs per command now. The command with 3 refs is gone. 😌 * Don't cast on the GPU side * Remove redundant casts, force sync on window present * Fix Shader Cache * Fix host shader save. * Fixup to work with new renderer stuff * Make command Run static, use array of delegates as lookup Profile says this takes less time than the previous way. * Bring up to date * Add settings toggle. Fix Multithreading Off mode. * Fix warning. * Release tracking lock for flushes * Fix Conditional Render fast path with threaded gal * Make handle iteration safe when releasing the lock This is mostly temporary. * Attempt to set backend threading on driver Only really works on nvidia before launching a game. * Fix race condition with BufferModifiedRangeList, exceptions in tracking actions * Update buffer set commands * Some cleanup * Only use stutter workaround when using opengl renderer non-threaded * Add host-conditional reservation of counter events There has always been the possibility that conditional rendering could use a query object just as it is disposed by the counter queue. 
This change makes it so that when the host decides to use host conditional rendering, the query object is reserved so that it cannot be deleted. Counter events can optionally start reserved, as the threaded implementation can reserve them before the backend creates them, and there would otherwise be a short amount of time where the counter queue could dispose the event before a call to reserve it could be made. * Address Feedback * Make counter flush tracked again. Hopefully does not cause any issues this time. * Wait for FlushTo on the main queue thread. Currently assumes only one thread will want to FlushTo (in this case, the GPU thread) * Add SDL2 headless integration * Add HLE macro commands. Co-authored-by: Mary <mary@mary.zone>	2021-08-27 00:31:29 +02:00
mpnico	8e1adb95cf	Add support for HLE macros and accelerate MultiDrawElementsIndirectCount #2 (#2557 ) * Add support for HLE macros and accelerate MultiDrawElementsIndirectCount * Add missing barrier * Fix index buffer count * Add support check for each macro hle before use * Add missing xml doc Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2021-08-26 23:50:28 +02:00
riperiperi	bdc1f91a5b	Remove pool cache entries for incompatible overlapping textures (#2568 ) This greatly reduces memory usage in games that aggressively reuse memory without removing dead textures from the pool, such as the Xenoblade games, UE3 games, and to a lesser extent, UE4/Unity games. This change stops memory usage from ballooning in Xenoblade and some other games. It will also reduce texture view/dependency complexity in some games - for example in MK8D it will reduce the number of surface copies between lighting cubemaps generated for actors. There shouldn't be any performance impact from doing this, though the deletion and creation of textures could be improved by improving the OpenGL texture storage cache, which is very simple and limited right now. This will be improved in future. Another potential error has been fixed with the texture cache, which could prevent data loss when data is interchangeably written to textures from both the GPU and CPU. It was possible that the dirty flag for a texture would be consumed without the data being synchronized on next use, due to the old overlap check. This check no longer consumes the dirty flag. Please test a bunch of games to make sure they still work, and there are no performance regressions.	2021-08-20 17:52:09 -03:00
riperiperi	97aedc030d	Fix GetHandleInformation for mipmapped 3d textures (#2569 ) Got this the wrong way round - was causing games to try to synchronize mipmap levels of like 52 on a 3d texture with 6 levels. Also, corrected the variable name in the method that _was_ working.	2021-08-20 14:59:39 -03:00
gdkchan	680d3ed198	Enable transform feedback buffer flush (#2552 )	2021-08-17 14:09:27 -03:00
gdkchan	eb181425b1	Fix size of cached compute shaders (#2548 ) * Fix size of cached compute shaders * Missed one	2021-08-12 15:59:24 -03:00
gdkchan	8196086f7a	Revert "Calculate vertex buffer sizes from index buffer (#1663 )" (#2544 ) This reverts commit `10d649e6d3`.	2021-08-11 22:13:48 -03:00
gdkchan	3148c0c21c	Unify GpuAccessorBase and TextureDescriptorCapableGpuAccessor (#2542 ) * Unify GpuAccessorBase and TextureDescriptorCapableGpuAccessor * Shader cache version bump	2021-08-11 18:56:59 -03:00
gdkchan	c3e2646f9e	Workaround for Intel FrontFacing built-in variable bug (#2540 )	2021-08-11 23:01:06 +02:00
riperiperi	0a80a837cb	Use "Undesired" scale mode for certain textures rather than blacklisting (#2537 ) * Use "Undesired" scale mode for certain textures rather than blacklisting * Nit Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2021-08-11 22:44:51 +02:00
gdkchan	ed754af8d5	Make sure attributes used on subsequent shader stages are initialized (#2538 )	2021-08-11 22:27:00 +02:00
gdkchan	10d649e6d3	Calculate vertex buffer sizes from index buffer (#1663 ) * Calculate vertex buffer size from maximum index buffer index * Increase maximum index buffer count for it to be considered profitable for counting	2021-08-11 22:06:09 +02:00
gdkchan	0f6ec446ea	Replace BGRA and scale uniforms with a uniform block (#2496 ) * Replace BGRA and scale uniforms with a uniform block * Setting the data again on program change is no longer needed * Optimize and resolve some warnings * Avoid redundant support buffer updates * Some optimizations to BindBuffers (now inlined) * Unify render scale arrays	2021-08-11 21:33:43 +02:00
gdkchan	d9d18439f6	Use a new approach for shader BRX targets (#2532 ) * Use a new approach for shader BRX targets * Make shader cache actually work * Improve the shader pattern matching a bit * Extend LDC search to predecessor blocks, catches more cases * Nit * Only save the amount of constant buffer data actually used. Avoids crashes on partially mapped buffers * Ignore Rd on predicate instructions, as they do not have a Rd register (catches more cases)	2021-08-11 20:59:42 +02:00
gdkchan	ff5df5d8a1	Support non-contiguous copies on I2M and DMA engines (#2473 ) * Support non-contiguous copies on I2M and DMA engines * Vector copy should start aligned on I2M * Nits * Zero extend the offset	2021-08-04 22:20:58 +02:00
riperiperi	4b60371e64	Return mapped buffer pointer directly for flush, WriteableRegion for textures (#2494 ) * Return mapped buffer pointer directly for flush, WriteableRegion for textures A few changes here to generally improve performance, even for platforms not using the persistent buffer flush. - Texture and buffer flush now return a ReadOnlySpan<byte>. It's guaranteed that this span is pinned in memory, but it will be overwritten on the next flush from that thread, so it is expected that the data is used before calling again. - As a result, persistent mappings no longer copy to a new array - rather the persistent map is returned directly as a Span<>. A similar host array is used for the glGet flushes instead of allocating new arrays each time. - Texture flushes now do their layout conversion into a WriteableRegion when the texture is not MultiRange, which allows the flush to happen directly into guest memory rather than into a temporary span, then copied over. This avoids another copy when doing layout conversion. Overall, this saves 1 data copy for buffer flush, 1 copy for linear textures with matching source/target stride, and 2 copies for block textures or linear textures with mismatching strides. * Fix tests * Fix array pointer for Mesa/Intel path * Address some feedback * Update method for getting array pointer.	2021-07-19 19:10:54 -03:00
riperiperi	ca5ac37cd6	Flush buffers and texture data through a persistent mapped buffer. (#2481 ) * Use persistent buffers to flush texture data * Flush buffers via copy to persistent buffers. * Log error when timing out, small refactoring.	2021-07-16 18:10:20 -03:00
gdkchan	bb6fab2009	Ensure that DMA copy target textures are kept alive or flushed (#2478 )	2021-07-14 14:48:57 -03:00
gdkchan	96a070a9a7	Do not require texture and sampler pools being initialized (#2476 )	2021-07-14 14:27:22 -03:00
gdkchan	04dce402ac	Implement a fast path for I2M transfers (#2467 )	2021-07-12 16:48:57 -03:00
gdkchan	9b08abc644	Fix shader compilation on shaders that uses rectangle textures (#2471 )	2021-07-12 16:20:33 -03:00
gdkchan	40b21cc3c4	Separate GPU engines (part 2/2) (#2440 ) * 3D engine now uses DeviceState too, plus new state modification tracking * Remove old methods code * Remove GpuState and friends * Optimize DeviceState, force inline some functions * This change was not supposed to go in * Proper channel initialization * Optimize state read/write methods even more * Fix debug build * Do not dirty state if the write is redundant * The YControl register should dirty either the viewport or front face state too, to update the host origin * Avoid redundant vertex buffer updates * Move state and get rid of the Ryujinx.Graphics.Gpu.State namespace * Comments and nits * Fix rebase * PR feedback * Move changed = false to improve codegen * PR feedback * Carry RyuJIT a bit more	2021-07-11 17:20:40 -03:00
gdkchan	59900d7f00	Unscale textureSize when resolution scaling is used (#2441 ) * Unscale textureSize when resolution scaling is used * Fix textureSize on compute * Flag texture size as needing res scale values too	2021-07-09 00:09:07 -03:00
gdkchan	b02719cf41	Flush UBO updates more frequently (#2407 )	2021-07-07 21:20:52 -03:00
gdkchan	8b44eb1c98	Separate GPU engines and make state follow official docs (part 1/2) (#2422 ) * Use DeviceState for compute and i2m * Migrate 2D class, more comments * Migrate DMA copy engine * Remove now unused code * Replace GpuState by GpuAccessorState on GpuAccessor, since compute no longer has a GpuState * More comments * Add logging (disabled) * Add back i2m on 3D engine	2021-07-07 20:56:06 -03:00
gdkchan	d125fce3e8	Allow shader language and target API to be specified on the shader translator (#2402 )	2021-07-06 21:20:06 +02:00
riperiperi	94cc365b63	Honour copy dependencies when switching render target (#2433 ) * Honour copy dependencies when switching render target When switching from one render target to another, when both have a copy dependency to each other, a copy can be deferred on the second target when unbinding the first. Before, this would not be honoured before binding the new texture, so the copy would stay deferred until the render targets change again, at which point it would copy in old data and essentially clear all the draws done during that time. This change runs synchronize memory to make sure that copies are honoured. This can cause a redundant copy, but it's better than it breaking for now. This should fix miiedit on AMD/Intel GPUs on windows. May fix other games, or perhaps rare copy dependency bugs on NVIDIA too. * Address feedback	2021-07-03 01:55:04 -03:00
gdkchan	fbb4019ed5	Initial support for separate GPU address spaces (#2394 ) * Make GPU memory manager a member of GPU channel * Move physical memory instance to the memory manager, and the caches to the physical memory * PR feedback	2021-06-29 19:32:02 +02:00
gdkchan	fefd4619a5	Add support for custom line widths (#2406 )	2021-06-25 20:11:54 -03:00
gdkchan	493648df31	Fix default value for unwritten shader outputs (#2412 ) * Fix shader default output values * Shader cache version bump	2021-06-25 19:56:03 -03:00
gdkchan	ed2f5ede0f	Fix texture sampling with depth compare and LOD level or bias (#2404 ) * Fix texture sampling with depth compare and LOD level or bias * Shader cache version bump * nit: Sorting	2021-06-25 00:54:50 +02:00
gdkchan	a10b2c5ff2	Initial support for GPU channels (#2372 ) * Ground work for separate GPU channels * Rename TextureManager to TextureCache * Decouple texture bindings management from the texture cache * Rename BufferManager to BufferCache * Decouple buffer bindings management from the buffer cache * More comments and proper disposal * PR feedback * Force host state update on channel switch * Typo * PR feedback * Missing using	2021-06-24 01:51:41 +02:00
riperiperi	12a7a2ead8	Inherit buffer tracking handles rather than recreating on resize (#2330 ) This greatly speeds up games that constantly resize buffers, and removes stuttering on games that resize large buffers occasionally: - Large improvement on Super Mario 3D All-Stars (#1663 needed for best performance) - Improvement to Hyrule Warriors: AoC, and UE4 games. These games can still stutter due to texture creation/loading. - Small improvement to other games, potential 1-frame stutters avoided. `ForceSynchronizeMemory`, which was added with POWER, is no longer needed. Some tests have been added for the MultiRegionHandle.	2021-06-24 01:31:26 +02:00
gdkchan	c71ae9c85c	Fix shader texture LOD query (#2397 )	2021-06-23 23:31:14 +02:00
gdkchan	49edf14a3e	Pass all inputs when geometry shader passthrough is enabled (#2362 ) * Pass all inputs when geometry shader passthrough is enabled * Shader cache version bump	2021-06-23 23:04:59 +02:00
gdkchan	65fee49e8a	Fix separate bindless sampler at offset 0 (#2360 )	2021-06-20 20:48:12 +02:00
riperiperi	7ff1f9aa12	End shader decoding when reaching a block that starts with an infinite loop (after BRX) (#2367 ) * End shader decoding when reaching an infinite loop The NV shader compiler puts these at the end of shaders. * Update shader cache version	2021-06-15 02:09:59 +02:00

299 commits